Below is an example of a query that can be used to analyze a query job at
the job grain. Search the query for "TODO" to find the lines where common
changes can be made. Script variables can be modified as needed, as can
the common table expression jobs_selected. Choose an appropriate source
view: JOBS_BY_USER, JOBS_BY_PROJECT, JOBS_BY_ORGANIZATION, or an archive
of one of these. Modify the filters as needed. If you have access to an
ASSIGNMENT_CHANGES_BY_PROJECT table, you can comment and uncomment the
appropriate lines in the query below. Note that the beta version of the
jobs views lacks certain fields that were available in the alpha version
and still exist in the underlying metadata; in other words, the beta
version does not contain all of the fields available through a
bq --format prettyjson show -j command.
-- TODO: modify script variables as needed
declare threshold_compute_ms_max int64 default
1 -- minutes
* 60 -- seconds/minute
* 1000 -- milliseconds/second
;
declare threshold_compute_skew float64 default
0.90 -- dimensionless
;
with
jobs_selected as
(
select
cast(j.creation_time as date) as creation_date
,j.project_id
,j.job_id
,j.reservation_id
,j.job_type
,j.statement_type
,j.destination_table
,j.cache_hit
,j.state
,j.error_result.reason as error_reason
,j.error_result.message as error_message
,j.user_email
,j.total_slot_ms
,j.total_bytes_processed
,j.creation_time
,j.start_time
,j.end_time
,j.timeline
,j.job_stages
-- TODO: comment/uncomment lines as needed to choose jobs source
-- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
-- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
-- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
-- from `PROJECT_ID.ARCHIVE_DATASET.JOBS` j
-- default to this query's billing project
-- from `region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
-- from `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
from `region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
where
j.job_type = 'QUERY' -- LOAD, QUERY, COPY, or EXTRACT
-- TODO: modify filters as needed
-- JOBS_BY_ tables partitioned on creation_time by day;
-- creation_time is UTC and current_date defaults to UTC
-- and date(j.creation_time, 'America/New_York') = current_date('America/New_York')
and date(j.creation_time) = current_date
-- and j.project_id = 'PROJECT_ID' -- only needed by JOBS_BY_ORGANIZATION
-- and j.job_id = 'JOB_ID'
-- and j.state = 'RUNNING' -- PENDING, RUNNING, or DONE
)
,jobs_1 as
(
select
j.creation_date
,j.project_id
,j.job_id
,j.reservation_id
,j.job_type
,j.statement_type
,j.destination_table
,j.cache_hit
,j.state
,j.error_reason
,j.error_message
,j.user_email
,j.total_slot_ms
,j.total_bytes_processed
,j.creation_time
,j.start_time
,j.end_time
,(
select
struct(
timestamp_millis(
min(js.start_ms)
) as stages_start_time
,timestamp_millis(
max(js.end_ms)
) as stages_end_time
,max(
js.end_ms - js.start_ms
) as max_stages_elapsed_ms
,sum(
js.shuffle_output_bytes
) as shuffle_output_bytes
,sum(
js.shuffle_output_bytes_spilled
) as shuffle_output_bytes_spilled
,countif(
js.status = 'COMPLETE'
) as stages_complete_count
,countif(
js.status = 'RUNNING'
) as stages_running_count
,count(*) as stages_count
)
from unnest(j.job_stages) js
) as stages_statistics
,(
select
struct(
js.name
,js.start_time
,js.end_time
,js.elapsed_ms
)
from (
select
js.name
,timestamp_millis(
js.start_ms
) as start_time
,timestamp_millis(
js.end_ms
) as end_time
,js.end_ms - js.start_ms as elapsed_ms
,row_number() over (
order by
js.end_ms - js.start_ms desc
) as row_number
from unnest(j.job_stages) js
) js
where
js.row_number = 1
) as stages_max_elapsed_ms_statistics
,(
select
case
when
js.compute_skew >= threshold_compute_skew
and js.compute_ms_max >= threshold_compute_ms_max
and js.start_ms is not null -- metrics unreliable if start_ms is null
then
struct(
js.name
,js.compute_ms_avg
,js.compute_ms_max
,js.compute_skew
)
end
from (
select
js.name
,js.compute_ms_avg
,js.compute_ms_max
,1 - safe_divide(
js.compute_ms_avg
,js.compute_ms_max
) as compute_skew
,js.start_ms
,row_number() over (
order by
case
when
1 - safe_divide(
js.compute_ms_avg
,js.compute_ms_max
) >= threshold_compute_skew
and js.compute_ms_max >= threshold_compute_ms_max
and js.start_ms is not null -- metrics unreliable if start_ms is null
then
js.compute_ms_max
end
desc
) as row_number
from unnest(j.job_stages) js
) js
where
js.row_number = 1
) as stages_significant_compute_skew_statistics
,j.job_stages[
safe_ordinal(
array_length(j.job_stages)
)
].records_written as last_records_written
,case
when array_length(j.job_stages) > 0
then
(
select
array_agg(
struct(
js.start_time
,js.end_time
,js.wait_ms_max
)
order by
js.start_time
)
from (
-- merge intervals logic from
-- https://stackoverflow.com/questions/57173813/merge-overlapping-intervals-with-containing-null-values
select
min(js.start_time) as start_time
,nullif(
max(js.end_time)
,j.end_time
) as end_time
,max(js.wait_ms_max) as wait_ms_max
from (
select
js.*
,countif(
js.prev_max_end_time is null
or js.prev_max_end_time < start_time
) over (
order by
js.start_time
) as group_nbr
from (
select
js.* except (end_time)
,coalesce(
js.end_time
,j.end_time
) as end_time
,max(
coalesce(
js.end_time
,j.end_time
)
) over (
order by
js.start_time
rows between unbounded preceding and 1 preceding
) as prev_max_end_time
from (
select
timestamp_millis(
js.start_ms
) as start_time
,timestamp_millis(
js.end_ms
) as end_time
,js.wait_ms_max
from unnest(job_stages) js
) js
) js
) js
group by
js.group_nbr
) js
)
else null
end as stages_intervals
,j.timeline[
safe_ordinal(
array_length(j.timeline)
)
] as last_timeline
from jobs_selected j
)
,jobs_2 as
(
select
j.creation_date
,j.project_id
,j.job_id
,j.reservation_id
,j.job_type
,j.statement_type
,j.destination_table
,j.cache_hit
,j.state
,case
when j.state = 'DONE' then 'DONE'
when j.state = 'PENDING' then 'PENDING'
when
(
j.state = 'RUNNING'
and j.stages_statistics.stages_running_count = 0
and j.stages_statistics.stages_complete_count = 0
)
then 'UNSTAGED'
when
(
j.state = 'RUNNING'
and j.stages_statistics.stages_running_count = 0
and j.stages_statistics.stages_complete_count > 0
)
then 'INACTIVE'
when
(
j.state = 'RUNNING'
and j.stages_statistics.stages_running_count > 0
)
then 'ACTIVE'
end as enhanced_state
,j.error_reason
,j.error_message
,j.user_email
,j.stages_statistics.stages_running_count
,j.stages_statistics.stages_complete_count
,j.stages_statistics.stages_count
,j.stages_statistics.max_stages_elapsed_ms
,j.stages_max_elapsed_ms_statistics.name as max_elapsed_ms_name
,j.stages_max_elapsed_ms_statistics.start_time as max_elapsed_ms_start_time
,j.stages_max_elapsed_ms_statistics.end_time as max_elapsed_ms_end_time
,j.stages_max_elapsed_ms_statistics.elapsed_ms as max_elapsed_ms
,(
select
max(si.wait_ms_max)
from unnest(j.stages_intervals) si
) as max_wait_ms_max
,(
select
sum(si.wait_ms_max)
from unnest(j.stages_intervals) si
) as sum_wait_ms_max
,j.stages_significant_compute_skew_statistics.name as max_significant_compute_name
,j.stages_significant_compute_skew_statistics.compute_ms_avg as max_significant_compute_ms_avg
,j.stages_significant_compute_skew_statistics.compute_ms_max as max_significant_compute_ms_max
,j.stages_significant_compute_skew_statistics.compute_skew as max_significant_compute_skew
,j.last_timeline.pending_units as timeline_pending_units
,j.last_timeline.active_units as timeline_active_units
,j.last_timeline.completed_units as timeline_completed_units
,j.last_timeline.total_slot_ms as timeline_total_slot_ms
,j.total_slot_ms
,j.total_bytes_processed
,j.stages_statistics.shuffle_output_bytes
,j.stages_statistics.shuffle_output_bytes_spilled
,j.last_records_written
,j.creation_time
,j.start_time
,j.stages_statistics.stages_start_time
,j.stages_statistics.stages_end_time
,timestamp_add(
j.start_time
,interval
j.last_timeline.elapsed_ms
millisecond
) as timeline_end_time
,j.end_time
,timestamp_diff(
j.end_time
,j.creation_time
,millisecond
) as creation_to_end_ms
,timestamp_diff(
j.end_time
,j.start_time
,millisecond
) as start_to_end_ms
,timestamp_diff(
case
when j.start_time is not null
then j.start_time
when j.state = 'PENDING'
then current_timestamp
end
,j.creation_time
,millisecond
) as pending_ms
,timestamp_diff(
case
when j.stages_statistics.stages_start_time is not null
then j.stages_statistics.stages_start_time
when j.state = 'RUNNING'
then current_timestamp
end
,j.start_time
,millisecond
) as unstaged_ms
,timestamp_diff(
j.stages_statistics.stages_end_time
,j.stages_statistics.stages_start_time
,millisecond
) as staged_ms
,timestamp_diff(
j.end_time
,j.stages_statistics.stages_end_time
,millisecond
) as ending_ms
,(
select
sum(
timestamp_diff(
si.end_time
,si.start_time
,millisecond
)
)
from unnest(j.stages_intervals) si
) as staged_active_ms
from jobs_1 j
)
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- ,assignments_history_stage as
-- (
-- select
-- ac.change_timestamp as start_time
-- ,coalesce(
-- timestamp_add(
-- lead(ac.change_timestamp) over (
-- partition by
-- ac.project_id
-- ,ac.assignment_id
-- order by
-- ac.change_timestamp
-- )
-- ,interval -1 microsecond
-- )
-- ,'9999-12-31 23:59:59.999999'
-- ) as end_time
-- ,ac.action
-- ,ac.project_id
-- ,ac.reservation_name
-- ,ac.assignee_type
-- ,ac.assignee_id
-- -- TODO: modify source as needed
-- from `ADMIN_PROJECT.region-us.INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT` ac
-- )
-- ,assignments_history as
-- (
-- select
-- ahs.* except(
-- action
-- )
-- from assignments_history_stage ahs
-- where
-- ahs.action <> 'DELETE'
-- )
select
j.creation_date
,j.project_id
,j.job_id
-- TODO: comment line and uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
,j.reservation_id
-- ,coalesce(
-- j.reservation_id
-- ,concat(
-- ahp.project_id
-- ,':US.'
-- ,ahp.reservation_name
-- )
-- /* need to add logic for handling folder assignments... */
-- ,concat(
-- aho.project_id
-- ,':US.'
-- ,aho.reservation_name
-- )
-- ) as reservation_id
,j.job_type
,j.statement_type
,j.destination_table.project_id as destination_project_id
,j.destination_table.dataset_id as destination_dataset_id
,j.destination_table.table_id as destination_table_id
,j.cache_hit
,j.state
,j.enhanced_state
,j.error_reason
,j.error_message
,j.user_email
,j.stages_running_count
,j.stages_complete_count
,j.stages_count
,j.max_elapsed_ms_name
,j.max_elapsed_ms_start_time
,j.max_elapsed_ms_end_time
,j.max_elapsed_ms
,j.max_wait_ms_max
,j.sum_wait_ms_max
,j.max_significant_compute_name
,j.max_significant_compute_ms_avg
,j.max_significant_compute_ms_max
,j.max_significant_compute_skew
,j.timeline_pending_units
,j.timeline_active_units
,j.timeline_completed_units
,j.timeline_total_slot_ms
,j.total_slot_ms
,j.total_bytes_processed
,j.shuffle_output_bytes
,j.shuffle_output_bytes_spilled
,j.last_records_written
-- with this format, value can be treated as datetime in Excel;
-- format in Excel as yyyy-mm-dd hh:mm:ss.000
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.creation_time
) as creation_time
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.start_time
) as start_time
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.stages_start_time
) as stages_start_time
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.stages_end_time
) as stages_end_time
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.timeline_end_time
) as timeline_end_time
,format_timestamp(
'%Y-%m-%d %H:%M:%E*S'
,j.end_time
) as end_time
,j.creation_to_end_ms
,j.start_to_end_ms
,j.pending_ms
,j.unstaged_ms
,j.staged_ms
,j.staged_active_ms
,case
when j.staged_ms is not null
then j.staged_ms - j.staged_active_ms
when
j.enhanced_state = 'INACTIVE'
and current_timestamp > j.timeline_end_time
then
timestamp_diff(
current_timestamp
,j.timeline_end_time
,millisecond
)
end as staged_inactive_ms
,j.ending_ms
,j.creation_to_end_ms
- j.staged_active_ms
as delay_ms
from jobs_2 j
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- left join assignments_history ahp
-- on
-- ahp.assignee_id = j.project_id
-- and j.creation_time between ahp.start_time and ahp.end_time
-- and ahp.assignee_type = 'PROJECT'
-- left join assignments_history aho
-- on
-- j.creation_time between aho.start_time and aho.end_time
-- and aho.assignee_type = 'ORGANIZATION'
order by
j.creation_time
,j.project_id
,j.job_id
;

Fields

creation_date
Cast of creation_time as a date in the UTC time zone.
project_id
Job project_id.
job_id
Job job_id.
reservation_id
Job reservation_id. Appears null on the job information if one or
more of the conditions below is true:
job_type is 'COPY'
statement_type is 'SCRIPT' (parent query job)
cache_hit is true
state is 'PENDING' or 'RUNNING'
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobstatistics.
job_type
Job job_type.
statement_type
Job statement_type.
destination_project_id
Destination table project ID.
destination_dataset_id
Destination table dataset ID.
destination_table_id
Destination table table ID.
cache_hit
Job cache_hit.
state
Job state:
'PENDING'
Interval between job creation_time and job start_time.
'RUNNING'
Interval between job start_time and job end_time.
'DONE'
After job end_time.
enhanced_state
Computed using the job field state and job_stages subfields. Values:
'PENDING'
Job state is
'PENDING'; interval
between job creation_time and
job start_time.
'UNSTAGED'
Job state is
'RUNNING' and no job
stages recorded; interval between job start_time and
earliest job_stages.start_ms.
'ACTIVE'
Job state is
'RUNNING' and one or
more job stages have status
'RUNNING'; interval
between earliest job_stages.start_ms
and last job_stages.end_ms
when at least one stage has status
'RUNNING'.
'INACTIVE'
Job state is
'RUNNING' and one or
more job stages have status
'COMPLETE' and no job
stages have status
'RUNNING'; interval
between earliest job_stages.start_ms
and last job_stages.end_ms
when no stage has status
'RUNNING'.
'ENDING'
It is not certain whether this state can be identified before the
query job is done; interval between the last job_stages.end_ms and
job end_time.
'DONE'
Job state is
'DONE'; after job
end_time.
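The derivation of enhanced_state above can be paraphrased in Python. This is a sketch of the CASE expression's logic, not part of the query, and the function name is mine:

```python
def enhanced_state(state, stages_running_count, stages_complete_count):
    """Paraphrase of the CASE expression deriving enhanced_state from
    the job state and counts over job_stages.status."""
    if state in ('DONE', 'PENDING'):
        return state
    if state == 'RUNNING':
        if stages_running_count == 0 and stages_complete_count == 0:
            return 'UNSTAGED'  # running, but no stage activity recorded yet
        if stages_running_count == 0:
            return 'INACTIVE'  # stages recorded, none currently running
        return 'ACTIVE'        # at least one stage currently running
    return None                # as in SQL, unmatched states yield null
```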
error_reason
Job error_result.reason.
error_message
Job error_result.message.
user_email
Job user_email.
stages_running_count
Count of stages with status set to 'RUNNING'.
stages_complete_count
Count of stages with status set to 'COMPLETE'.
stages_count
Count of stages.
max_elapsed_ms_name
Name of the stage in which the maximum stage elapsed time
(end_ms - start_ms) occurs.
max_elapsed_ms_start_time
Start time of the stage in which the maximum stage elapsed time
(end_ms - start_ms) occurs.
max_elapsed_ms_end_time
End time of the stage in which the maximum stage elapsed time
(end_ms - start_ms) occurs.
max_elapsed_ms
Maximum stage elapsed time (end_ms - start_ms).
max_wait_ms_max
Maximum stage wait_ms_max.
sum_wait_ms_max
Merge job stages into "island" intervals, find the maximum
wait_ms_max within each island, and sum these maximums over all
islands.
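The "island" merge that the query implements with window functions can be sketched in Python. The function below is a simplified paraphrase with toy numbers; it ignores the null end_time handling (coalescing to the job's end_time) that the SQL performs:

```python
def merge_intervals(stages):
    """Merge (start_ms, end_ms, wait_ms_max) stage tuples into
    non-overlapping "island" intervals, keeping each island's
    maximum wait_ms_max."""
    islands = []
    for start, end, wait in sorted(stages):
        if islands and start <= islands[-1][1]:
            s, e, w = islands[-1]
            islands[-1] = (s, max(e, end), max(w, wait))  # extend current island
        else:
            islands.append((start, end, wait))            # start a new island
    return islands

islands = merge_intervals([(0, 10, 5), (4, 12, 9), (20, 25, 3)])
max_wait_ms_max = max(w for _, _, w in islands)       # 9
sum_wait_ms_max = sum(w for _, _, w in islands)       # 9 + 3 = 12
staged_active_ms = sum(e - s for s, e, _ in islands)  # 12 + 5 = 17
```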
max_significant_compute_name
Name of the stage in which the maximum compute_ms_max occurs among
"significant" stages (compute_ms_max >= threshold_compute_ms_max
and compute_skew >= threshold_compute_skew).
max_significant_compute_ms_avg
compute_ms_avg for the stage in which the maximum compute_ms_max
occurs among "significant" stages.
max_significant_compute_ms_max
compute_ms_max for the stage in which the maximum compute_ms_max
occurs among "significant" stages.
max_significant_compute_skew
compute_skew (1 - compute_ms_avg / compute_ms_max) for the stage
in which the maximum compute_ms_max occurs among "significant"
stages.
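The skew test behind these four fields can be written out directly. A minimal sketch using the script's default thresholds; the function names are mine, and the sample stage is hypothetical:

```python
def compute_skew(compute_ms_avg, compute_ms_max):
    """1 - compute_ms_avg / compute_ms_max, or None when the divisor
    is 0 or None (mirroring safe_divide)."""
    if not compute_ms_max:
        return None
    return 1 - compute_ms_avg / compute_ms_max

def is_significant(stage, threshold_compute_skew=0.90,
                   threshold_compute_ms_max=60_000):
    """A stage is "significant" when its skew and compute_ms_max both
    meet the thresholds and start_ms is present (metrics are
    unreliable if start_ms is null)."""
    skew = compute_skew(stage['compute_ms_avg'], stage['compute_ms_max'])
    return (skew is not None
            and skew >= threshold_compute_skew
            and stage['compute_ms_max'] >= threshold_compute_ms_max
            and stage['start_ms'] is not None)

# One slow shard: avg 5 s but max 120 s of compute -> skew ~0.958
print(is_significant({'compute_ms_avg': 5_000,
                      'compute_ms_max': 120_000,
                      'start_ms': 0}))
```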
timeline_pending_units
Last timeline element pending_units.
timeline_active_units
Last timeline element active_units.
timeline_completed_units
Last timeline element completed_units.
timeline_total_slot_ms
Last timeline element total_slot_ms.
total_slot_ms
Job total_slot_ms. For a parent job, the value will be the sum of
the values of the child jobs. For any job, null if one or more of
the conditions below is true:
job_type is 'COPY'
cache_hit is true
state is 'PENDING'
statement_type matches one of the pairs below (the second value in
each pair qualifies the condition):
'CREATE_FUNCTION' / any
'CREATE_PROCEDURE' / any
'CREATE_TABLE' / any
'CREATE_TABLE_AS_SELECT' / 'SKIP'
'CREATE_VIEW' / any
'DROP_TABLE' / any
'DROP_VIEW' / any
total_bytes_processed
Job total_bytes_processed.
shuffle_output_bytes
Sum of job stages shuffle_output_bytes.
shuffle_output_bytes_spilled
Sum of job stages shuffle_output_bytes_spilled.
last_records_written
Last job stage records_written.
creation_time
Job creation_time.
start_time
Job start_time.
stages_start_time
Minimum stage start_ms converted to a timestamp. start_ms for
every stage appears null until the job is done.
stages_end_time
Maximum stage end_ms converted to a timestamp. end_ms for every
stage appears null until the job is done.
timeline_end_time
Last moment of time implied by the last timeline element.
end_time
Job end_time.
creation_to_end_ms
Elapsed time between job creation_time and job end_time.
start_to_end_ms
Elapsed time between job start_time and job end_time.
pending_ms
Elapsed time in a 'PENDING' enhanced_state; between job
creation_time and job start_time.
unstaged_ms
Elapsed time in an 'UNSTAGED' enhanced_state; between job
start_time and stages_start_time.
staged_ms
Elapsed time in a staged ('ACTIVE' or 'INACTIVE') enhanced_state;
between stages_start_time and stages_end_time.
staged_active_ms
Elapsed time in a staged enhanced_state during which one or more
stages were running. Merge job stages into "island" intervals and
sum the elapsed time over all intervals.
staged_inactive_ms
Elapsed time in a staged enhanced_state during which no stages
were running; staged_ms - staged_active_ms.
ending_ms
Elapsed time in an 'ENDING' enhanced_state; between stages_end_time
and end_time.
delay_ms
Difference between creation_to_end_ms and staged_active_ms.
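For a completed job with all timestamps present, the *_ms fields partition the job's wall-clock time: pending_ms + unstaged_ms + staged_ms + ending_ms telescopes back to creation_to_end_ms, and staged_ms splits into active and inactive parts. A numeric sketch with hypothetical timestamps (milliseconds since an arbitrary origin, not from a real job):

```python
# Hypothetical job timestamps, in ms
creation_time, start_time = 0, 1_500
stages_start_time, stages_end_time = 2_000, 9_000
end_time = 9_400
staged_active_ms = 6_000  # from the merged "island" intervals

pending_ms = start_time - creation_time             # 1_500
unstaged_ms = stages_start_time - start_time        # 500
staged_ms = stages_end_time - stages_start_time     # 7_000
staged_inactive_ms = staged_ms - staged_active_ms   # 1_000
ending_ms = end_time - stages_end_time              # 400
creation_to_end_ms = end_time - creation_time       # 9_400
delay_ms = creation_to_end_ms - staged_active_ms    # 3_400

# The phases telescope back to the full duration
assert pending_ms + unstaged_ms + staged_ms + ending_ms == creation_to_end_ms
```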