Below is an example of a query that can be used to analyze a query job at the job grain. Search the query for “TODO” to find lines where common changes can be made. The script variables and the common table expression jobs_selected can be modified as needed: choose an appropriate source view (JOBS_BY_USER, JOBS_BY_PROJECT, JOBS_BY_ORGANIZATION, or an archive of one of these) and modify the filters as needed; a small example of such edits follows the query. If you have access to an ASSIGNMENT_CHANGES_BY_PROJECT table, you can comment and uncomment the appropriate lines in the query below. Note that the beta version of the jobs views lacks certain fields that were available in the alpha version and still exist in the underlying metadata; in other words, the beta version does not contain all of the fields available through a bq --format prettyjson show -j command.
-- TODO: modify script variables as needed
declare threshold_compute_ms_max int64 default
  1      -- minutes
  * 60   -- seconds/minute
  * 1000 -- milliseconds/second
;
declare threshold_compute_skew float64 default
  0.90 -- dimensionless
;

with jobs_selected as
(
  select
    cast(j.creation_time as date) as creation_date
    ,j.project_id
    ,j.job_id
    ,j.reservation_id
    ,j.job_type
    ,j.statement_type
    ,j.destination_table
    ,j.cache_hit
    ,j.state
    ,j.error_result.reason as error_reason
    ,j.error_result.message as error_message
    ,j.user_email
    ,j.total_slot_ms
    ,j.total_bytes_processed
    ,j.creation_time
    ,j.start_time
    ,j.end_time
    ,j.timeline
    ,j.job_stages
  -- TODO: comment/uncomment lines as needed to choose jobs source
  -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
  -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
  -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
  -- from `PROJECT_ID.ARCHIVE_DATASET.JOBS` j
  -- default to this query's billing project
  -- from `region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
  -- from `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
  from `region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
  where
    j.job_type = 'QUERY' -- LOAD, QUERY, COPY, or EXTRACT
    -- TODO: modify filters as needed
    -- JOBS_BY_ tables partitioned on creation_time by day;
    -- creation_time is UTC and current_date defaults to UTC
    -- and date(j.creation_time, 'America/New_York') = current_date('America/New_York')
    and date(j.creation_time) = current_date
    -- and j.project_id = 'PROJECT_ID' -- only needed by JOBS_BY_ORGANIZATION
    -- and j.job_id = 'JOB_ID'
    -- and j.state = 'RUNNING' -- PENDING, RUNNING, or DONE
)
,jobs_1 as
(
  select
    j.creation_date
    ,j.project_id
    ,j.job_id
    ,j.reservation_id
    ,j.job_type
    ,j.statement_type
    ,j.destination_table
    ,j.cache_hit
    ,j.state
    ,j.error_reason
    ,j.error_message
    ,j.user_email
    ,j.total_slot_ms
    ,j.total_bytes_processed
    ,j.creation_time
    ,j.start_time
    ,j.end_time
    ,(
      select
        struct(
          timestamp_millis( min(js.start_ms) ) as stages_start_time
          ,timestamp_millis( max(js.end_ms) ) as stages_end_time
          ,max( js.end_ms - js.start_ms ) as max_stages_elapsed_ms
          ,sum( js.shuffle_output_bytes ) as shuffle_output_bytes
          ,sum( js.shuffle_output_bytes_spilled ) as shuffle_output_bytes_spilled
          ,countif( js.status = 'COMPLETE' ) as stages_complete_count
          ,countif( js.status = 'RUNNING' ) as stages_running_count
          ,count(*) as stages_count
        )
      from unnest(j.job_stages) js
    ) as stages_statistics
    ,(
      select
        struct(
          js.name
          ,js.start_time
          ,js.end_time
          ,js.elapsed_ms
        )
      from
      (
        select
          js.name
          ,timestamp_millis( js.start_ms ) as start_time
          ,timestamp_millis( js.end_ms ) as end_time
          ,js.end_ms - js.start_ms as elapsed_ms
          ,row_number() over (
            order by js.end_ms - js.start_ms desc
          ) as row_number
        from unnest(j.job_stages) js
      ) js
      where js.row_number = 1
    ) as stages_max_elapsed_ms_statistics
    ,(
      select
        case
          when js.compute_skew >= threshold_compute_skew
            and js.compute_ms_max >= threshold_compute_ms_max
            and js.start_ms is not null -- metrics unreliable if start_ms is null
            then struct(
              js.name
              ,js.compute_ms_avg
              ,js.compute_ms_max
              ,js.compute_skew
            )
        end
      from
      (
        select
          js.name
          ,js.compute_ms_avg
          ,js.compute_ms_max
          ,1 - safe_divide(
            js.compute_ms_avg
            ,js.compute_ms_max
          ) as compute_skew
          ,js.start_ms
          ,row_number() over (
            order by
              case
                when 1 - safe_divide(
                    js.compute_ms_avg
                    ,js.compute_ms_max
                  ) >= threshold_compute_skew
                  and js.compute_ms_max >= threshold_compute_ms_max
                  and js.start_ms is not null -- metrics unreliable if start_ms is null
                  then js.compute_ms_max
              end desc
          ) as row_number
        from unnest(j.job_stages) js
      ) js
      where js.row_number = 1
    ) as stages_significant_compute_skew_statistics
    ,j.job_stages[
      safe_ordinal( array_length(j.job_stages) )
    ].records_written as last_records_written
    ,case
      when array_length(j.job_stages) > 0
        then
        (
          select
            array_agg(
              struct(
                js.start_time
                ,js.end_time
                ,js.wait_ms_max
              )
              order by js.start_time
            )
          from
          (
            -- merge intervals logic from
            -- https://stackoverflow.com/questions/57173813/merge-overlapping-intervals-with-containing-null-values
            select
              min(js.start_time) as start_time
              ,nullif(
                max(js.end_time)
                ,j.end_time
              ) as end_time
              ,max(js.wait_ms_max) as wait_ms_max
            from
            (
              select
                js.*
                ,countif(
                  js.prev_max_end_time is null
                  or js.prev_max_end_time < start_time
                ) over (
                  order by js.start_time
                ) as group_nbr
              from
              (
                select
                  js.* except (end_time)
                  ,coalesce(
                    js.end_time
                    ,j.end_time
                  ) as end_time
                  ,max(
                    coalesce(
                      js.end_time
                      ,j.end_time
                    )
                  ) over (
                    order by js.start_time
                    rows between unbounded preceding and 1 preceding
                  ) as prev_max_end_time
                from
                (
                  select
                    timestamp_millis( js.start_ms ) as start_time
                    ,timestamp_millis( js.end_ms ) as end_time
                    ,js.wait_ms_max
                  from unnest(job_stages) js
                ) js
              ) js
            ) js
            group by js.group_nbr
          ) js
        )
      else null
    end as stages_intervals
    ,j.timeline[
      safe_ordinal( array_length(j.timeline) )
    ] as last_timeline
  from jobs_selected j
)
,jobs_2 as
(
  select
    j.creation_date
    ,j.project_id
    ,j.job_id
    ,j.reservation_id
    ,j.job_type
    ,j.statement_type
    ,j.destination_table
    ,j.cache_hit
    ,j.state
    ,case
      when j.state = 'DONE'
        then 'DONE'
      when j.state = 'PENDING'
        then 'PENDING'
      when
        (
          j.state = 'RUNNING'
          and j.stages_statistics.stages_running_count = 0
          and j.stages_statistics.stages_complete_count = 0
        )
        then 'UNSTAGED'
      when
        (
          j.state = 'RUNNING'
          and j.stages_statistics.stages_running_count = 0
          and j.stages_statistics.stages_complete_count > 0
        )
        then 'INACTIVE'
      when
        (
          j.state = 'RUNNING'
          and j.stages_statistics.stages_running_count > 0
        )
        then 'ACTIVE'
    end as enhanced_state
    ,j.error_reason
    ,j.error_message
    ,j.user_email
    ,j.stages_statistics.stages_running_count
    ,j.stages_statistics.stages_complete_count
    ,j.stages_statistics.stages_count
    ,j.stages_statistics.max_stages_elapsed_ms
    ,j.stages_max_elapsed_ms_statistics.name as max_elapsed_ms_name
    ,j.stages_max_elapsed_ms_statistics.start_time as max_elapsed_ms_start_time
    ,j.stages_max_elapsed_ms_statistics.end_time as max_elapsed_ms_end_time
    ,j.stages_max_elapsed_ms_statistics.elapsed_ms as max_elapsed_ms
    ,(
      select max(si.wait_ms_max)
      from unnest(j.stages_intervals) si
    ) as max_wait_ms_max
    ,(
      select sum(si.wait_ms_max)
      from unnest(j.stages_intervals) si
    ) as sum_wait_ms_max
    ,j.stages_significant_compute_skew_statistics.name as max_significant_compute_name
    ,j.stages_significant_compute_skew_statistics.compute_ms_avg as max_significant_compute_ms_avg
    ,j.stages_significant_compute_skew_statistics.compute_ms_max as max_significant_compute_ms_max
    ,j.stages_significant_compute_skew_statistics.compute_skew as max_significant_compute_skew
    ,j.last_timeline.pending_units as timeline_pending_units
    ,j.last_timeline.active_units as timeline_active_units
    ,j.last_timeline.completed_units as timeline_completed_units
    ,j.last_timeline.total_slot_ms as timeline_total_slot_ms
    ,j.total_slot_ms
    ,j.total_bytes_processed
    ,j.stages_statistics.shuffle_output_bytes
    ,j.stages_statistics.shuffle_output_bytes_spilled
    ,j.last_records_written
    ,j.creation_time
    ,j.start_time
    ,j.stages_statistics.stages_start_time
    ,j.stages_statistics.stages_end_time
    ,timestamp_add(
      j.start_time
      ,interval j.last_timeline.elapsed_ms millisecond
    ) as timeline_end_time
    ,j.end_time
    ,timestamp_diff(
      j.end_time
      ,j.creation_time
      ,millisecond
    ) as creation_to_end_ms
    ,timestamp_diff(
      j.end_time
      ,j.start_time
      ,millisecond
    ) as start_to_end_ms
    ,timestamp_diff(
      case
        when j.start_time is not null
          then j.start_time
        when j.state = 'PENDING'
          then current_timestamp
      end
      ,j.creation_time
      ,millisecond
    ) as pending_ms
    ,timestamp_diff(
      case
        when j.stages_statistics.stages_start_time is not null
          then j.stages_statistics.stages_start_time
        when j.state = 'RUNNING'
          then current_timestamp
      end
      ,j.start_time
      ,millisecond
    ) as unstaged_ms
    ,timestamp_diff(
      j.stages_statistics.stages_end_time
      ,j.stages_statistics.stages_start_time
      ,millisecond
    ) as staged_ms
    ,timestamp_diff(
      j.end_time
      ,j.stages_statistics.stages_end_time
      ,millisecond
    ) as ending_ms
    ,(
      select
        sum(
          timestamp_diff(
            si.end_time
            ,si.start_time
            ,millisecond
          )
        )
      from unnest(j.stages_intervals) si
    ) as staged_active_ms
  from jobs_1 j
)
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- ,assignments_history_stage as
-- (
--   select
--     ac.change_timestamp as start_time
--     ,coalesce(
--       timestamp_add(
--         lead(ac.change_timestamp) over (
--           partition by
--             ac.project_id
--             ,ac.assignment_id
--           order by
--             ac.change_timestamp
--         )
--         ,interval -1 microsecond
--       )
--       ,'9999-12-31 23:59:59.999999'
--     ) as end_time
--     ,ac.action
--     ,ac.project_id
--     ,ac.reservation_name
--     ,ac.assignee_type
--     ,ac.assignee_id
--   -- TODO: modify source as needed
--   from `ADMIN_PROJECT.region-us.INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT` ac
-- )
-- ,assignments_history as
-- (
--   select
--     ahs.* except(
--       action
--     )
--   from assignments_history_stage ahs
--   where
--     ahs.action <> 'DELETE'
-- )
select
  j.creation_date
  ,j.project_id
  ,j.job_id
  -- TODO: comment line and uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
  ,j.reservation_id
  -- ,coalesce(
  --   j.reservation_id
  --   ,concat(
  --     ahp.project_id
  --     ,':US.'
  --     ,ahp.reservation_name
  --   )
  --   /* need to add logic for handling folder assignments... */
  --   ,concat(
  --     aho.project_id
  --     ,':US.'
  --     ,aho.reservation_name
  --   )
  -- ) as reservation_id
  ,j.job_type
  ,j.statement_type
  ,j.destination_table.project_id as destination_project_id
  ,j.destination_table.dataset_id as destination_dataset_id
  ,j.destination_table.table_id as destination_table_id
  ,j.cache_hit
  ,j.state
  ,j.enhanced_state
  ,j.error_reason
  ,j.error_message
  ,j.user_email
  ,j.stages_running_count
  ,j.stages_complete_count
  ,j.stages_count
  ,j.max_elapsed_ms_name
  ,j.max_elapsed_ms_start_time
  ,j.max_elapsed_ms_end_time
  ,j.max_elapsed_ms
  ,j.max_wait_ms_max
  ,j.sum_wait_ms_max
  ,j.max_significant_compute_name
  ,j.max_significant_compute_ms_avg
  ,j.max_significant_compute_ms_max
  ,j.max_significant_compute_skew
  ,j.timeline_pending_units
  ,j.timeline_active_units
  ,j.timeline_completed_units
  ,j.timeline_total_slot_ms
  ,j.total_slot_ms
  ,j.total_bytes_processed
  ,j.shuffle_output_bytes
  ,j.shuffle_output_bytes_spilled
  ,j.last_records_written
  -- with this format, value can be treated as datetime in Excel;
  -- format in Excel as yyyy-mm-dd hh:mm:ss.000
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.creation_time
  ) as creation_time
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.start_time
  ) as start_time
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.stages_start_time
  ) as stages_start_time
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.stages_end_time
  ) as stages_end_time
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.timeline_end_time
  ) as timeline_end_time
  ,format_timestamp(
    '%Y-%m-%d %H:%M:%E*S'
    ,j.end_time
  ) as end_time
  ,j.creation_to_end_ms
  ,j.start_to_end_ms
  ,j.pending_ms
  ,j.unstaged_ms
  ,j.staged_ms
  ,j.staged_active_ms
  ,case
    when j.staged_ms is not null
      then j.staged_ms - j.staged_active_ms
    when j.enhanced_state = 'INACTIVE'
      and current_timestamp > j.timeline_end_time
      then timestamp_diff(
        current_timestamp
        ,j.timeline_end_time
        ,millisecond
      )
  end as staged_inactive_ms
  ,j.ending_ms
  ,j.creation_to_end_ms - j.staged_active_ms as delay_ms
from jobs_2 j
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- left join assignments_history ahp
--   on
--     ahp.assignee_id = j.project_id
--     and j.creation_time between ahp.start_time and ahp.end_time
--     and ahp.assignee_type = 'PROJECT'
-- left join assignments_history aho
--   on
--     j.creation_time between aho.start_time and aho.end_time
--     and aho.assignee_type = 'ORGANIZATION'
order by
  j.creation_time
  ,j.project_id
  ,j.job_id
;
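As promised above, here is a minimal sketch of the TODO edits, assuming the placeholders PROJECT_ID and JOB_ID and widening the date filter to the last seven days; it selects only a few columns to keep the sketch short:

select
  j.job_id
  ,j.state
  ,j.total_slot_ms
from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
where
  j.job_type = 'QUERY'
  and date(j.creation_time) >= date_sub(current_date, interval 7 day)
  and j.job_id = 'JOB_ID'
;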
Fields
creation_date
    Job creation_time cast as a date in the UTC time zone.
project_id
    Job project_id.
job_id
    Job job_id.
reservation_id
    Job reservation_id. Appears null on the job information if one or more of the conditions below is true:

        job_type is 'COPY'
        statement_type is 'SCRIPT' (parent query job)
        cache_hit is true
        state is 'PENDING' or 'RUNNING'

    See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobstatistics.
job_type
    Job job_type.
statement_type
    Job statement_type.
destination_project_id
    Destination table project ID.
destination_dataset_id
    Destination table dataset ID.
destination_table_id
    Destination table table ID.
cache_hit
    Job cache_hit.
state
    Job state. Values:

    'PENDING'
        Interval between job creation_time and job start_time.
    'RUNNING'
        Interval between job start_time and job end_time.
    'DONE'
        After job end_time.
enhanced_state
    Computed using the job field state and job_stages subfields; see the sketch after this list. Values:

    'PENDING'
        Job state is 'PENDING'; interval between job creation_time and job start_time.
    'UNSTAGED'
        Job state is 'RUNNING' and no job stages are recorded; interval between job start_time and the earliest job_stages.start_ms.
    'ACTIVE'
        Job state is 'RUNNING' and one or more job stages have status 'RUNNING'; interval between the earliest job_stages.start_ms and the last job_stages.end_ms when at least one stage has status 'RUNNING'.
    'INACTIVE'
        Job state is 'RUNNING', one or more job stages have status 'COMPLETE', and no job stages have status 'RUNNING'; interval between the earliest job_stages.start_ms and the last job_stages.end_ms when no stage has status 'RUNNING'.
    'ENDING'
        Interval between the last job_stages.end_ms and job end_time; it is not certain that this state can be identified until after the query job is done.
    'DONE'
        Job state is 'DONE'; after job end_time.
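The classification can be exercised in isolation. The sketch below applies the same case expression used in the query to a handful of hypothetical state and stage-count combinations (the rows are made up for illustration, not real job data):

select
  state
  ,stages_running_count
  ,stages_complete_count
  ,case
    when state = 'DONE'
      then 'DONE'
    when state = 'PENDING'
      then 'PENDING'
    when state = 'RUNNING'
      and stages_running_count = 0
      and stages_complete_count = 0
      then 'UNSTAGED'
    when state = 'RUNNING'
      and stages_running_count = 0
      and stages_complete_count > 0
      then 'INACTIVE'
    when state = 'RUNNING'
      and stages_running_count > 0
      then 'ACTIVE'
  end as enhanced_state
from unnest([
  struct('PENDING' as state, 0 as stages_running_count, 0 as stages_complete_count)
  ,('RUNNING', 0, 0) -- UNSTAGED
  ,('RUNNING', 3, 1) -- ACTIVE
  ,('RUNNING', 0, 5) -- INACTIVE
  ,('DONE', 0, 5)    -- DONE
]);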
error_reason
    Job error_result.reason.
error_message
    Job error_result.message.
user_email
    Job user_email.
stages_running_count
    Count of stages with status set to 'RUNNING'.
stages_complete_count
    Count of stages with status set to 'COMPLETE'.
stages_count
    Count of stages.
max_elapsed_ms_name
    Name of the stage in which the maximum stage elapsed time (end_ms - start_ms) occurs.
max_elapsed_ms_start_time
    Start time of the stage in which the maximum stage elapsed time (end_ms - start_ms) occurs.
max_elapsed_ms_end_time
    End time of the stage in which the maximum stage elapsed time (end_ms - start_ms) occurs.
max_elapsed_ms
    Maximum stage elapsed time (end_ms - start_ms).
max_wait_ms_max
    Maximum stage wait_ms_max.
sum_wait_ms_max
    Merge the job stages into “island” intervals, find the maximum wait_ms_max within each interval, and sum these maximums over all intervals; see the sketch after this entry.
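The island merge can be seen on its own. The sketch below uses three hypothetical stage intervals (millisecond offsets, not real job_stages data): the first two overlap and merge into one island, the third stands alone. Each island keeps its maximum wait_ms_max (12 and 4 here), so sum_wait_ms_max would be 16:

with stages as
(
  select *
  from unnest([
    struct(0 as start_ms, 100 as end_ms, 7 as wait_ms_max)
    ,(50, 150, 12)
    ,(200, 300, 4)
  ])
)
,grouped as
(
  -- a new island starts whenever no earlier stage is still open
  select
    s.*
    ,countif(
      s.prev_max_end_ms is null
      or s.prev_max_end_ms < s.start_ms
    ) over (order by s.start_ms) as group_nbr
  from
  (
    select
      s.*
      ,max(s.end_ms) over (
        order by s.start_ms
        rows between unbounded preceding and 1 preceding
      ) as prev_max_end_ms
    from stages s
  ) s
)
select
  min(start_ms) as island_start_ms
  ,max(end_ms) as island_end_ms
  ,max(wait_ms_max) as island_wait_ms_max
from grouped
group by group_nbr;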
max_significant_compute_name
    Name of the stage in which the maximum compute_ms_max occurs among “significant” stages, i.e., stages with compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew.
max_significant_compute_ms_avg
    compute_ms_avg for the stage in which the maximum compute_ms_max occurs among “significant” stages (as defined above).
max_significant_compute_ms_max
    compute_ms_max for the stage in which the maximum compute_ms_max occurs among “significant” stages (as defined above).
max_significant_compute_skew
    compute_skew (1 - compute_ms_avg / compute_ms_max) for the stage in which the maximum compute_ms_max occurs among “significant” stages (as defined above); see the worked example below.
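A worked example of the skew formula: assume a hypothetical stage whose slots average 2,000 ms of compute while the slowest slot takes 60,000 ms. Its compute_skew is 1 - 2000/60000, about 0.967, and its compute_ms_max equals the default threshold of one minute, so the stage would count as significant under the default script variables:

select
  1 - safe_divide(2000, 60000) as compute_skew       -- ~0.967, >= 0.90 default threshold_compute_skew
  ,60000 >= 1 * 60 * 1000 as meets_threshold_compute_ms_max -- true at the default threshold
;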
timeline_pending_units
    Last timeline element pending_units.
timeline_active_units
    Last timeline element active_units.
timeline_completed_units
    Last timeline element completed_units.
timeline_total_slot_ms
    Last timeline element total_slot_ms.
total_slot_ms
    Job total_slot_ms. For a parent job, the value will be the sum of the values of the child jobs. For any job, null if one or more of the conditions below is true:

        job_type is 'COPY'
        cache_hit is true
        state is 'PENDING'
        statement_type / ddl_operation_performed is one of:
            'CREATE_FUNCTION' / any
            'CREATE_PROCEDURE' / any
            'CREATE_TABLE' / any
            'CREATE_TABLE_AS_SELECT' / 'SKIP'
            'CREATE_VIEW' / any
            'DROP_TABLE' / any
            'DROP_VIEW' / any
total_bytes_processed
    Job total_bytes_processed.
shuffle_output_bytes
    Sum of job stages shuffle_output_bytes.
shuffle_output_bytes_spilled
    Sum of job stages shuffle_output_bytes_spilled.
last_records_written
    Last job stage records_written.
creation_time
    Job creation_time.
start_time
    Job start_time.
stages_start_time
    Minimum stage start_ms converted to a timestamp. start_ms for every stage appears null until the job is done.
stages_end_time
    Maximum stage end_ms converted to a timestamp. end_ms for every stage appears null until the job is done.
timeline_end_time
    Last moment of time implied by the last timeline element.
end_time
    Job end_time.
creation_to_end_ms
    Elapsed time between job creation_time and job end_time.
start_to_end_ms
    Elapsed time between job start_time and job end_time.
pending_ms
    Elapsed time in a 'PENDING' enhanced_state; between job creation_time and job start_time.
unstaged_ms
    Elapsed time in an 'UNSTAGED' enhanced_state; between job start_time and stages_start_time.
staged_ms
    Elapsed time while staged (enhanced_state 'ACTIVE' or 'INACTIVE'); between stages_start_time and stages_end_time.
staged_active_ms
    Elapsed staged time during which one or more stages were running: merge the job stages into “island” intervals and sum the elapsed time of each interval over all intervals.
staged_inactive_ms
    Elapsed staged time during which no stages were running; staged_ms - staged_active_ms.
ending_ms
    Elapsed time in an 'ENDING' enhanced_state; between stages_end_time and end_time.
delay_ms
    Difference between creation_to_end_ms and staged_active_ms; see the worked example below.
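The interval fields chain together, so a worked example may help. Assume hypothetical values (not real job data): pending_ms = 500, unstaged_ms = 300, staged_ms = 10000, staged_active_ms = 8000, and ending_ms = 200. Then creation_to_end_ms is their chained total, staged_inactive_ms is the staged remainder, and delay_ms is everything not spent actively staged:

select
  500 + 300 + 10000 + 200 as creation_to_end_ms   -- 11000 = pending + unstaged + staged + ending
  ,10000 - 8000 as staged_inactive_ms             -- 2000 = staged_ms - staged_active_ms
  ,(500 + 300 + 10000 + 200) - 8000 as delay_ms   -- 3000 = creation_to_end_ms - staged_active_ms
;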