6.2. Query Jobs

Below is an example of a query that can be used to analyze a query job at the job grain.

Search the query for TODO to find lines where common changes can be made. Modify the script variables as needed. The common table expression jobs_selected can also be modified: choose an appropriate source view (JOBS_BY_USER, JOBS_BY_PROJECT, JOBS_BY_ORGANIZATION, or an archive of one of these) and adjust the filters. If you have access to an ASSIGNMENT_CHANGES_BY_PROJECT table, comment and uncomment the appropriate lines in the query below. Note that the beta version of the jobs views lacks certain fields that were available in the alpha version and that exist in the underlying metadata; in other words, the beta version does not contain all of the fields available through a bq --format prettyjson show -j command.

-- TODO: modify script variables as needed
declare threshold_compute_ms_max int64 default
    1      -- minutes
    * 60   -- seconds/minute
    * 1000 -- milliseconds/second
;
declare threshold_compute_skew float64 default
    0.90 -- dimensionless
;

with
jobs_selected as
(
    select
        cast(j.creation_time as date) as creation_date
        ,j.project_id
        ,j.job_id
        ,j.reservation_id
        ,j.job_type
        ,j.statement_type
        ,j.destination_table
        ,j.cache_hit
        ,j.state
        ,j.error_result.reason as error_reason
        ,j.error_result.message as error_message
        ,j.user_email
        ,j.total_slot_ms
        ,j.total_bytes_processed
        ,j.creation_time
        ,j.start_time
        ,j.end_time
        ,j.timeline
        ,j.job_stages
-- TODO: comment/uncomment lines as needed to choose jobs source
--     from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
--     from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
--     from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
--     from `PROJECT_ID.ARCHIVE_DATASET.JOBS` j
    -- default to this query's billing project
--     from `region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
--     from `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    from `region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
    where
        j.job_type = 'QUERY' -- LOAD, QUERY, COPY, or EXTRACT
-- TODO: modify filters as needed
        -- JOBS_BY_ tables partitioned on creation_time by day;
        -- creation_time is UTC and current_date defaults to UTC
--         and date(j.creation_time, 'America/New_York') = current_date('America/New_York')
        and date(j.creation_time) = current_date
--         and j.project_id = 'PROJECT_ID' -- only needed by JOBS_BY_ORGANIZATION
--         and j.job_id = 'JOB_ID'
--         and j.state = 'RUNNING' -- PENDING, RUNNING, or DONE
)
,jobs_1 as
(
    select
        j.creation_date
        ,j.project_id
        ,j.job_id
        ,j.reservation_id
        ,j.job_type
        ,j.statement_type
        ,j.destination_table
        ,j.cache_hit
        ,j.state
        ,j.error_reason
        ,j.error_message
        ,j.user_email
        ,j.total_slot_ms
        ,j.total_bytes_processed
        ,j.creation_time
        ,j.start_time
        ,j.end_time
        ,(
            select
                struct(
                    timestamp_millis(
                        min(js.start_ms)
                    ) as stages_start_time
                    ,timestamp_millis(
                        max(js.end_ms)
                    ) as stages_end_time
                    ,max(
                        js.end_ms - js.start_ms
                    ) as max_stages_elapsed_ms
                    ,sum(
                        js.shuffle_output_bytes
                    ) as shuffle_output_bytes
                    ,sum(
                        js.shuffle_output_bytes_spilled
                    ) as shuffle_output_bytes_spilled
                    ,countif(
                        js.status = 'COMPLETE'
                    ) as stages_complete_count
                    ,countif(
                        js.status = 'RUNNING'
                    ) as stages_running_count
                    ,count(*) as stages_count
                )
            from unnest(j.job_stages) js
        ) as stages_statistics
        ,(
            select
                struct(
                    js.name
                    ,js.start_time
                    ,js.end_time
                    ,js.elapsed_ms
                )
            from (
                select
                    js.name
                    ,timestamp_millis(
                        js.start_ms
                    ) as start_time
                    ,timestamp_millis(
                        js.end_ms
                    ) as end_time
                    ,js.end_ms - js.start_ms as elapsed_ms
                    ,row_number() over (
                        order by
                            js.end_ms - js.start_ms desc
                    ) as row_number
                from unnest(j.job_stages) js
            ) js
            where
                js.row_number = 1
        ) as stages_max_elapsed_ms_statistics
        ,(
            select
                case
                    when
                        js.compute_skew >= threshold_compute_skew
                        and js.compute_ms_max >= threshold_compute_ms_max
                        and js.start_ms is not null -- metrics unreliable if start_ms is null
                        then
                        struct(
                            js.name
                            ,js.compute_ms_avg
                            ,js.compute_ms_max
                            ,js.compute_skew
                        )
                end
            from (
                select
                    js.name
                    ,js.compute_ms_avg
                    ,js.compute_ms_max
                    ,1 - safe_divide(
                        js.compute_ms_avg
                        ,js.compute_ms_max
                    ) as compute_skew
                    ,js.start_ms
                    ,row_number() over (
                        order by
                            case
                                when
                                    1 - safe_divide(
                                        js.compute_ms_avg
                                        ,js.compute_ms_max
                                    ) >= threshold_compute_skew
                                    and js.compute_ms_max >= threshold_compute_ms_max
                                    and js.start_ms is not null -- metrics unreliable if start_ms is null
                                    then
                                    js.compute_ms_max
                            end
                            desc
                    ) as row_number
                from unnest(j.job_stages) js
            ) js
            where
                js.row_number = 1
        ) as stages_significant_compute_skew_statistics
        ,j.job_stages[
            safe_ordinal(
                array_length(j.job_stages)
            )
        ].records_written as last_records_written
        ,case
            when array_length(j.job_stages) > 0
                then
                    (
                        select
                            array_agg(
                                struct(
                                    js.start_time
                                    ,js.end_time
                                    ,js.wait_ms_max
                                )
                                order by
                                    js.start_time
                            )
                        from (
                            -- merge intervals logic from
                            -- https://stackoverflow.com/questions/57173813/merge-overlapping-intervals-with-containing-null-values
                            select
                                min(js.start_time) as start_time
                                ,nullif(
                                    max(js.end_time)
                                    ,j.end_time
                                ) as end_time
                                ,max(js.wait_ms_max) as wait_ms_max
                            from (
                                select
                                    js.*
                                    ,countif(
                                        js.prev_max_end_time is null
                                        or js.prev_max_end_time < js.start_time
                                    ) over (
                                        order by
                                            js.start_time
                                    ) as group_nbr
                                from (
                                    select
                                        js.* except (end_time)
                                        ,coalesce(
                                            js.end_time
                                            ,j.end_time
                                        ) as end_time
                                        ,max(
                                            coalesce(
                                                js.end_time
                                                ,j.end_time
                                            )
                                        ) over (
                                            order by
                                                js.start_time
                                            rows between unbounded preceding and 1 preceding
                                        ) as prev_max_end_time
                                    from (
                                        select
                                            timestamp_millis(
                                                js.start_ms
                                            ) as start_time
                                            ,timestamp_millis(
                                                js.end_ms
                                            ) as end_time
                                            ,js.wait_ms_max
                                        from unnest(j.job_stages) js
                                    ) js
                                ) js
                            ) js
                            group by
                                js.group_nbr
                        ) js
                    )
            else null
        end as stages_intervals
        ,j.timeline[
            safe_ordinal(
                array_length(j.timeline)
            )
        ] as last_timeline
    from jobs_selected j
)
,jobs_2 as
(
    select
        j.creation_date
        ,j.project_id
        ,j.job_id
        ,j.reservation_id
        ,j.job_type
        ,j.statement_type
        ,j.destination_table
        ,j.cache_hit
        ,j.state
        ,case
            when j.state = 'DONE' then 'DONE'
            when j.state = 'PENDING' then 'PENDING'
            when
                (
                    j.state = 'RUNNING'
                    and j.stages_statistics.stages_running_count = 0
                    and j.stages_statistics.stages_complete_count = 0
                )
                then 'UNSTAGED'
            when
                (
                    j.state = 'RUNNING'
                    and j.stages_statistics.stages_running_count = 0
                    and j.stages_statistics.stages_complete_count > 0
                )
                then 'INACTIVE'
            when
                (
                    j.state = 'RUNNING'
                    and j.stages_statistics.stages_running_count > 0
                )
                then 'ACTIVE'
        end as enhanced_state
        ,j.error_reason
        ,j.error_message
        ,j.user_email
        ,j.stages_statistics.stages_running_count
        ,j.stages_statistics.stages_complete_count
        ,j.stages_statistics.stages_count
        ,j.stages_statistics.max_stages_elapsed_ms
        ,j.stages_max_elapsed_ms_statistics.name as max_elapsed_ms_name
        ,j.stages_max_elapsed_ms_statistics.start_time as max_elapsed_ms_start_time
        ,j.stages_max_elapsed_ms_statistics.end_time as max_elapsed_ms_end_time
        ,j.stages_max_elapsed_ms_statistics.elapsed_ms as max_elapsed_ms
        ,(
            select
                max(si.wait_ms_max)
            from unnest(j.stages_intervals) si
        ) as max_wait_ms_max
        ,(
            select
                sum(si.wait_ms_max)
            from unnest(j.stages_intervals) si
        ) as sum_wait_ms_max
        ,j.stages_significant_compute_skew_statistics.name as max_significant_compute_name
        ,j.stages_significant_compute_skew_statistics.compute_ms_avg as max_significant_compute_ms_avg
        ,j.stages_significant_compute_skew_statistics.compute_ms_max as max_significant_compute_ms_max
        ,j.stages_significant_compute_skew_statistics.compute_skew as max_significant_compute_skew
        ,j.last_timeline.pending_units as timeline_pending_units
        ,j.last_timeline.active_units as timeline_active_units
        ,j.last_timeline.completed_units as timeline_completed_units
        ,j.last_timeline.total_slot_ms as timeline_total_slot_ms
        ,j.total_slot_ms
        ,j.total_bytes_processed
        ,j.stages_statistics.shuffle_output_bytes
        ,j.stages_statistics.shuffle_output_bytes_spilled
        ,j.last_records_written
        ,j.creation_time
        ,j.start_time
        ,j.stages_statistics.stages_start_time
        ,j.stages_statistics.stages_end_time
        ,timestamp_add(
            j.start_time
            ,interval
                j.last_timeline.elapsed_ms
                millisecond
        ) as timeline_end_time
        ,j.end_time
        ,timestamp_diff(
            j.end_time
            ,j.creation_time
            ,millisecond
        ) as creation_to_end_ms
        ,timestamp_diff(
            j.end_time
            ,j.start_time
            ,millisecond
        ) as start_to_end_ms
        ,timestamp_diff(
            case
                when j.start_time is not null
                    then j.start_time
                when j.state = 'PENDING'
                    then current_timestamp
            end
            ,j.creation_time
            ,millisecond
        ) as pending_ms
        ,timestamp_diff(
            case
                when j.stages_statistics.stages_start_time is not null
                    then j.stages_statistics.stages_start_time
                when j.state = 'RUNNING'
                    then current_timestamp
            end
            ,j.start_time
            ,millisecond
        ) as unstaged_ms
        ,timestamp_diff(
            j.stages_statistics.stages_end_time
            ,j.stages_statistics.stages_start_time
            ,millisecond
        ) as staged_ms
        ,timestamp_diff(
            j.end_time
            ,j.stages_statistics.stages_end_time
            ,millisecond
        ) as ending_ms
        ,(
            select
                sum(
                    timestamp_diff(
                        si.end_time
                        ,si.start_time
                        ,millisecond
                    )
                )
            from unnest(j.stages_intervals) si
        ) as staged_active_ms
    from jobs_1 j
)
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- ,assignments_history_stage as
-- (
--     select
--         ac.change_timestamp as start_time
--         ,coalesce(
--             timestamp_add(
--                 lead(ac.change_timestamp) over (
--                     partition by
--                         ac.project_id
--                         ,ac.assignment_id
--                     order by
--                         ac.change_timestamp
--                 )
--                 ,interval -1 microsecond
--             )
--             ,'9999-12-31 23:59:59.999999'
--         ) as end_time
--         ,ac.action
--         ,ac.project_id
--         ,ac.reservation_name
--         ,ac.assignee_type
--         ,ac.assignee_id
--     -- TODO: modify source as needed
--     from `ADMIN_PROJECT.region-us.INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT` ac
-- )
-- ,assignments_history as
-- (
--     select
--         ahs.* except(
--             action
--         )
--     from assignments_history_stage ahs
--     where
--         ahs.action <> 'DELETE'
-- )
select
    j.creation_date
    ,j.project_id
    ,j.job_id
-- TODO: comment line and uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
    ,j.reservation_id
--     ,coalesce(
--         j.reservation_id
--         ,concat(
--             ahp.project_id
--             ,':US.'
--             ,ahp.reservation_name
--         )
--         /* need to add logic for handling folder assignments... */
--         ,concat(
--             aho.project_id
--             ,':US.'
--             ,aho.reservation_name
--         )
--     ) as reservation_id
    ,j.job_type
    ,j.statement_type
    ,j.destination_table.project_id as destination_project_id
    ,j.destination_table.dataset_id as destination_dataset_id
    ,j.destination_table.table_id as destination_table_id
    ,j.cache_hit
    ,j.state
    ,j.enhanced_state
    ,j.error_reason
    ,j.error_message
    ,j.user_email
    ,j.stages_running_count
    ,j.stages_complete_count
    ,j.stages_count
    ,j.max_elapsed_ms_name
    ,j.max_elapsed_ms_start_time
    ,j.max_elapsed_ms_end_time
    ,j.max_elapsed_ms
    ,j.max_wait_ms_max
    ,j.sum_wait_ms_max
    ,j.max_significant_compute_name
    ,j.max_significant_compute_ms_avg
    ,j.max_significant_compute_ms_max
    ,j.max_significant_compute_skew
    ,j.timeline_pending_units
    ,j.timeline_active_units
    ,j.timeline_completed_units
    ,j.timeline_total_slot_ms
    ,j.total_slot_ms
    ,j.total_bytes_processed
    ,j.shuffle_output_bytes
    ,j.shuffle_output_bytes_spilled
    ,j.last_records_written
    -- with this format, value can be treated as datetime in Excel;
    -- format in Excel as yyyy-mm-dd hh:mm:ss.000
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.creation_time
    ) as creation_time
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.start_time
    ) as start_time
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.stages_start_time
    ) as stages_start_time
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.stages_end_time
    ) as stages_end_time
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.timeline_end_time
    ) as timeline_end_time
    ,format_timestamp(
        '%Y-%m-%d %H:%M:%E*S'
        ,j.end_time
    ) as end_time
    ,j.creation_to_end_ms
    ,j.start_to_end_ms
    ,j.pending_ms
    ,j.unstaged_ms
    ,j.staged_ms
    ,j.staged_active_ms
    ,case
        when j.staged_ms is not null
            then j.staged_ms - j.staged_active_ms
        when
            j.enhanced_state = 'INACTIVE'
            and current_timestamp > j.timeline_end_time
            then
                timestamp_diff(
                    current_timestamp
                    ,j.timeline_end_time
                    ,millisecond
                )
    end as staged_inactive_ms
    ,j.ending_ms
    ,j.creation_to_end_ms
        - j.staged_active_ms
        as delay_ms
from jobs_2 j
-- TODO: uncomment lines if have access to ASSIGNMENT_CHANGES_BY_PROJECT
-- left join assignments_history ahp
-- on
--     ahp.assignee_id = j.project_id
--     and j.creation_time between ahp.start_time and ahp.end_time
--     and ahp.assignee_type = 'PROJECT'
-- left join assignments_history aho
-- on
--     j.creation_time between aho.start_time and aho.end_time
--     and aho.assignee_type = 'ORGANIZATION'
order by
    j.creation_time
    ,j.project_id
    ,j.job_id
;

Fields

creation_date

Cast creation_time as date in UTC time zone.

project_id

Job project_id.

job_id

Job job_id.

reservation_id

Job reservation_id; null if one or more of the conditions below is true:

  • job_type is 'COPY'
  • statement_type is 'SCRIPT' (parent query job)
  • cache_hit is true
  • state is 'PENDING' or 'RUNNING'
  • certain error conditions occurred

See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobstatistics.

job_type

Job job_type.

statement_type

Job statement_type.

destination_project_id

Destination table project ID.

destination_dataset_id

Destination table dataset ID.

destination_table_id

Destination table table ID.

cache_hit

Job cache_hit.

state

Job state:

'PENDING'

Interval between job creation_time and job start_time.

'RUNNING'

Interval between job start_time and job end_time.

'DONE'

After job end_time.

enhanced_state

Computed using job field state and job_stages subfields. Values:

'PENDING'

Job state is 'PENDING'; interval between job creation_time and job start_time.

'UNSTAGED'

Job state is 'RUNNING' and no job stages recorded; interval between job start_time and earliest job_stages.start_ms.

'ACTIVE'

Job state is 'RUNNING' and one or more job stages have status 'RUNNING'; interval between earliest job_stages.start_ms and last job_stages.end_ms when at least one stage has status 'RUNNING'.

'INACTIVE'

Job state is 'RUNNING' and one or more job stages have status 'COMPLETE' and no job stages have status 'RUNNING'; interval between earliest job_stages.start_ms and last job_stages.end_ms when no stage has status 'RUNNING'.

'ENDING'

Whether this state can be identified before the query job is done is uncertain; interval between the last job_stages.end_ms and job end_time.

'DONE'

Job state is 'DONE'; after job end_time.
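
The derivation of enhanced_state from state and the job_stages status counts can be sketched outside SQL. A minimal Python sketch mirroring the CASE expression in the query (the function name is illustrative):

```python
def classify_enhanced_state(state, stages_running_count, stages_complete_count):
    """Mirror the CASE expression that derives enhanced_state from the
    job state and the job_stages status counts."""
    if state == 'DONE':
        return 'DONE'
    if state == 'PENDING':
        return 'PENDING'
    if state == 'RUNNING':
        if stages_running_count > 0:
            return 'ACTIVE'
        if stages_complete_count > 0:
            return 'INACTIVE'
        # Running but no stages recorded yet.
        return 'UNSTAGED'
    return None

# A running job with completed stages but nothing currently running is inactive.
print(classify_enhanced_state('RUNNING', 0, 3))  # INACTIVE
```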

error_reason

Job error_result.reason.

error_message

Job error_result.message.

user_email

Job user_email.

stages_running_count

Count of stages with status set to 'RUNNING'.

stages_complete_count

Count of stages with status set to 'COMPLETE'.

stages_count

Count of stages.

max_elapsed_ms_name

Name of the stage in which the maximum stages elapsed time (end_ms - start_ms) occurs.

max_elapsed_ms_start_time

Start time of the stage in which the maximum stages elapsed time (end_ms - start_ms) occurs.

max_elapsed_ms_end_time

End time of the stage in which the maximum stages elapsed time (end_ms - start_ms) occurs.

max_elapsed_ms

Maximum stages elapsed time (end_ms - start_ms).

max_wait_ms_max

Maximum stages wait_ms_max.

sum_wait_ms_max

Merge job stages into island intervals. Find maximum wait_ms_max for each interval. Sum these maximums over all intervals.
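
The merge-into-islands logic used here (and for staged_active_ms) can be sketched in Python; the interval endpoints and wait_ms_max values below are illustrative:

```python
def sum_wait_ms_max(stages):
    """stages: list of (start_ms, end_ms, wait_ms_max) tuples.
    Merge overlapping (or touching) stage intervals into islands, take the
    maximum wait_ms_max within each island, and sum those maximums."""
    islands = []
    for start, end, wait in sorted(stages):
        if islands and start <= islands[-1][1]:
            # Overlaps the current island: extend it and keep the max wait.
            prev_start, prev_end, prev_wait = islands[-1]
            islands[-1] = (prev_start, max(prev_end, end), max(prev_wait, wait))
        else:
            islands.append((start, end, wait))
    return sum(wait for _, _, wait in islands)

# The first two stages overlap and form one island (max wait 70);
# the third stage stands alone (wait 20).
stages = [(0, 100, 50), (50, 200, 70), (300, 400, 20)]
print(sum_wait_ms_max(stages))  # 90
```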

max_significant_compute_name

Name of the stage with the maximum compute_ms_max among significant stages (compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew).

max_significant_compute_ms_avg

compute_ms_avg for the stage with the maximum compute_ms_max among significant stages (compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew).

max_significant_compute_ms_max

compute_ms_max for the stage with the maximum compute_ms_max among significant stages (compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew).

max_significant_compute_skew

compute_skew (1 - compute_ms_avg / compute_ms_max) for the stage with the maximum compute_ms_max among significant stages (compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew).
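
The skew measure is simply 1 - compute_ms_avg / compute_ms_max with a null-safe divide. A minimal Python sketch:

```python
def compute_skew(compute_ms_avg, compute_ms_max):
    """1 - compute_ms_avg / compute_ms_max, mirroring safe_divide:
    return None when compute_ms_max is zero or missing."""
    if not compute_ms_max:
        return None
    return 1 - compute_ms_avg / compute_ms_max

# If the slowest worker took 10x the average, skew is 0.9,
# which meets the default threshold_compute_skew.
print(compute_skew(1000, 10000))  # 0.9

# A perfectly balanced stage has zero skew.
print(compute_skew(5000, 5000))  # 0.0
```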

timeline_pending_units

Last timeline element pending_units.

timeline_active_units

Last timeline element active_units.

timeline_completed_units

Last timeline element completed_units.

timeline_total_slot_ms

Last timeline element total_slot_ms.

total_slot_ms

Job total_slot_ms. For a parent job, the value will be the sum of the values of the child jobs. For any job, null if one or more of the conditions below is true:

  • job_type is 'COPY'
  • cache_hit is true
  • state is 'PENDING'
  • statement_type and ddl_operation_performed are:
    • 'CREATE_FUNCTION' / any
    • 'CREATE_PROCEDURE' / any
    • 'CREATE_TABLE' / any
    • 'CREATE_TABLE_AS_SELECT' / 'SKIP'
    • 'CREATE_VIEW' / any
    • 'DROP_TABLE' / any
    • 'DROP_VIEW' / any
  • certain error conditions occurred

total_bytes_processed

Job total_bytes_processed.

shuffle_output_bytes

Sum of job stages shuffle_output_bytes.

shuffle_output_bytes_spilled

Sum of job stages shuffle_output_bytes_spilled.

last_records_written

Last job stage records_written.

creation_time

Job creation_time.

start_time

Job start_time.

stages_start_time

Minimum stages start_ms converted to timestamp. start_ms for every stage appears null until the job is done.

stages_end_time

Maximum stages end_ms converted to timestamp. end_ms for every stage appears null until the job is done.
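
start_ms and end_ms are Unix epoch milliseconds; the conversion the query performs with timestamp_millis can be sketched in Python (a hedged equivalent, not BigQuery itself):

```python
from datetime import datetime, timezone

def timestamp_millis(ms):
    """Convert Unix epoch milliseconds to a UTC datetime,
    mirroring BigQuery's TIMESTAMP_MILLIS."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

print(timestamp_millis(0))  # 1970-01-01 00:00:00+00:00
print(timestamp_millis(1_600_000_000_000))
```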

timeline_end_time

Last moment of time implied by the last timeline element.

end_time

Job end_time.

creation_to_end_ms

Elapsed time between job creation_time and job end_time.

start_to_end_ms

Elapsed time between job start_time and job end_time.

pending_ms

Elapsed time in a 'PENDING' enhanced_state; between job creation_time and job start_time.

unstaged_ms

Elapsed time in an 'UNSTAGED' enhanced_state; between job start_time and stages_start_time.

staged_ms

Elapsed time in a staged state ('ACTIVE' or 'INACTIVE' enhanced_state); between stages_start_time and stages_end_time.

staged_active_ms

Elapsed time in a staged state during which one or more stages were running. Merge job stages into island intervals and sum the elapsed time of each interval over all intervals.

staged_inactive_ms

Elapsed time in a staged state during which no stages were running: staged_ms - staged_active_ms.

ending_ms

Elapsed time in an 'ENDING' enhanced_state; between stages_end_time and end_time.

delay_ms

Difference between creation_to_end_ms and staged_active_ms.
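
The duration fields fit together: pending_ms + unstaged_ms + staged_ms + ending_ms covers creation_time through end_time, staged_ms splits into staged_active_ms + staged_inactive_ms, and delay_ms is everything that was not actively staged. A small sketch with illustrative millisecond values:

```python
# Illustrative durations for a finished job (all values in ms, assumed).
pending_ms = 1_500         # creation_time -> start_time
unstaged_ms = 400          # start_time -> stages_start_time
staged_ms = 60_000         # stages_start_time -> stages_end_time
ending_ms = 100            # stages_end_time -> end_time
staged_active_ms = 55_000  # portion of staged_ms with a stage running

creation_to_end_ms = pending_ms + unstaged_ms + staged_ms + ending_ms
staged_inactive_ms = staged_ms - staged_active_ms
delay_ms = creation_to_end_ms - staged_active_ms

print(creation_to_end_ms)  # 62000
print(staged_inactive_ms)  # 5000
print(delay_ms)            # 7000
```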