6.4. Query Jobs Delay Categories

6.4. Query Jobs Delay Categories
Prev	6. Queries	Next
Below is an example of a query to find the number of jobs delayed within certain delay categories within given time buckets. For example, it finds the jobs in flight during a given hour (e.g., 9 AM to 10 AM). It computes the amount of delay experienced by each job within that hour. And it counts the number of jobs experiencing delay within certain delay categories. So, you can say that N_30-60 jobs were delayed between 30 to 60 minutes, N_15-30 jobs were delayed between 15 and 30 minutes, etc.
The script variables can be modified as needed. The common table expression jobs_selected can be modified as needed. Choose an appropriate source view: JOBS_BY_USER, JOBS_BY_PROJECT, JOBS_BY_ORGANIZATION, or an archive of one of these. Modify the filters as needed. If you have access to a project_reservation_mapping table, you can uncomment and comment the appropriate lines in the query below.
declare selected_date date default current_date;
declare selected_days int64 default 1;
declare selected_bucket_minutes int64 default 60;

with
jobs_selected as
(
    select
        j.project_id
        ,j.job_id
        ,j.reservation_id
        ,j.creation_time
        ,j.start_time
        ,j.end_time
        ,j.job_stages
    -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` j
    -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    -- from `PROJECT_ID.region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
    -- default to this query's billing project
    from `region-us.INFORMATION_SCHEMA.JOBS_BY_USER` j
    where
        j.job_type = 'QUERY' -- LOAD, QUERY, COPY, or EXTRACT
        -- JOBS_BY_ tables partitioned on creation_time by day;
        -- creation_time is UTC and current_date defaults to UTC
        and date(j.creation_time) between
            /* need to include day before to capture all jobs in timeframe */
            date_add(
                selected_date
                ,interval -1 day
            )
            and date_add(
                selected_date
                ,interval selected_days - 1 day
            )
)
,jobs_intervals as
(
    select
        j.project_id
        ,j.job_id
        ,j.reservation_id
        ,j.creation_time
        ,j.start_time
        ,j.end_time
        ,case
            when array_length(j.job_stages) > 0
                then
                    timestamp_millis(
                        (
                            select
                                min(js.start_ms)
                            from unnest(j.job_stages) js
                        )
                    )
            else null
        end as min_stage_start_time
        ,case
            when array_length(j.job_stages) > 0
                then
                    timestamp_millis(
                        (
                            select
                                max(js.end_ms)
                            from unnest(j.job_stages) js
                        )
                    )
            else null
        end as max_stage_end_time
        ,case
            when array_length(j.job_stages) > 0
                then
                    (
                        select
                            array_agg(
                                struct(
                                    js.start_time
                                    ,js.end_time
                                )
                                order by
                                    js.start_time
                            )
                        from (
                            /* merge intervals logic from */
                            /* https://stackoverflow.com/questions/57173813/merge-overlapping-intervals-with-containing-null-values */
                            select
                                min(js.start_time) as start_time
                                ,nullif(
                                    max(js.end_time)
                                    ,j.end_time
                                ) as end_time
                            from (
                                select
                                    js.*
                                    ,countif(
                                        js.prev_max_end_time is null
                                        or js.prev_max_end_time < start_time
                                    ) over (
                                        order by
                                            js.start_time
                                    ) as group_nbr
                                from (
                                    select
                                        js.* except (end_time)
                                        ,coalesce(
                                            js.end_time
                                            ,j.end_time
                                        ) as end_time
                                        ,max(
                                            coalesce(
                                                js.end_time
                                                ,j.end_time
                                            )
                                        ) over (
                                            order by
                                                js.start_time
                                            rows between unbounded preceding and 1 preceding
                                        ) as prev_max_end_time
                                    from (
                                        select
                                            timestamp_millis(
                                                js.start_ms
                                            ) as start_time
                                            ,timestamp_millis(
                                                js.end_ms
                                            ) as end_time
                                        from unnest(j.job_stages) js
                                    ) js
                                ) js
                            ) js
                            group by
                                js.group_nbr
                        ) js
                    )
            else null
        end as stage_intervals
    from jobs_selected j
)
,time_buckets as
(
    select
        start_time
        ,timestamp_add(
            start_time
            ,interval selected_bucket_minutes minute
        ) as end_time
    from unnest(
        generate_timestamp_array(
            timestamp(selected_date)
            ,timestamp(
                date_add(
                    selected_date
                    ,interval selected_days day
                )
            )
            ,interval selected_bucket_minutes minute
        )
    ) start_time
    where
        start_time <
            timestamp(
                date_add(
                    selected_date
                    ,interval selected_days day
                )
            )
)
/* note that elapsed times calculated here may go negative, */
/* will need to only include those elapsed times that are positive */
,jobs_intervals_buckets as
(
    select
        tb.start_time as bucket_start_time
        ,j.project_id
        ,j.creation_time
        ,j.reservation_id
        ,timestamp_diff(
            least(
                j.start_time
                ,tb.end_time
            )
            ,greatest(
                j.creation_time
                ,tb.start_time
            )
            ,millisecond
        ) as job_start_delay_ms
        ,timestamp_diff(
            least(
                j.min_stage_start_time
                ,tb.end_time
            )
            ,greatest(
                j.start_time
                ,tb.start_time
            )
            ,millisecond
        ) as first_stage_delay_ms
        ,timestamp_diff(
            least(
                j.max_stage_end_time
                ,tb.end_time
            )
            ,greatest(
                j.min_stage_start_time
                ,tb.start_time
            )
            ,millisecond
        ) as stages_elapsed_ms
        ,(
            select
                sum(
                    /* handling negatives here because I have to */
                    case
                        when
                            least(
                                si.end_time
                                ,tb.end_time
                            ) >
                            greatest(
                                si.start_time
                                ,tb.start_time
                            )
                            then
                                timestamp_diff(
                                    least(
                                        si.end_time
                                        ,tb.end_time
                                    )
                                    ,greatest(
                                        si.start_time
                                        ,tb.start_time
                                    )
                                    ,millisecond
                                )
                        else 0
                    end
                )
            from unnest(j.stage_intervals) si
        ) as stages_active_elapsed_ms
        ,timestamp_diff(
            least(
                j.end_time
                ,tb.end_time
            )
            ,greatest(
                j.max_stage_end_time
                ,tb.start_time
            )
            ,millisecond
        ) as job_end_delay_ms
    from jobs_intervals j
    join time_buckets tb
    on
        tb.start_time < j.end_time
        and tb.end_time > j.creation_time
)
/* handling negative values here for most of the ms values */
,jobs_ms_buckets as
(
    select
        j.bucket_start_time
        ,j.project_id
        ,j.creation_time
        ,j.reservation_id
        ,case
            when not (j.job_start_delay_ms > 0) then 0
            else j.job_start_delay_ms
        end as job_start_delay_ms
        ,case
            when not (j.first_stage_delay_ms > 0) then 0
            else j.first_stage_delay_ms
        end as first_stage_delay_ms
        ,case
            when not (j.stages_elapsed_ms > 0) then 0
            else j.stages_elapsed_ms
        end as stages_elapsed_ms
        ,stages_active_elapsed_ms
        ,case
            when not (j.job_end_delay_ms > 0) then 0
            else j.job_end_delay_ms
        end as job_end_delay_ms
    from jobs_intervals_buckets j
)
,jobs_delay_buckets as
(
    select
        j.bucket_start_time
        ,j.project_id
        ,j.creation_time
        ,j.reservation_id
        ,j.job_start_delay_ms
            + j.first_stage_delay_ms
            + (
                j.stages_elapsed_ms
                - j.stages_active_elapsed_ms
            )
            + j.job_end_delay_ms
            as delay_ms
    from jobs_ms_buckets j
)
-- ,project_reservation_mapping as
-- (
--     select
--         prm.project_id
--         ,nullif(
--             prm.reservation
--             ,'<deleted>'
--         ) as reservation
--         ,prm.timestamp as start_time
--         ,coalesce(
--             timestamp_add(
--                 lead(prm.timestamp) over (
--                     partition by
--                         prm.project_id
--                     order by
--                         timestamp
--                 )
--                 ,interval -1 microsecond
--             )
--             ,cast('9999-12-31 23:59:59.999999' as timestamp)
--         ) as end_time
--     from `RESERVATIONS_PROJECT_ID.RESERVATIONS_DATASET.project_reservation_mapping` prm
-- )
,jobs_delay_categories_buckets as
(
    select
        j.bucket_start_time
        ,j.project_id
        /* clean up reservation before grouping to ensure no duplicates */
        /* remove admin project and region qualifiers (e.g., ADMIN_PROJECT_ID:US.) */
        ,regexp_replace(
            coalesce(
                /* Google adds 'pac-' prefix in tree hierarchy; remove it */
                regexp_replace(
                    /* Google inconsistent about including 'root/' in flat hierarchy; remove it */
                    regexp_replace(
                        j.reservation_id
                        ,r'^([^:]*):([^.]*)\.root/'
                        ,r'\1:\2.'
                    )
                    ,r'^pac-'
                    ,''
                )
                ,'<unknown>'
            )
            ,r'^[^:]+:[^.]+\.'
            ,''
        ) as reservation
--         /* remove admin project and region qualifiers (e.g., ADMIN_PROJECT_ID:US.) */
--         ,regexp_replace(
--             coalesce(
--                 /* Google adds 'pac-' prefix in tree hierarchy; remove it */
--                 regexp_replace(
--                     /* Google inconsistent about including 'root/' in flat hierarchy; remove it */
--                     regexp_replace(
--                         j.reservation_id
--                         ,r'^([^:]*):([^.]*)\.root/'
--                         ,r'\1:\2.'
--                     )
--                     ,r'^pac-'
--                     ,''
--                 )
--                 ,prm.reservation
--                 ,'default'
--             )
--             ,r'^[^:]+:[^.]+\.'
--             ,''
--         ) as reservation
        ,case
            when coalesce(j.delay_ms, 0) < 1 * 1000 then '1. [0s-1s)'
            when j.delay_ms < 10 * 1000             then '2. [1s-10s)'
            when j.delay_ms < 1 * 60 * 1000         then '3. [10s-1m)'
            when j.delay_ms < 5 * 60 * 1000         then '4. [1m-5m)'
            when j.delay_ms < 15 * 60 * 1000        then '5. [5m-15m)'
            when j.delay_ms < 30 * 60 * 1000        then '6. [15m-30m)'
            else '7. [30m-60m]'
        end as delay_category
        ,j.delay_ms
    from jobs_delay_buckets j
--     left join project_reservation_mapping prm
--     on
--         prm.project_id = j.project_id
--         and j.creation_time between prm.start_time and prm.end_time
)
select
    j.bucket_start_time
    ,j.project_id
    ,j.reservation
    ,j.delay_category
    ,coalesce(
        sum(j.delay_ms) / 1000
        ,0
    ) as delay_seconds
    ,coalesce(
        max(j.delay_ms) / 1000
        ,0
    ) as max_delay_seconds
    ,count(*) as job_count
from jobs_delay_categories_buckets j
group by
    j.bucket_start_time
    ,j.project_id
    ,j.reservation
    ,j.delay_category
order by
    j.bucket_start_time
    ,j.project_id
    ,j.reservation
    ,j.delay_category
;
Prev	Up	Next
6.3. Query Job Stages	Home \| ToC	6.5. Projects Query Jobs Elapsed Time Analysis