Prerequisites:
Gather information using the sections below. Note that some fields may be made available in one source but not the other. Note that some fields are not populated, or not populated accurately, until the job is DONE.
Familiarize self with concepts and terms discussed in Section 4.2.3, “Interpreting Query Job Information”.
Instructions for troubleshooting a query:
Check the BigQuery performance for the organization, reservation, and/or project.
Use the queries in Section 6.5, “Projects Query Jobs Elapsed Time Analysis” to evaluate organization, reservation, and/or project performance based on the elapsed query times of recent queries and matching prior queries.
Check the job-level information.
File job.csv can be examined in a spreadsheet application.
Check job state and enhanced state.
Note that the state and enhanced state may impact the availability and accuracy of other fields.
Check cache hit.
If cacheHit is true, then BigQuery was able to respond to the query with results from a prior query job. No slots were consumed.
Check priority.
If priority is BATCH, then BigQuery could take a long time to start the job, depending on available slots. See https://cloud.google.com/bigquery/docs/running-queries#batch.
Check for errors.
If Google field error_reason is not null, then an error occurred. See also Google field error_message. Other error fields also exist. Some types of errors are registered before slots can be consumed. Some types of errors are registered after slots have been consumed.
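The job-level checks above (state, cache hit, priority, and errors) can also be made directly against the raw Google fields in INFORMATION_SCHEMA. The following is a minimal sketch, assuming the job is billed to the current project in the US region; the job ID and the 7-day creation_time window are placeholders.

    SELECT
      job_id,
      state,                       -- PENDING, RUNNING, or DONE
      cache_hit,                   -- true means results came from a prior job; no slots consumed
      priority,                    -- INTERACTIVE or BATCH
      error_result.reason  AS error_reason,
      error_result.message AS error_message
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id';    -- placeholder job ID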
Check if simple, parent, or child job.
See https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
A simple job is unrelated to another job.
A parent job will have statementType set to SCRIPT. Additional comments on fields:
Normal value you would expect.
The parent job ends after the last child job.
If in-flight, then null. If DONE, then the sum of the child jobs' totalSlotMs, or null if no child jobs consumed slots. Note that, if you include parent jobs and child jobs in an aggregate, you will double count the actual number of slot milliseconds consumed.
If in-flight, then null. If DONE, then the number of child jobs, or null if no child jobs executed.
Zero-length array.
null.
A child job will have parentJobId set to the job ID of the parent job. Most fields are treated just like fields for a simple query job. Additional comments on fields:
Portion of a script that is a DML or DDL statement. SET statements with a SELECT statement will result in child query text that contains two semicolon-delimited statements: a CREATE TEMP FUNCTION statement and a SELECT statement that wraps the given SELECT statement.
'script_job_' + long random-looking hexadecimal number + '_' + zero-based child number.
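To see how a script's parent job relates to its child jobs, something along the lines of the following sketch can be used; it assumes the INFORMATION_SCHEMA.JOBS_BY_PROJECT view in the US region, and the parent job ID is a placeholder. Note how summing total_slot_ms over both the parent and its children would double count.

    -- Parent job (statement_type = 'SCRIPT') and its child jobs.
    SELECT
      job_id,
      parent_job_id,               -- null for the parent, set for each child
      statement_type,
      state,
      total_slot_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND (job_id = 'my_parent_job_id'           -- placeholder parent job ID
           OR parent_job_id = 'my_parent_job_id')
    ORDER BY creation_time;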
Check for long delays when no slots are assigned to a stage.
The PENDING interval is a delay between the job creationTime and job startTime. For INTERACTIVE queries this delay is almost always small, typically less than 1 second. It is highly unusual for this delay to last longer than a couple of minutes. Derived field pending_ms indicates the length of this delay.
The UNSTAGED interval is a delay between the job startTime and stages_start_time. This delay can be quite large, possibly hours. While a job is in-flight, it can be determined to be UNSTAGED if the job state is RUNNING (or equivalently, there is some value for startTime) and there are no job_stages elements (or equivalently, derived field stages_count is zero). The difference between the current time and the startTime would be the current delay. After a query job is DONE, derived field unstaged_ms indicates the length of this delay.
The zero, one, or more INACTIVE intervals are delays between job stages. This delay can be quite large, possibly hours. While a job is in-flight, it can be determined to be INACTIVE (or possibly ENDING) if there are some number of COMPLETE stages (derived field stages_complete_count is greater than zero) but no RUNNING stages (stages_running_count is zero). There is currently no good way of measuring how long this delay is just by looking at the job information. And there is no good way of determining if the job was INACTIVE previously. After a query job is DONE, derived field staged_inactive_ms indicates the length of these delays.
The ENDING interval is a delay between the stages_end_time and job endTime. This delay is almost always small, typically less than 1 second. For most query jobs, it is unusual for this delay to last longer than a few minutes. For DELETE, MERGE, and UPDATE query jobs, it is possible for this delay to be hours. While a job is in-flight, it can be determined to be ENDING (or possibly INACTIVE) if there are some number of COMPLETE stages but no RUNNING stages. There is currently no good way of measuring how long this delay is just by looking at the job information. After a query job is DONE, derived field ending_ms indicates the length of this delay.
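While the derived fields above (pending_ms, unstaged_ms, staged_inactive_ms, ending_ms) come from this document's tooling, rough equivalents for the PENDING, UNSTAGED, and ENDING intervals can be sketched directly from the Google fields. This is only an approximation, assuming the job_stages array exposes per-stage start_ms and end_ms as epoch milliseconds; the job ID and time window are placeholders.

    SELECT
      job_id,
      TIMESTAMP_DIFF(start_time, creation_time, MILLISECOND)  AS approx_pending_ms,
      -- Earliest stage start minus job start approximates the UNSTAGED interval.
      (SELECT MIN(s.start_ms) FROM UNNEST(job_stages) AS s)
        - UNIX_MILLIS(start_time)                              AS approx_unstaged_ms,
      -- Job end minus latest stage end approximates the ENDING interval.
      UNIX_MILLIS(end_time)
        - (SELECT MAX(s.end_ms) FROM UNNEST(job_stages) AS s)  AS approx_ending_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id'     -- placeholder
      AND state = 'DONE';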
Factors involved in delays may include:
Number of slots assigned to the reservation containing the project to which the job is billed.
In the billing project, number of query stages simultaneously requesting slots and the number of slots requested.
In the reservation containing the project to which the job is billed, number of projects requesting slots and the number of slots requested.
In the organization, number of query stages simultaneously requesting slots and the number of slots requested. If you are lucky, your job may automatically be assigned some number of slots borrowed from another reservation.
For a description of BigQuery slot pricing and scheduling, see:
Some options to consider:
Learn to live with some possible delay.
Work with teammates to schedule query jobs to avoid contention for slots in the billing project and the reservation.
It might be possible to optimize a query or the targeted views or tables to reduce the amount of resources the query requires.
Obtain more slots for your project.
Some possibilities:
Have your project assigned to a new or different reservation.
Pay for additional slots for the reservation containing your project.
Change your project from using flat pricing to using on-demand pricing.
Check statistics for the longest stage.
These fields are only valid after the query job is DONE. Google fields max_elapsed_ms_name, max_elapsed_ms_start_time, and max_elapsed_ms_end_time and derived field max_elapsed_ms are shown for the job stage with the maximum elapsed_ms. These fields can indicate if one stage uses a disproportionate fraction of the job time. You may also want to begin job stage troubleshooting with this stage.
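If the job-level CSV is not at hand, the stage with the maximum elapsed time can also be pulled from the job_stages array, along the lines of the following sketch (field names assumed to follow the ExplainQueryStage schema; the job ID is a placeholder).

    SELECT
      job_id,
      stage.name                    AS longest_stage_name,
      stage.end_ms - stage.start_ms AS longest_stage_elapsed_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
         UNNEST(job_stages) AS stage
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id'      -- placeholder
    QUALIFY ROW_NUMBER() OVER (
      PARTITION BY job_id
      ORDER BY stage.end_ms - stage.start_ms DESC) = 1;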
Check wait statistics.
These fields are only valid after the query job is DONE. A job stage slot is waiting for some interval of time when it is unable to execute useful work (read, compute, or write). This may be because of I/O limitations or time spent waiting on Google's shuffle service. A high value for a wait statistic may indicate a problem. See derived fields max_wait_ms_max and sum_wait_ms_max.
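The wait statistics can likewise be aggregated from the per-stage Google fields. The sketch below assumes derived fields max_wait_ms_max and sum_wait_ms_max are, respectively, the maximum and the sum of wait_ms_max over all stages; that definition and the job ID are placeholders.

    SELECT
      job_id,
      MAX(stage.wait_ms_max) AS max_wait_ms_max,
      SUM(stage.wait_ms_max) AS sum_wait_ms_max
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
         UNNEST(job_stages) AS stage
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id'     -- placeholder
    GROUP BY job_id;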
Check compute skew.
These fields are only valid after the query job is DONE. Google fields max_significant_compute_name, max_significant_compute_ms_avg, and max_significant_compute_ms_max and derived field max_significant_compute_skew are shown for the job stage with the maximum “significant” compute_ms_max. “Significant” job stages have compute_ms_max >= threshold_compute_ms_max and compute_skew >= threshold_compute_skew. If Google field compute_ms_max is large (many minutes) and derived field compute_skew is large (approaching 1), this indicates a skew problem: a small number of the total number of slots are responsible for a large amount of the work done. Look into ways to reduce skew. TBD...
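The per-stage inputs to the skew check can be listed as in the sketch below. The skew ratio shown is only an illustration (1 - compute_ms_avg / compute_ms_max); the actual definition of derived field compute_skew may differ, and the job ID is a placeholder.

    SELECT
      stage.name,
      stage.compute_ms_avg,
      stage.compute_ms_max,
      -- Illustrative skew ratio only; approaches 1 when a few slots do most of the compute.
      1 - SAFE_DIVIDE(stage.compute_ms_avg, stage.compute_ms_max) AS illustrative_skew
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
         UNNEST(job_stages) AS stage
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id'     -- placeholder
    ORDER BY stage.compute_ms_max DESC;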
Check the elapsed time during which BigQuery is actively working on the query and check total slot milliseconds.
Derived field staged_active_ms is a sum of the elapsed time of all stage islands, where a stage island is an interval containing one or more contiguous or overlapping stages. So, this field is a measure of how long at least one slot was allocated to the job. This type of measure is notoriously inconsistent because system load can easily cause it to vary between two jobs with the same query targeting the same data.
Google field total_slot_ms is a sum of slot milliseconds consumed by the query. For various reasons, this measure can vary by as much as a factor of two or more between two jobs with the same query targeting the same data. For example, a common strategy for evaluating two different query strategies is to execute each of the two queries 10 times in approximately the same timeframe and compare the average slot milliseconds of the two queries.
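A comparison along those lines might average total_slot_ms over repeated runs of the same query text, as in the sketch below; the 1-day window is a placeholder.

    SELECT
      query,
      COUNT(*)           AS runs,
      AVG(total_slot_ms) AS avg_total_slot_ms
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND job_type = 'QUERY'
      AND state = 'DONE'
      AND cache_hit IS NOT TRUE    -- cache hits consume no slots
    GROUP BY query
    ORDER BY avg_total_slot_ms DESC;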
Check total bytes processed, shuffle bytes, and records written.
Google field total_bytes_processed reflects how many bytes were read for the query job. Derived field shuffle_output_bytes is the sum of shuffle_output_bytes over all job stages. Derived field last_records_written is the records_written of the last stage. So, these fields represent the input, intermediate output, and final output of the query job. And they can be useful for gauging the performance of one run of a query to similar prior runs of the same query.
Derived field shuffle_output_bytes_spilled is the sum of shuffle_output_bytes_spilled over all job stages. This measures how many bytes in shuffle had to be spilled to storage instead of remaining in memory. Reads and writes to storage are considerably slower than to memory. So, if this value is large, the job may take longer than it otherwise would.
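A rough per-job view of input, intermediate output, final output, and spilled shuffle bytes can be sketched as below, assuming the stage-level values are summed over the job_stages array; the job ID is a placeholder.

    SELECT
      job_id,
      total_bytes_processed,                              -- input
      (SELECT SUM(s.shuffle_output_bytes)
         FROM UNNEST(job_stages) AS s)                    AS shuffle_output_bytes,
      (SELECT SUM(s.shuffle_output_bytes_spilled)
         FROM UNNEST(job_stages) AS s)                    AS shuffle_output_bytes_spilled,
      (SELECT s.records_written
         FROM UNNEST(job_stages) AS s
         ORDER BY s.end_ms DESC LIMIT 1)                  AS last_records_written  -- final output
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id';    -- placeholder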
Check the job stage-level information.
File job_stages.csv can be examined in a spreadsheet application.
If the job stage-level information is computed using Section 6.3, “Query Job Stages”, there will be fields pertaining to “primary” job stages and “related” job stages. Refer to that section for further details.
Check job stage status.
Check job stage status field.
Check parallel inputs fields.
While in-flight, you can gauge the progress of RUNNING stages by looking at Google fields completed_parallel_inputs and parallel_inputs and derived field completed_parallel_inputs_percent. Google indicates that it is possible that not all inputs within a stage need to be completed for the stage to complete. For more information, see https://cloud.google.com/bigquery/query-plan-explanation#stage_overview.
Check the elapsed time during which BigQuery is actively working on the query and check total slot milliseconds.
Especially, when there is a large number of stages, focusing on those stages with high elapsed milliseconds and/or high slot milliseconds can speed troubleshooting. Note that the root cause of the high cost of a given stage may be found in a prior stage. So, you may need to carefully backtrack some number of stages to find the root cause. Google fields steps and input_stages can be used for this.
Another thing to consider is that, if the stages consuming the vast majority of elapsed milliseconds and/or slot milliseconds can be reasonably confirmed to be operating correctly and efficiently, then any attempt to optimize other stages will have only a very small impact on overall cost. For example, if 99% of the cost of the query is found to reside in the final output stage and the query results are verified to be correct, then any optimization attempts can impact no more than 1% of the cost of the query.
Derived field elapsed_ms is the difference between Google fields end_ms and start_ms.
Derived field related_slot_ms is a sum of slot milliseconds consumed by the related stages.
Check for large wait statistics.
These fields are only valid after the query job is DONE. A job stage slot is waiting for some interval of time when it is unable to execute useful work (read, compute, or write). This may be because of I/O limitations or time spent waiting on Google's shuffle service. A high value for a wait statistic may indicate a problem. See Google fields wait_ms_avg and wait_ms_max.
Check for large shuffle bytes spilled.
A stage with a large shuffle_output_bytes_spilled may cost more in elapsed time and slot milliseconds. If shuffle_output_bytes_spilled is large, then there may also be large wait_ms_avg and wait_ms_max. There could be an underlying problem with the query or the shuffle service. Or it could just be that there are too many queries contending for reservation shuffle resources at the same time.
Check for large compute_ms_max and compute skew.
These fields are only valid after the query job is DONE. If Google field compute_ms_max is large (many minutes) and derived field compute_skew is large (approaching 1), this indicates a skew problem: a small number of the total number of slots are responsible for a large amount of the work done. Look into ways to reduce skew. TBD...
Check that partition elimination occurs as expected.
Google field steps and derived field read_from will indicate any tables that may be read from in the stage. For a partitioned table, you expect the READ step for the table to indicate filtering on the partitioning field. Also, Google field records_read indicates the sum of the number of records read from each table and spool referenced in the stage. So, if you know the total number of records in the table, you may be able to infer from this field whether or not partition elimination has occurred.
For additional information about partition elimination, see Section 4.1, “Partition Elimination”.
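As an illustration only, the difference is usually visible in the query text itself: a filter on the partitioning column allows elimination, while a filter on some other column does not. The table and column names below are hypothetical.

    -- Hypothetical table partitioned on event_date.
    -- Partition elimination possible: the filter is on the partitioning column.
    SELECT COUNT(*)
    FROM `my_project.my_dataset.events`
    WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07';

    -- No partition elimination: the filter is only on a non-partitioning column,
    -- so records_read will reflect a scan of all partitions.
    SELECT COUNT(*)
    FROM `my_project.my_dataset.events`
    WHERE user_id = 12345;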
Check for extraneous data.
The smaller the amount of data input to a stage, the less the cost of the subsequent operations. So, reduce the data as much as possible and reduce the data as early as possible in the query plan. For example, by filtering significantly in a READ step, you can reduce the cost of a subsequent JOIN step.
Where possible, reduce both the number of fields and the number of rows. Check the query text. Check Google fields recordsRead and recordsWritten for each stage.
Check join conditions.
A common mistake in queries is that join conditions are incorrect. In addition to yielding incorrect results, this can cause performance problems.
Inspect the query text to see if join conditions are as intended and/or make sense.
In some cases, you may be missing one or more join conditions.
In some cases, you may be joining to data at the wrong grain. For example, maybe for the query to make sense, you should join to data at a (date, location) grain, but you are joining data that is at a (date, location, sublocation) grain. Maybe you need to aggregate the data to (date, location) grain before joining, as sketched below. Or maybe you need to somehow select rows with filtering to ensure the data is at (date, location) grain prior to joining.
Check Google field recordsWritten to see if the result of the join makes sense.
You can use Section 6.1, “Primary Key Test” to see whether or not the join results violate the expected primary key. You can also use the same section to test separately the primary key of each table that is joined.
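As a hypothetical illustration of fixing grain before joining, the sketch below aggregates a (date, location, sublocation) table up to (date, location) so that the join cannot fan out; all table and column names are made up.

    -- Hypothetical tables: sales at (date, location) grain,
    -- inventory at (date, location, sublocation) grain.
    SELECT
      s.sale_date,
      s.location_id,
      s.sales_amount,
      i.on_hand_qty
    FROM `my_project.my_dataset.sales` AS s
    JOIN (
      -- Aggregate to (date, location) grain before joining.
      SELECT
        inventory_date,
        location_id,
        SUM(on_hand_qty) AS on_hand_qty
      FROM `my_project.my_dataset.inventory`
      GROUP BY inventory_date, location_id
    ) AS i
      ON i.inventory_date = s.sale_date
     AND i.location_id    = s.location_id;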
Check join strategies.
Check that BigQuery is choosing efficient join strategies given the nature of the data.
One join strategy is to distribute rows of a large table (or spool) across slots and copy an entire small table (or spool) to each slot. If this is the case, the Google field steps will indicate an EACH WITH ALL join. This can also be referred to as a broadcast join. This is only possible if the subset of the small table (or spool) is small enough to fit in the memory of each slot alongside some portion of the large table (or spool).
The other join strategy is to distribute rows of both tables (or spools) across slots by a hash of the join condition values. If this is the case, the Google field steps will indicate an EACH WITH EACH join.
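One way to see which strategy was chosen is to search the steps substeps for the join description, along the lines of the sketch below; it assumes steps is an array of structs with kind and substeps per the ExplainQueryStep schema, and the job ID is a placeholder.

    SELECT
      stage.name AS stage_name,
      step.kind,
      substep
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
         UNNEST(job_stages) AS stage,
         UNNEST(stage.steps) AS step,
         UNNEST(step.substeps) AS substep
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_id = 'my_job_id'     -- placeholder
      AND (substep LIKE '%EACH WITH ALL%'       -- broadcast join
           OR substep LIKE '%EACH WITH EACH%'); -- hash join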