As of 2020-02-20, BigQuery does not support WITH RECURSIVE
syntax. See https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#with-clause.
However, with BigQuery
scripting, it is possible to work around this limitation with
SQL that is not as terse but clearer.
The WITH RECURSIVE
syntax is useful for
listing ancestors in a tree
structure. The table will have a primary key that uniquely
identifies each node. The table may have a field that identifies zero
or one parent nodes. The table may have a field (possibly an array)
that identifies zero, one, or more child nodes. There will be one or
more root nodes where a field identifying the parent node contains a
value that by some convention is different from the primary key value
for another node. It could be null
or the same as
the primary key value for the node in question or some other value that
is guaranteed not to be a primary key value (e.g.,
0
, -1
, etc.). If there is a
field identifying child nodes, then a leaf node will indicate that it
has no children (e.g., an empty array). Some examples of tree
structure tables:
Each row corresponds to an employee. There is a field to uniquely identify the employee. And a field to identify a single manager over the employee. The CEO is a single root node.
Each row corresponds to a table or view. There is a field to uniquely identify the table or view. And a field to identify zero, one, or more tables or views that a given view targets directly. Tables are leaf nodes but may be orphans. For a view, each targeted table or view is a child node. A view may be a root node. It may be a child node.
The SQL below creates a table in which the rows are related in a tree
structure. In general, the values for a field like name would be arbitrary (e.g.,
Tom
, Dick
, and
Harry
). But here, I have chosen to make the names
reflect the tree structure to make the intended relationships clearer.
In this example, null
is used for the parent_id of the root node.
create or replace table `recursive.tree` as select * from unnest( [ struct<id int64, name string, parent_id int64>(1, '1', null) ,struct<id int64, name string, parent_id int64>(2, '1-1', 1) ,struct<id int64, name string, parent_id int64>(3, '1-2', 1) ,struct<id int64, name string, parent_id int64>(4, '1-3', 1) ,struct<id int64, name string, parent_id int64>(5, '1-1-1', 2) ,struct<id int64, name string, parent_id int64>(6, '1-2-1', 3) ,struct<id int64, name string, parent_id int64>(7, '1-2-2', 3) ] ) x ;
The SQL below generates a row for each id and ancestor.
declare prior_ancestor_generation int64 default 1; create temp table ancestors -- depending on size of data, partitioning by ancestor_generation -- might improve performance partition by range_bucket( ancestor_generation ,generate_array( 0 -- minimum ancestor_generation ,50 -- data-dependent maximum ancestor_generation ,1 -- bucket steps ) ) as select t.id ,t.parent_id as ancestor_id ,case -- root condition when t.parent_id is null then 0 -- not the root condition else 1 end as ancestor_generation from `recursive.tree` t ; loop insert into ancestors select pa.id ,ca.parent_id as ancestor_id ,pa.ancestor_generation + 1 as ancestor_generation -- prior ancestor from ancestors pa -- current ancestor join `recursive.tree` ca on ca.id = pa.ancestor_id where -- only include rows for prior generation pa.ancestor_generation = prior_ancestor_generation -- exclude root condition and ca.parent_id is not null ; if @@row_count = 0 then break; end if ; set prior_ancestor_generation = prior_ancestor_generation + 1; end loop ; select a.id ,t.name ,a.ancestor_generation ,a.ancestor_id ,ta.name as ancestor_name from ancestors a join `recursive.tree` t on t.id = a.id -- must be a left join, not inner join, -- if root parent_id is not an id in tree left join `recursive.tree` ta on ta.id = a.ancestor_id order by a.id ,a.ancestor_generation ;
Results are below.
id | name | ancestor_generation | ancestor_id | ancestor_name |
---|---|---|---|---|
1 | 1 | 0 | null | null |
2 | 1-1 | 1 | 1 | 1 |
3 | 1-2 | 1 | 1 | 1 |
4 | 1-3 | 1 | 1 | 1 |
5 | 1-1-1 | 1 | 2 | 1-1 |
5 | 1-1-1 | 2 | 1 | 1 |
6 | 1-2-1 | 1 | 3 | 1-2 |
6 | 1-2-1 | 2 | 1 | 1 |
7 | 1-2-2 | 1 | 3 | 1-2 |
7 | 1-2-2 | 2 | 1 | 1 |