The BigQuery Cloud SQL federated queries feature lets a BigQuery query retrieve the results of a query run against a Cloud SQL database and treat them as if they were a BigQuery table. This is documented at https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries.
The most difficult aspect of using BigQuery Cloud SQL federated queries revolves around how locations for resources are handled. Cloud SQL instances are always created in a specific region (technically, even more specific: in a zone within a region). BigQuery resources can be region-specific, but it is common for them to be multi-region; e.g., datasets and jobs are created in the US multi-region, and slot reservations are configured in the US multi-region location. You cannot use a US multi-region job to access a dataset in a region-specific location, even if that region is geographically within the United States. However, Google does support using a US multi-region job to access Cloud SQL data in certain supported region-specific locations. The documentation indicates that US multi-region jobs involving a Cloud SQL federated query will perform best if the Cloud SQL instance is in the us-central1 region. For more details, see https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#regions.
Note that BigQuery support was added to region us-east1 on 2019-11-20; see https://cloud.google.com/bigquery/docs/release-notes#November_20_2019. The ability to execute BigQuery Cloud SQL federated queries against a us-east1 Cloud SQL instance has been confirmed; however, it may take some time for the Google documentation to reflect that support.
Below are instructions for testing BigQuery Cloud SQL federated queries against a MySQL instance. Note that the instructions assume that BigQuery datasets and jobs are created in the US multi-region location.
Connect to Cloud Shell.
Note that, in order to connect to a Cloud SQL instance through Cloud Shell, you need sufficient permissions (e.g., the Cloud SQL Client role).
For information about Cloud Shell, see https://cloud.google.com/shell/docs/.
Set variables.
Note that how the CONNECTION_LOCATION value is set depends on the Cloud SQL location. Location us-central1 is special: for a us-central1 Cloud SQL instance, the BigQuery connection is created in the us multi-region location.
PROJECT='my_project'
CLOUD_SQL_INSTANCE='my-cloud-sql-instance'
CLOUD_SQL_LOCATION='us-central1'
CLOUD_SQL_DATABASE='test'
CONNECTION="${CLOUD_SQL_INSTANCE}-${CLOUD_SQL_DATABASE}"
if [ "${CLOUD_SQL_LOCATION}" = 'us-central1' ]
then
  CONNECTION_LOCATION='us'
else
  CONNECTION_LOCATION="${CLOUD_SQL_LOCATION}"
fi
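The location mapping above can also be sketched as a small helper function. The function name connection_location is hypothetical (it is not part of any Google tooling); the point is simply that only us-central1 maps to the us multi-region, while every other region maps to itself.

```shell
# Hypothetical helper mirroring the logic above: map a Cloud SQL
# region to the location used for the BigQuery connection.
connection_location() {
  if [ "$1" = 'us-central1' ]
  then
    # us-central1 Cloud SQL instances pair with a connection
    # in the us multi-region location.
    echo 'us'
  else
    # Other regions use a region-specific connection location.
    echo "$1"
  fi
}

connection_location 'us-central1'   # prints: us
connection_location 'us-east1'      # prints: us-east1
```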
Set Google Cloud project.
gcloud \
  config \
  set \
  project "${PROJECT}"
List Cloud SQL instances.
gcloud \
  sql \
  instances \
  list
Create Cloud SQL instance.
In the command below, replace text CLOUD_SQL_ROOT_PASSWORD as needed.
gcloud \
  sql \
  instances \
  create \
  --region "${CLOUD_SQL_LOCATION}" \
  --database-version MYSQL_5_7 \
  --root-password CLOUD_SQL_ROOT_PASSWORD \
  "${CLOUD_SQL_INSTANCE}"
Connect to Cloud SQL instance.
After executing the command below, when prompted, provide the Cloud SQL root password.
gcloud \
  sql \
  connect "${CLOUD_SQL_INSTANCE}" \
  --user root
Execute SQL commands to create database entities.
In the commands below, replace text CLOUD_SQL_READER_PASSWORD as needed.
CREATE DATABASE `test`;
CREATE USER 'reader'@'%' IDENTIFIED BY 'CLOUD_SQL_READER_PASSWORD';
GRANT SELECT ON test.* TO 'reader'@'%';
USE test;
CREATE TABLE `foo` (
  `name` varchar(255)
);
INSERT INTO test.foo (name) VALUES ('asdf');
quit
List BigQuery connections.
bq \
  ls \
  --connection \
  --location "${CONNECTION_LOCATION}"
Create BigQuery connection.
In the command below, replace text CLOUD_SQL_READER_PASSWORD as needed.
bq \
  mk \
  --connection \
  --connection_type='CLOUD_SQL' \
  --properties='{"instanceId":"'"${PROJECT}:${CLOUD_SQL_LOCATION}:${CLOUD_SQL_INSTANCE}"'","database":"'"${CLOUD_SQL_DATABASE}"'","type":"MYSQL"}' \
  --connection_credential='{"username":"reader", "password":"CLOUD_SQL_READER_PASSWORD"}' \
  --location="${CONNECTION_LOCATION}" \
  "${CONNECTION}"
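The quoting in the --properties argument can be hard to read: single-quoted literal JSON fragments are concatenated with double-quoted variable expansions. As a sketch, using the example values from the "Set variables" step, the argument expands to plain JSON:

```shell
# Example values from the "Set variables" step.
PROJECT='my_project'
CLOUD_SQL_INSTANCE='my-cloud-sql-instance'
CLOUD_SQL_LOCATION='us-central1'
CLOUD_SQL_DATABASE='test'

# Same concatenation of single-quoted JSON fragments and
# double-quoted variable expansions as in the bq mk command.
PROPERTIES='{"instanceId":"'"${PROJECT}:${CLOUD_SQL_LOCATION}:${CLOUD_SQL_INSTANCE}"'","database":"'"${CLOUD_SQL_DATABASE}"'","type":"MYSQL"}'

echo "${PROPERTIES}"
# prints: {"instanceId":"my_project:us-central1:my-cloud-sql-instance","database":"test","type":"MYSQL"}
```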
Execute a BigQuery Cloud SQL federated query.
Note that, for the query to work, you may need to explicitly indicate that the job be executed in the US multi-region location so it does not default to the location in which the connection was created. In the bq command below, the --location us option is provided. To set this in the web UI, open the query settings and, under label Processing location, select the US multi-region.
bq \
  query \
  --nouse_legacy_sql \
  --location us \
  'select * from external_query("'"${PROJECT}.${CONNECTION_LOCATION}.${CONNECTION}"'", "select * from '"${CLOUD_SQL_DATABASE}"'.foo;");'
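The query argument uses the same quoting pattern as the --properties argument above. As a sketch with the example values from earlier steps, the string passed to bq expands to an external_query call naming the connection and embedding the MySQL query:

```shell
# Example values from earlier steps (CONNECTION is
# "${CLOUD_SQL_INSTANCE}-${CLOUD_SQL_DATABASE}").
PROJECT='my_project'
CLOUD_SQL_DATABASE='test'
CONNECTION='my-cloud-sql-instance-test'
CONNECTION_LOCATION='us'

# Same concatenation as in the bq query command above.
QUERY='select * from external_query("'"${PROJECT}.${CONNECTION_LOCATION}.${CONNECTION}"'", "select * from '"${CLOUD_SQL_DATABASE}"'.foo;");'

echo "${QUERY}"
# prints: select * from external_query("my_project.us.my-cloud-sql-instance-test", "select * from test.foo;");
```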
Remove BigQuery connection.
bq \
  rm \
  --connection \
  --location="${CONNECTION_LOCATION}" \
  "${CONNECTION}"
Delete Cloud SQL instance.
Note that, after a Cloud SQL instance is deleted, it may be some time before a Cloud SQL instance with the same name can be created.
gcloud \
  sql \
  instances \
  delete \
  "${CLOUD_SQL_INSTANCE}"