3.7. Cloud SQL Federated Queries

The BigQuery Cloud SQL federated queries feature lets a BigQuery query retrieve the results of a query run against a Cloud SQL instance and treat those results as if they were a BigQuery table. This is documented at https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries.

The most difficult aspect of using BigQuery Cloud SQL federated queries is how resource locations are handled. Cloud SQL instances are always created in a specific region (technically, in a specific zone within a region). BigQuery resources can be region-specific, but they are commonly multi-region; e.g., datasets and jobs are often created in the US multi-region, and slot reservations are configured in the US multi-region location. You cannot use a US multi-region job to access a dataset in a region-specific location, even if that region is geographically within the United States. However, Google does support using a US multi-region job to access Cloud SQL data in certain supported region-specific locations. The documentation indicates that US multi-region jobs involving a Cloud SQL federated query will perform best if the Cloud SQL instance is in the us-central1 region. For more details, see https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#regions.

Note that BigQuery support was added to region us-east1 on 2019-11-20. See https://cloud.google.com/bigquery/docs/release-notes#November_20_2019. The ability to execute BigQuery Cloud SQL federated queries against a us-east1 Cloud SQL instance was confirmed. However, it may take some time for the Google documentation to reflect this support.

Below are instructions for testing BigQuery Cloud SQL federated queries against a MySQL instance. These instructions assume that BigQuery datasets and jobs are created in the US multi-region location.

  1. Connect to Cloud Shell.

    Note that, in order to connect to a Cloud SQL instance through Cloud Shell, you need sufficient permissions (e.g., the Cloud SQL Client role).

    For information about Cloud Shell, see https://cloud.google.com/shell/docs/.

  2. Set variables.

    Note that the CONNECTION_LOCATION value depends on the Cloud SQL location: us-central1 is special in that its BigQuery connections are created in the US multi-region location.

    PROJECT='my_project'
    
    CLOUD_SQL_INSTANCE='my-cloud-sql-instance'
    
    CLOUD_SQL_LOCATION='us-central1'
    
    CLOUD_SQL_DATABASE='test'
    
    CONNECTION="${CLOUD_SQL_INSTANCE}-${CLOUD_SQL_DATABASE}"
    
    if [ "${CLOUD_SQL_LOCATION}" = 'us-central1' ]
    then
        CONNECTION_LOCATION='us'
    else
        CONNECTION_LOCATION="${CLOUD_SQL_LOCATION}"
    fi
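    The us-central1 special case above can be checked locally. The following is a small sketch (not part of the procedure) that wraps the same mapping in a function and exercises it:

```shell
# Map a Cloud SQL region to the location in which the BigQuery
# connection is created; us-central1 is the special case that maps
# to the US multi-region.
connection_location() {
    if [ "$1" = 'us-central1' ]
    then
        echo 'us'
    else
        echo "$1"
    fi
}

connection_location 'us-central1'   # prints: us
connection_location 'us-east1'      # prints: us-east1
```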
  3. Set Google Cloud project.

    gcloud \
        config \
        set \
        project "${PROJECT}"
  4. List Cloud SQL instances.

    gcloud \
        sql \
        instances \
        list
  5. Create Cloud SQL instance.

    In the command below, replace text CLOUD_SQL_ROOT_PASSWORD as needed.

    gcloud \
        sql \
        instances \
        create \
        --region "${CLOUD_SQL_LOCATION}" \
        --database-version MYSQL_5_7 \
        --root-password CLOUD_SQL_ROOT_PASSWORD \
        "${CLOUD_SQL_INSTANCE}"
  6. Connect to Cloud SQL instance.

    After executing the command below, when prompted, provide the Cloud SQL root password.

    gcloud \
        sql \
        connect "${CLOUD_SQL_INSTANCE}" \
        --user root
  7. Execute SQL commands to create database entities.

    In the commands below, replace text CLOUD_SQL_READER_PASSWORD as needed.

    CREATE DATABASE `test`;
    
    CREATE USER 'reader'@'%' IDENTIFIED BY 'CLOUD_SQL_READER_PASSWORD';
    
    GRANT SELECT ON `test`.* TO 'reader'@'%';
    
    USE `test`;
    
    CREATE TABLE `foo` (
      `name` VARCHAR(255)
    );
    
    INSERT INTO `test`.`foo` (`name`) VALUES ('asdf');
    
    quit
  8. List BigQuery connections.

    bq \
        ls \
        --connection \
        --location "${CONNECTION_LOCATION}"
  9. Create BigQuery connection.

    In the command below, replace text CLOUD_SQL_READER_PASSWORD as needed.

    bq \
        mk \
        --connection \
        --connection_type='CLOUD_SQL' \
        --properties='{"instanceId":"'"${PROJECT}:${CLOUD_SQL_LOCATION}:${CLOUD_SQL_INSTANCE}"'","database":"'"${CLOUD_SQL_DATABASE}"'","type":"MYSQL"}' \
        --connection_credential='{"username":"reader", "password":"CLOUD_SQL_READER_PASSWORD"}' \
        --location="${CONNECTION_LOCATION}" \
        "${CONNECTION}"
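    The --properties value embeds the Cloud SQL instance connection name, which has the form PROJECT:REGION:INSTANCE. The nested quoting can be checked locally; this sketch uses the example values from step 2 and simply prints the JSON string that bq receives:

```shell
# Example values from step 2.
PROJECT='my_project'
CLOUD_SQL_LOCATION='us-central1'
CLOUD_SQL_INSTANCE='my-cloud-sql-instance'
CLOUD_SQL_DATABASE='test'

# Same quoting as the bq mk command above; echo the result to verify
# it is valid JSON before passing it to bq.
PROPERTIES='{"instanceId":"'"${PROJECT}:${CLOUD_SQL_LOCATION}:${CLOUD_SQL_INSTANCE}"'","database":"'"${CLOUD_SQL_DATABASE}"'","type":"MYSQL"}'
echo "${PROPERTIES}"
```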
  10. Execute a BigQuery Cloud SQL federated query.

    Note that, for the query to work, you may need to explicitly indicate that the job should be executed in the US multi-region location so that it does not default to the location in which the connection was created. In the bq command below, the --location us option is provided. To set this in the web UI, click the More button, select Query settings, and, under Processing location, select United States (US).

    bq \
        query \
        --nouse_legacy_sql \
        --location us \
        'select * from external_query("'"${PROJECT}.${CONNECTION_LOCATION}.${CONNECTION}"'", "select * from '"${CLOUD_SQL_DATABASE}"'.foo;");'
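    The nested quoting in the query argument can be hard to read. This sketch, using the example values from step 2, prints the SQL string that is actually submitted to BigQuery:

```shell
# Example values from step 2.
PROJECT='my_project'
CLOUD_SQL_INSTANCE='my-cloud-sql-instance'
CLOUD_SQL_DATABASE='test'
CONNECTION="${CLOUD_SQL_INSTANCE}-${CLOUD_SQL_DATABASE}"
CONNECTION_LOCATION='us'

# Build the query string exactly as the bq command above does, then
# print it to inspect the SQL that BigQuery receives.
QUERY='select * from external_query("'"${PROJECT}.${CONNECTION_LOCATION}.${CONNECTION}"'", "select * from '"${CLOUD_SQL_DATABASE}"'.foo;");'
echo "${QUERY}"
```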
  11. Remove BigQuery connection.

    bq \
        rm \
        --connection \
        --location="${CONNECTION_LOCATION}" \
        "${CONNECTION}"
  12. Delete Cloud SQL instance.

    Note that, after a Cloud SQL instance is deleted, it may be some time before a Cloud SQL instance with the same name can be created.

    gcloud \
        sql \
        instances \
        delete \
        "${CLOUD_SQL_INSTANCE}"