BigQuery

Prerequisites

We strongly advise creating a dedicated user to extract your metadata.

You can follow these instructions to create the catalog user.

Run extraction script

Once the package has been installed, you should be able to run the following command in your terminal:

castor-extract-bigquery [arguments]

The script will run and display logs as follows:

INFO - Credentials fetched from /.../keys/your-service-account.json
INFO - Available projects: ['project-1', 'project-2']

INFO - Extracting `DATABASE` ...
INFO - Results stored to /tmp/catalog/1649082442-database.csv


...

INFO - Extracting `USER` ...
INFO - Results stored to /tmp/catalog/1649082442-user.csv
INFO - Wrote output file: /tmp/catalog/1649082442-summary.json
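
The sample logs above suggest that each run prefixes its files with a Unix timestamp and ends by writing a summary file. As a minimal sketch, assuming that naming pattern holds for your runs, the helper below (`latest_summary` is a hypothetical name, not part of the package) prints the most recent summary file in an output folder:

```shell
# Hypothetical helper, not part of the castor package.
# Assumption: summary files are named <unix-timestamp>-summary.json,
# as in the sample logs above.
latest_summary() {
  dir="${1:-/tmp/catalog}"
  # Lexical sort matches chronological order as long as all
  # timestamps have the same number of digits.
  for f in "$dir"/*-summary.json; do
    # Skip the unexpanded pattern when no file matches
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}
```

Calling `latest_summary /tmp/catalog` prints nothing when the folder contains no summary file.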

Credentials

  • -k, --token: Token provided by Catalog

Other arguments

  • -o, --output: target folder to store the extracted files

Optional arguments

  • --skip-existing: Skip files already extracted instead of replacing them

  • --db-allowed: GCP project(s) you want to extract 🚦

  • --db-blocked: GCP project(s) you don't want to extract 🚦

You can also get help with the --help argument.
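
To illustrate how the arguments above combine, here is a small wrapper sketch. It only uses the `--output` and `--skip-existing` flags listed above; the function name `run_extract` and the `CASTOR_BIN` variable are assumptions introduced for this example, and it assumes your credentials are supplied separately.

```shell
# Hypothetical wrapper, not part of the castor package.
# CASTOR_BIN is an assumed override that lets you swap the binary,
# for example with `echo` to preview the command without running it.
run_extract() {
  output_dir="$1"
  # Make sure the target folder exists before extracting
  mkdir -p "$output_dir" || return 1
  # --output and --skip-existing are the flags documented above
  "${CASTOR_BIN:-castor-extract-bigquery}" --output "$output_dir" --skip-existing
}
```

For example, `run_extract /tmp/catalog` would extract into /tmp/catalog and keep files that were already extracted.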

Use ENV variables

If you don't want to specify arguments every time, you can set the following environment variables in your .bashrc:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials/service-account.json"
export CASTOR_OUTPUT_DIRECTORY="/tmp/catalog"

Then the script can be executed without any arguments:

castor-extract-bigquery

It can also be executed with partial arguments (the script looks in your ENV as a fallback):

castor-extract-bigquery --output /tmp/catalog
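
Because the script falls back silently to your environment, it can help to check the variables before launching a run. The following is a minimal sketch of such a check; `check_castor_env` is a hypothetical helper, and only the two variable names come from this page.

```shell
# Hypothetical pre-flight check, not part of the castor package.
# It verifies the two environment variables described above.
check_castor_env() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Credentials file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  if [ -z "${CASTOR_OUTPUT_DIRECTORY:-}" ]; then
    echo "CASTOR_OUTPUT_DIRECTORY is not set" >&2
    return 1
  fi
}
```

A typical use would be `check_castor_env && castor-extract-bigquery`, so the extraction only starts when both variables are in place.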

Database filtering

In GCP, databases are often referred to as Projects.

Database filters are optional. If you don't specify any filter, all databases accessible with the provided credentials will be extracted.

# extract all but <...>
castor-extract-bigquery --db-blocked ZZ_DEPRECATED_PROJECT

# extract only <...>
castor-extract-bigquery --db-allowed PROJECT_1 PROJECT_2 PROJECT_3

# mixed (not really useful, but still doable)
castor-extract-bigquery --db-allowed PROJECT_1 --db-blocked ZZ_DEPRECATED_PROJECT
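
When the list of projects grows, typing them on the command line becomes error-prone. One possible approach, sketched below, is to keep the project IDs in a text file and expand them into the filter argument. The helper `projects_from_file` and the file name `blocked_projects.txt` are assumptions made for this example; only the `--db-blocked` and `--db-allowed` flags come from this page.

```shell
# Hypothetical helper, not part of the castor package.
# Reads project IDs from a file (one per line), ignoring blank lines
# and lines starting with '#', and prints them separated by spaces.
projects_from_file() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | xargs
}

# Usage sketch (left unquoted on purpose so that each project ID
# becomes its own argument):
#   castor-extract-bigquery --db-blocked $(projects_from_file blocked_projects.txt)
```

The same helper would work with `--db-allowed`, since both filters accept several project IDs separated by spaces.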
