Snowflake

Prerequisites

We strongly advise to create a dedicated user to extract your metadata.

You can follow those instructions to create the catalog user.

Run extraction script

Once the package has been installed, you should be able to run the following command in your terminal:

castor-extract-snowflake [arguments]

The script will run and display logs as following:

INFO - Extracting `DATABASE` ...
INFO - Results stored to /tmp/catalog/1649083626-database.csv


...

INFO - Extracting `USER` ...
INFO - Results stored to /tmp/catalog/1649083626-user.csv
INFO - Wrote output file: /tmp/catalog/1649083626-summary.json

Credentials

  • -a, --account: Snowflake account identifier

  • -u, --user: Snowflake user

  • -p, --password: Snowflake password

If you use Keep pair Authentication (documentation), with no passphrase, replace password with:

  • -pk, --private_key: Snowflake private key

The key should be formatted as follows:

Other arguments

  • -o, --output: target folder to store the extracted files

Optional arguments

  • --skip-existing: Skip files already extracted instead of replacing them

  • --fetch-transient: Will fetch transients tables if added

  • --warehouse: Use a specific WAREHOUSE to run extraction queries

  • --role: Use a specific ROLE to run extraction queries

  • --db-allowed: Database(s) you want to extract

  • --db-blocked: Database(s) you don't want to extract

  • --insecure-mode: Turns off OCSP checking for Snowflake Client. Use when facing OCSP issues

You can also get help with argument --help

Use ENV variables

If you don't want to specify arguments every time, you can set the following ENV in your .bashrc:

export CASTOR_SNOWFLAKE_ACCOUNT=aCC0un7.eu-central-1
export CASTOR_SNOWFLAKE_USER=us3R
export CASTOR_SNOWFLAKE_PASSWORD=******
export CASTOR_SNOWFLAKE_INSECURE_MODE=True

export CASTOR_OUTPUT_DIRECTORY="tmp/catalog_output"

Then the script can be executed without any arguments:

castor-extract-snowflake

It can also be executed with partial arguments (the script looks in your ENV as a fallback):

castor-extract-snowflake --output /tmp/catalog

Database filtering

Database filters are optional. If you don't specify any filter, all available databases will be extracted (depending on the provided credentials)

# extract all but <...>
castor-extract-snowflake --db-blocked ZZ_DEPRECATED_DB

# extract only <...>
castor-extract-snowflake --db-allowed DB_1 DB_2 DB_2

# mixed (not really useful, but still doable)
castor-extract-snowflake --db-allowed DB_1 --db-blocked ZZ_DEPRECATED_DB

ROLE and WAREHOUSE

Here is how the extractor connects to SNOWFLAKE:

  1. Use the provided --role & --warehouse if any

  2. Otherwise, use the default ROLE & WAREHOUSE

  3. If no defaults are set, use an arbitrary available ROLE & WAREHOUSE

The --role and --warehouse arguments are useful when you need to enforce specific values.

If you extract your data with user catalog (instructions here) you don't need to specify those values:

  • METADATA_WH is the only available warehouse, it will be used by default

  • METADATA_VIEWER_ROLE is the only available role, it will be used by default

Last updated

Was this helpful?