Reports

Download metadata and Catalog analytics reports 🤓

Exportable metadata and analytics reports are available to Admin through the Governance section in Catalog as CSV or XLSX downloads.

When to use Reports

Here are some examples of scenarios when you might use reports:

  • Understanding and navigating your data's health

    • How well are my tables, dashboards or knowledge pages documented?

    • Which table should have the highest SLA?

    • Where is the PII data located?

  • Cost Reduction

    • Who is running the most queries? Which user runs the most expensive queries?

    • Which tables are queried the most? Which dashboards are the most expensive to compute?

  • Noise reduction

    • Which dashboards are never used ?

    • Which tables can be dropped ?

Unused Assets reports

Generated when we ingest either the warehouse or the visualization tool

Unused asset reports analyse how assets are used in your organisation. An asset is flagged as unused if both itself and all its downstream tables and dashboards have not been used by a human or BI tool user (classification of users is performed by Catalog and can be edited on demand).

The analysis is based using the past 180 days worth of data.

Unused tables

  • last_usage: date where Catalog has last seen a "useful" query or dashboard view for the table and its downstream tables/dashboards. Field will be empty if no activity has been seen by Catalog in earliest of past 180 days or date of Catalog connection to your warehouse

  • table_path: path of table in format {database_name}.{schema_name}.{table_name}

  • table_type: VIEW or TABLE

  • creation_date: date where table was first seen by Catalog

  • table_size_mb: storage size in MB of table (NB: tables of size 0.4 MB or below will have size 0 displayed).

  • refresh_computation_duration: sum of query duration required to refresh table data over past 180 days

  • lineage_type:

    • isolated : table has neither parent or child tables/dashboards

    • root : table has no parent tables but has children tables/dashboards

    • branch : table has both parent and children tables/dashboards

    • leaf: table has parent tables but no children tables/dashboards

  • table_url: Catalog url of table (points to columns tab)

Unused dashboards

  • last_usage: date where Catalog has last seen a dashboard view of dashboard. Field will be empty if no activity has been seen by Catalog in earliest of past 180 days or date of Catalog connection to your BI tool

  • dashboard_path: dashboard path

  • dashboard_name: dashboard name

  • creation_date: date where table was first seen by Catalog

  • total_views: number of views since dashboard creation

  • dashboard_url: Catalog url of dashboard (points to Read-me tab)

  • dashboard_type: DASHBOARD, TILE, EXPLORE (Looker), APP (Qlik), DATA_SOURCE (Tableau), DATASET (PowerBI & Sigma)

Metadata Reports

Generated when the download is requested

Dashboard metadata

  • dashboard_path: dashboard path and name

  • dashboard_url: Catalog url of dashboard (points to Read-me tab)

  • description: the description that is in the read-me

  • popularity: popularity of dashboard

  • individual_owners: list of owner emails

  • team_owners: list of team owner's name

  • editors: editor emails

  • user_tags: list of tags added by the user

  • external_tags: list of tags added from the source

  • internal_tags: list of tags added by Catalog

  • is_certified: has the dashboard been certified by a user

  • is_described : does the dashboard has a non-empty read-me

Knowledge metadata

  • knowledge_path: path of knowledge page

  • knowledge_name: name of knowledge page

  • knowledge_url: Catalog url of knowledge page

  • description: the description that is in the read-me

  • individual_owners: list of owner emails

  • team_owners: list of team owner's name

  • tags: list of tags

  • is_certified: is the knowledge page certified

  • is_described : does the knowledge page have a non-empty read-me

  • num_pins: how many assets are pinned in this knowledge page

  • num_references: how many times is this knowledge page pinned

PII columns

  • column_key: key of column in format {database_name}.{schema_name}.{table_name}.{column_name}

  • column_url: Catalog url of column

  • column_type: data type of column

  • column_editable_description: in-app user description

  • column_external_description: description fetched from external source (warehouse, BI tool, or dbt)

  • column_propagated_description: column description propagated from another column using column lineage

Table metadata

  • table_key: key of table in format {database_name}.{schema_name}.{table_name}

  • table_url: Catalog url of table (points to columns tab)

  • description: the description related to that table

  • popularity: popularity of table

  • has_pii: does the table contain any columns flagged as PII

  • individual_owners: list of owner emails

  • team_owners: list of team owner's name

  • user_tags: list of tags generated by a Catalog user

  • external_tags: list of tags imported from the external source

  • internal_tags: (or Catalog tags) list of tags added by Catalog when parsing the queries

  • is_certified: has the table been certified by a user

  • is_deprecated: has the table been deprecated by a user

  • is_described: does the table have a non-empty read-me

  • num_columns: number of columns in table

  • num_columns_described_by_client: number of columns with descriptions, excluding descriptions from Catalog Auto doc

  • num_columns_described_by_castor: number of columns with descriptions performed by Catalog Auto doc

Tag metadata

  • tag_name: name of the tag

  • tag_source: source of the tag

  • asset_type: type of the asset

  • asset_path: path of the asset {database_name}.{schema_name}.{table_name}

Analytics reports

Generated when we ingest either the warehouse or the visualization tool

Dashboard analytics

  • dashboard_path: dashboard path and name

  • dashboard_url: Catalog url of dashboard (points to Read-me tab)

  • num_views: number of views of dashboards over past 180 days

  • num_parent_tables: number of direct parent tables

  • num_ascendant_tables: total number of upstream tables

  • num_children_dashboards: number of direct children dashboards

  • num_descendant_dashboards: total number of downstream dashboards

  • num_parent_dashboards: number of direct parent dashboards

  • num_ascendant_dashboards: total number of upstream dashboards

Table analytics

  • table_name: key of table in format {database_name}.{schema_name}.{table_name}

  • num_read_queries: total number of read queries over past 180 days

  • num_write_queries: total number of write queries over past 180 days

  • num_queries: number of read + write queries over past 180 days (aka the sum of the 2 fields above)

  • popularity: popularity (1-1 000 000 score)

  • num_parent_tables: number of direct parent tables

  • num_ascendant_tables: total number of upstream tables

  • num_children_dashboards: number of direct children dashboards

  • num_descendant_dashboards: total number of downstream dashboards

  • num_children_tables: number of direct children tables

  • num_descendant_tables: total number of downstream tables

  • table_url: Catalog url of table (points to columns tab)

User activities in the warehouse

  • user_name: user email address (if available)

  • user_type: user categorisation by Catalog: Human, BI tool, or Other

  • num_queries: total number of queries performed by user over past 180 days

  • num_read_queries: total number of read queries performed by user over past 180 days

  • num_write_queries: total number of write queries performed by user over past 180 days

  • total_read_queries_duration: sum of read query duration times (for queries performed after 13/11/2022)

  • total_write_queries_duration: sum of write query duration times (for queries performed after 13/11/2022)

  • num_tables_read: total number of tables read by user over past 180 days

  • num_tables_write: total number of tables written by user over past 180 days

User activities in the warehouse per table

  • user_name: user email address (if available)

  • table_path: path of table in format {database_name}.{schema_name}.{table_name}

  • creation_date: date where table was first seen by Catalog

  • num_queries: total number of queries performed by user over past 180 days

  • num_read_queries: number of read queries performed by user on this table over past 180 days

  • num_write_queries: number of write queries performed by user on this table over past 180 days

  • total_read_queries_duration: sum of read query duration times (for queries performed after 13/11/2022)

  • total_write_queries_duration: sum of write query duration times (for queries performed after 13/11/2022)

  • num_days_w_queries: number of days where user has a ran a read or write query on table out of the past 180 days

Ingestion report

You want to know what was loaded by Catalog when we last extracted metadata from your warehouse(s)?

  • source_id: UUID of the source

  • name: name of the source

  • technology: technology of the source

  • ingestion_start: start datetime of the ingestion

  • ingestion_end: end datetime of the ingestion, i.e. when the report was generated

  • day_of_extracted_queries: date of the queries retrieved by this ingestion

  • database_allowed: list of allowed databases

  • database_blocked: list of blocked databases

  • schema_allowed: list of allowed schemas

  • schema_blocked: list of blocked schemas

  • users_blocked_for_query_display: list of users whose queries are not shown in the app

  • users_blocked_for_popularity_compute: list of users whose activity is ignored when calculating the popularity

  • num_databases: total number of databases in the catalog

  • num_schemas: total number of schemas in the catalog

  • num_tables: total number of tables in the catalog

  • num_columns: total number of columns in the catalog

  • num_table_descriptions: total number of tables with an external description (including propagated descriptions)

  • num_column_descriptions: total number of columns with an external description (including propagated descriptions)

  • num_table_tags: total number of external table tags

  • num_new_databases: number of databases added during the ingestion

  • num_new_schemas: number of schemas added during the ingestion

  • num_new_tables: number of tables added during the ingestion

  • num_new_columns: number of columns added during the ingestion

  • num_deleted_databases: number of databases removed during the ingestion

  • num_deleted_schemas: number of schemas removed during the ingestion

  • num_deleted_tables: number of tables removed during the ingestion

  • num_deleted_columns: number of columns removed during the ingestion

  • num_queries_on_day: total number of queries performed on the extraction date, including duplicate queries

  • num_queries_on_day_deduplicated: total number of unique queries performed on the extraction date

  • num_read_queries_on_day: total number of read queries performed on the extraction date

  • num_write_queries_on_day: total number of write queries performed on the extraction date

  • num_query_authors_on_day: number of unique query authors on the extraction date

  • num_table_to_table_lineage_links: total number of table lineage links

  • num_column_to_column_lineage_links: total number of column lineage links

  • num_new_table_to_table_lineage_links: number of table lineage added during the ingestion

  • num_new_column_to_column_lineage_links: number of column lineage added during the ingestion

More reports are coming soon !

For any feedbacks or requests concerning reports, please reach out ! Link

Last updated

Was this helpful?