Popularity

How does it work in Catalog?

What is popularity?

The popularity of a table or dashboard tells you how frequently it is used by human users.

Popularity is a score given to each data asset, from 1 to 1 million, which is then scaled down to a 5-star system.

The popularity computation is a quantile computation based on the number of queries (for tables) or the number of views (for dashboards), among all tables or dashboards of the same source.

Table queries from specific users, usually bots or services, can be excluded through the extraction settings.

How is it computed?

Ranking: We sort the assets by their score (number of queries or views).

Bucketing: We place each asset into a bucket according to its rank, aka "global scoring".

There are 8 buckets of varying size with the following thresholds:

Bucket 0: assets ranked between 0% -> 33% -- Bottom 33%
Bucket 1: assets ranked between 33% -> 48%
Bucket 2: assets ranked between 48% -> 63%
Bucket 3: assets ranked between 63% -> 73%
Bucket 4: assets ranked between 73% -> 83%
Bucket 5: assets ranked between 83% -> 93%
Bucket 6: assets ranked between 93% -> 98%
Bucket 7: assets ranked between 98% -> 100% -- Top 2%
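
As a minimal sketch of the thresholds above, the lookup from a rank (expressed as a fraction between 0 and 1) to a bucket could look like the following. The names `BUCKET_UPPER_BOUNDS` and `bucket_for_percentile` are hypothetical, and the handling of a rank that sits exactly on a threshold is an assumption; neither comes from Catalog itself.

```python
from bisect import bisect_left

# Upper bounds of buckets 0..6, taken from the list above.
# Bucket 7 covers everything above 98%.
BUCKET_UPPER_BOUNDS = [0.33, 0.48, 0.63, 0.73, 0.83, 0.93, 0.98]


def bucket_for_percentile(percentile: float) -> int:
    """Return the bucket index (0-7) for a rank given as a fraction in [0, 1]."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    # Assumption: a rank exactly on a threshold stays in the lower bucket.
    return bisect_left(BUCKET_UPPER_BOUNDS, percentile)
```

With these assumptions, an asset ranked at 50% falls into Bucket 2, and an asset ranked at 99% falls into Bucket 7.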

In-bucket ranking: Within a single bucket, we sort the assets again and spread them evenly within the bucket.

Final Score: Finally, we compute a score out of the max popularity (MAX) for all assets of a given source:
- With MAX being 1 000 000
- With the number of buckets being 8
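
The four steps can be put together in a short sketch. This is not Catalog's actual implementation: the function name `popularity_scores`, the tie-breaking rule, and the exact formula `(bucket + in-bucket rank fraction) / 8 * MAX` are assumptions, chosen because they give the top asset a score of exactly MAX and keep every asset of the last bucket above 875 000.

```python
from bisect import bisect_left

MAX_POPULARITY = 1_000_000
# Upper bounds of buckets 0..6 as fractions of the ranking; bucket 7 is the rest.
BUCKET_UPPER_BOUNDS = [0.33, 0.48, 0.63, 0.73, 0.83, 0.93, 0.98]
NUM_BUCKETS = len(BUCKET_UPPER_BOUNDS) + 1  # 8


def popularity_scores(usage: dict[str, int]) -> dict[str, int]:
    """Map each asset name to a popularity score, given its query/view count."""
    # Ranking: least used first. Breaking ties by name is an assumption.
    ranked = sorted(usage, key=lambda name: (usage[name], name))
    total = len(ranked)

    # Bucketing: place each asset according to its position in the ranking.
    buckets: list[list[str]] = [[] for _ in range(NUM_BUCKETS)]
    for position, name in enumerate(ranked, start=1):
        percentile = position / total
        buckets[bisect_left(BUCKET_UPPER_BOUNDS, percentile)].append(name)

    # In-bucket ranking and final score (assumed formula): the members of
    # bucket b are spread evenly over the slice (b/8 * MAX, (b+1)/8 * MAX].
    scores: dict[str, int] = {}
    for index, members in enumerate(buckets):
        for rank, name in enumerate(members, start=1):
            fraction = rank / len(members)
            scores[name] = round((index + fraction) / NUM_BUCKETS * MAX_POPULARITY)
    return scores
```

For a source with 100 tables that each have a distinct query count, this sketch gives the most queried table 1 000 000 and the second most queried 937 500. Because the bucket depends only on the position in the ranking, two tables with the same query count can still land on opposite sides of a threshold.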

From the number to the stars 💫

- Everything before this step produces a score that we store in our database. A separate process in the frontend then converts that score into the number of stars you see.
- As mentioned, the popularity can range from 0 to 1 000 000 (or be undefined). We then bucket it down to 11 states, corresponding to stars and half stars: 0, 0.5, 1, ..., 4.5, 5.
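
The exact frontend rule is not spelled out here, so the following is only a sketch under one assumption: the score is scaled to 5 and rounded to the nearest half star. That choice is consistent with a score above 875 000 showing 4.5 to 5 stars and a score below 125 000 showing 0 to 0.5 stars. The function name `stars_from_score` is hypothetical.

```python
import math
from typing import Optional

MAX_POPULARITY = 1_000_000


def stars_from_score(score: Optional[int]) -> Optional[float]:
    """Convert a stored popularity score into one of 0, 0.5, ..., 4.5, 5 stars."""
    if score is None:
        return None  # undefined popularity: nothing to display
    if not 0 <= score <= MAX_POPULARITY:
        raise ValueError("score must be between 0 and MAX_POPULARITY")
    # Assumption: round to the nearest half star, with exact halves rounding up.
    half_stars = math.floor(score / MAX_POPULARITY * 10 + 0.5)
    return half_stars / 2
```

The eleven possible values of `half_stars` (0 through 10) correspond to the 11 states mentioned above.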

Which queries are used for the computation?

First of all, popularity is calculated over 30 days of activity following the last refresh of your source.

Then there are a few exclusions that allow us to determine a more accurate popularity:

- We only use read queries, because popularity should reflect usage, not updates.
- We exclude queries that are extremely large or too small.
- We exclude service accounts from the calculation, as we want to measure human usage and behavior. Queries run by service accounts are instead reflected in the lineage, where you'll find the parent/child assets.
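
To make the exclusions concrete, here is a hedged sketch of a filter that decides whether a single query counts toward popularity. Everything in it is hypothetical: the `QueryEvent` fields, the `size` measure, and the `MIN_SIZE` / `MAX_SIZE` bounds are placeholders, since the real thresholds are not documented here. The caller supplies the start of the 30-day window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)
# Hypothetical bounds: the real "too small" / "immense" limits are not documented.
MIN_SIZE = 10
MAX_SIZE = 100_000


@dataclass
class QueryEvent:
    user: str
    is_read: bool          # True for read queries, False for writes/updates
    size: int              # hypothetical size measure of the query
    executed_at: datetime


def counts_toward_popularity(
    event: QueryEvent, window_start: datetime, service_accounts: set[str]
) -> bool:
    """Return True if the query should be counted as human usage."""
    in_window = window_start <= event.executed_at < window_start + WINDOW
    reasonable_size = MIN_SIZE <= event.size <= MAX_SIZE
    is_human = event.user not in service_accounts
    return in_window and event.is_read and reasonable_size and is_human
```

A table's raw usage count would then be the number of its queries for which this function returns `True`.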

Some facts

- There is exactly 1 asset per source with a perfect score of 1M.
- Because of the way the bucketing is done, two assets with the same number of queries might end up in 2 different buckets.
- The top 2% of assets are all in the 8th bucket (Bucket 7) and hence have a score over 875 000, which means between 4.5 and 5 stars.
- The bottom 33% of assets are all in the 1st bucket (Bucket 0) and as such have a score lower than 125 000, which means between 0 and 0.5 stars.


