blink_features.usage with ranks and partitioned #39

Open: wants to merge 9 commits into base: main

@max-ostapenko
Contributor

max-ostapenko commented Dec 21, 2024

  1. Added a rank column.
    Resolves #25 (blink_features.usage has null rank column).

  2. Removed the blink_features.features table, since it duplicates the pages.features column.
    Usage data is now loaded directly into the partitioned blink_features.usage table.

Table migration script:

-- 1. Create the new table, partitioned by date and clustered by the common filter columns.
CREATE TABLE `blink_features.usage_partitioned` (
  date DATE,
  client STRING,
  rank INT64,
  id STRING,
  feature STRING,
  type STRING,
  num_urls INT64,
  total_urls INT64,
  pct_urls FLOAT64,
  sample_urls ARRAY<STRING>
)
PARTITION BY date
CLUSTER BY client, rank, feature;

-- 2. Backfill from the existing table; existing rows get a NULL rank.
INSERT INTO `blink_features.usage_partitioned`
SELECT
  PARSE_DATE('%Y%m%d', yyyymmdd) AS date,
  client,
  NULL AS rank,
  id,
  feature,
  type,
  num_urls,
  total_urls,
  pct_urls,
  sample_urls
FROM `blink_features.usage`;

-- 3. Keep the old table as a backup.
ALTER TABLE `blink_features.usage` RENAME TO `blink_features.usage_backup`;

-- 4. Recreate blink_features.usage as a copy of the partitioned table.
CREATE TABLE `blink_features.usage`
COPY `blink_features.usage_partitioned`;
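
A quick sanity check after the swap could compare row counts between the backup and the new table. This is only a sketch: it assumes the script above ran as written, so `blink_features.usage_backup` still has the `yyyymmdd` string column and `blink_features.usage` has the new `date` column, and that the new table does not require a partition filter. The CTE names and output column names are made up for illustration.

```sql
-- Row counts per crawl date and client in the backup (old schema).
WITH backup_counts AS (
  SELECT
    PARSE_DATE('%Y%m%d', yyyymmdd) AS date,
    client,
    COUNT(*) AS row_count
  FROM `blink_features.usage_backup`
  GROUP BY 1, 2
),
-- Row counts per crawl date and client in the new table.
new_counts AS (
  SELECT
    date,
    client,
    COUNT(*) AS row_count
  FROM `blink_features.usage`
  GROUP BY 1, 2
)
-- Return only the date/client pairs whose counts differ or exist on one side only.
SELECT
  COALESCE(b.date, n.date) AS date,
  COALESCE(b.client, n.client) AS client,
  b.row_count AS backup_rows,
  n.row_count AS new_rows
FROM backup_counts AS b
FULL OUTER JOIN new_counts AS n
  ON b.date = n.date AND b.client = n.client
WHERE b.row_count IS NULL
  OR n.row_count IS NULL
  OR b.row_count != n.row_count
ORDER BY date, client;
```

An empty result would suggest the backfill copied every row. The check is only meaningful before new crawl data is loaded into the new table.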

@max-ostapenko
Contributor Author

@tunetheweb do you have edit access to the Looker report using this data?

@max-ostapenko max-ostapenko marked this pull request as draft December 21, 2024 16:42
@max-ostapenko max-ostapenko marked this pull request as ready for review December 21, 2024 17:05
@max-ostapenko max-ostapenko changed the title from "blink_features.usage partitioned" to "blink_features.usage with ranks and partitioned" Dec 21, 2024
@tunetheweb
Member

> @tunetheweb do you have edit access to the Looker report using this data?

Yes I do. You can see it here if you wanna clone it to try it against any changed data source.

@max-ostapenko
Contributor Author

> > @tunetheweb do you have edit access to the Looker report using this data?
>
> Yes I do. You can see it here if you wanna clone it to try it against any changed data source.

The data sources are not accessible, so I can't clone it.

Do you want to migrate the table and adjust the report yourself? Alternatively, I can do it if I get edit permission for the data sources and the dashboard.

@max-ostapenko
Contributor Author

@tunetheweb, I need your assistance to continue.
How do we update the dashboard?

@tunetheweb
Member

Yeah, it's on my to-do list. I'm trying to get the reports finished first, and then this is the next thing.

@tunetheweb
Member

This works, but I wasted a lot of time figuring out why it wouldn't import into Looker Studio. It was due to the "Partition filter: Required" setting. You can't set this on a Looker Studio connection, so I had to change it to a Custom SQL query rather than a direct table.

I think the "Partition filter: Required" setting makes sense for the larger tables, where we want to avoid users running up large bills, but I don't think it's necessary here.

Can we recreate the table without that setting?
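
As a possible alternative to re-creating the table, BigQuery can switch the requirement off in place with `ALTER TABLE ... SET OPTIONS`. The sketch below assumes the table in question is `blink_features.usage_partitioned` from the migration script and that the console's "Partition filter: Required" corresponds to the `require_partition_filter` table option. The second statement illustrates the Custom SQL workaround; its cut-off date is a hypothetical value, not one taken from the report.

```sql
-- Option A: drop the partition filter requirement on the existing table.
-- Table name is an assumption based on the migration script in this PR.
ALTER TABLE `blink_features.usage_partitioned`
SET OPTIONS (require_partition_filter = FALSE);

-- Option B: keep the requirement and use a Custom SQL data source
-- that always filters on the partition column.
SELECT
  date,
  client,
  rank,
  id,
  feature,
  type,
  num_urls,
  total_urls,
  pct_urls
FROM `blink_features.usage_partitioned`
WHERE date >= DATE '2016-01-01';  -- hypothetical cut-off that satisfies the filter requirement
```

Option A keeps the partitioning and clustering and only removes the guard, which fits a small table like this one.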

@max-ostapenko
Contributor Author

max-ostapenko commented Feb 14, 2025

Updated the code.
But to clarify, I thought we would continue using httparchive.blink_features.usage, just with the updated schema.

Let me know when I can replace it, or run the replacement script yourself:

ALTER TABLE `blink_features.usage` RENAME TO `blink_features.usage_backup`;

CREATE TABLE `blink_features.usage`
COPY `blink_features.usage_partitioned`;
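
For reference, once the table is replaced, a query against it might filter on the partition and clustering columns as sketched below. The crawl date, the `'mobile'` client value, and the rank value `1000` are illustrative assumptions, not values confirmed in this thread.

```sql
-- Most widely used features for one crawl, client, and rank value.
SELECT
  feature,
  num_urls,
  total_urls,
  pct_urls
FROM `httparchive.blink_features.usage`
WHERE date = DATE '2025-01-01'  -- hypothetical crawl date; limits the scan to one partition
  AND client = 'mobile'         -- assumed client value
  AND rank = 1000               -- assumed rank value
ORDER BY pct_urls DESC
LIMIT 10;
```

Rows backfilled by the migration script have a NULL rank, so a rank filter like this would only match data loaded after the change.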

@tunetheweb
Member

> But to clarify, I thought we would continue using httparchive.blink_features.usage, just with the updated schema.

Agreed.

> Let me know when I can replace it, or run the replacement script yourself:

I'll run this when I'm back, rather than change something and then leave for the holidays! Let's leave this PR open as a reminder and merge after the crawl is finished and I'm back.

Development

Successfully merging this pull request may close these issues.

blink_features.usage has null rank column
2 participants