fix(ingest/bigquery): Pass whether view is materialized; pass last_altered correctly #7660

Merged

Conversation

asikowitz
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@asikowitz asikowitz requested a review from treff7es March 21, 2023 18:23

comment=table.comment,
view_definition=table.view_definition,
materialized=table.table_type == "MATERIALIZED VIEW",
Collaborator

maybe we can add a bigquery_constants.py file?

Contributor

+1
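
A constants module along the lines suggested above might look like the following minimal sketch. The names `TableType` and `is_materialized` are hypothetical and are not taken from this PR; the string values are the `table_type` values that appear in the diff, plus `VIEW`, which is assumed from the BigQuery `INFORMATION_SCHEMA.TABLES` documentation.

```python
from enum import Enum


class TableType(Enum):
    """Hypothetical constants for INFORMATION_SCHEMA.TABLES.table_type."""

    BASE_TABLE = "BASE TABLE"
    EXTERNAL = "EXTERNAL"
    VIEW = "VIEW"  # assumed from the BigQuery docs, not shown in this PR
    MATERIALIZED_VIEW = "MATERIALIZED VIEW"


def is_materialized(table_type: str) -> bool:
    """Return True when the raw table_type string denotes a materialized view."""
    return table_type == TableType.MATERIALIZED_VIEW.value
```

With something like this in place, the comparison against the literal `"MATERIALIZED VIEW"` could go through the shared constant instead of a repeated string.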

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Mar 21, 2023
Contributor

@treff7es treff7es left a comment

lgtm

@@ -18,6 +20,16 @@
logger: logging.Logger = logging.getLogger(__name__)


class BigqueryTableType(Enum):
Collaborator

nice!

@@ -18,6 +20,16 @@
logger: logging.Logger = logging.getLogger(__name__)


class BigqueryTableType(Enum):
# See https://cloud.google.com/bigquery/docs/information-schema-tables#schema
Collaborator

<3 thank you for linking documentation. more of this pls!

@@ -173,20 +186,20 @@ class BigqueryQuery:
sum(case when storage_tier = 'LONG_TERM' then total_billable_bytes else 0 end) as long_term_billable_bytes,
sum(case when storage_tier = 'ACTIVE' then total_billable_bytes else 0 end) as active_billable_bytes,
from
- `{project_id}`.`{dataset_name}`.INFORMATION_SCHEMA.PARTITIONS
+ `{{project_id}}`.`{{dataset_name}}`.INFORMATION_SCHEMA.PARTITIONS
Collaborator

these changes scare me - do we need to make them?

Collaborator

@hsheth2 hsheth2 Mar 22, 2023

Those are needed because the query is now an f-string (template string), so literal {} characters have to be escaped as {{}} wherever we are not doing variable substitution.

We then do another string format later, which substitutes those placeholders.
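
The two-stage substitution described in this comment can be sketched as follows. This is an illustration only, not the code from the PR: `TableType`, the query text, and the project and dataset names are all made up for the example.

```python
from enum import Enum


class TableType(Enum):
    """Hypothetical stand-in for the table type constants."""

    BASE_TABLE = "BASE TABLE"


# Stage 1: the f-string is evaluated when the template is defined.
# Single braces are substituted right away; doubled braces survive
# as literal single braces for the next stage.
template = f"""
select table_name
from `{{project_id}}`.`{{dataset_name}}`.INFORMATION_SCHEMA.TABLES
where table_type = '{TableType.BASE_TABLE.value}'
"""

# Stage 2: a later str.format call fills in the remaining placeholders.
query = template.format(project_id="my-project", dataset_name="my_dataset")
```

After stage 1 the template still contains `{project_id}` and `{dataset_name}`, while the table type has already been filled in. That is also why a constant interpolated at definition time takes single braces and a placeholder filled in later takes doubled braces.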

@patch(
"datahub.ingestion.source.bigquery_v2.bigquery_schema.BigQueryDataDictionary.get_query_result"
)
@patch("google.cloud.bigquery.client.Client")
Collaborator

nice!

group by
table_name) as p on
t.table_name = p.table_name
WHERE
- table_type in ('BASE TABLE', 'EXTERNAL')
- {table_filter}
+ table_type in ({BigqueryTableType.BASE_TABLE}, {BigqueryTableType.EXTERNAL})
Collaborator

confused - why not {{BigqueryTableType.BASE_TABLE}} -- two brackets instead of one?

Collaborator

I believe this change caused a regression in the final query due to missing quotes around the constants.
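
The kind of regression described here can be reproduced in isolation. The sketch below is hypothetical and is not the fix that was applied in the repository: `TableType` and `sql_literal` are illustrative names, and the quoting helper only handles plain string constants.

```python
from enum import Enum


class TableType(Enum):
    """Hypothetical stand-in for the table type constants."""

    BASE_TABLE = "BASE TABLE"
    EXTERNAL = "EXTERNAL"


def sql_literal(table_type: TableType) -> str:
    """Render a constant as a single-quoted SQL string literal."""
    escaped = table_type.value.replace("'", "''")
    return f"'{escaped}'"


# Interpolating the bare value drops the quotes the old literal SQL had.
unquoted = f"table_type in ({TableType.BASE_TABLE.value}, {TableType.EXTERNAL.value})"

# Quoting each constant restores valid SQL string literals.
quoted = (
    f"table_type in ({sql_literal(TableType.BASE_TABLE)}, "
    f"{sql_literal(TableType.EXTERNAL)})"
)
```

Here `unquoted` renders as `table_type in (BASE TABLE, EXTERNAL)`, which is not valid SQL, while `quoted` renders as `table_type in ('BASE TABLE', 'EXTERNAL')`, matching the original hard-coded clause.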

Collaborator

@jjoyce0510 jjoyce0510 left a comment

LGTM. Nice work addressing this!

@asikowitz asikowitz merged commit 95f9919 into datahub-project:master Mar 22, 2023
@asikowitz asikowitz deleted the bigquery-pass-materialized branch March 22, 2023 17:41
yoonhyejin pushed a commit that referenced this pull request Apr 3, 2023