Label BigQuery jobs with Trino query id #16187

Merged
merged 2 commits into from
May 16, 2023

Conversation

Contributor

@wendigo wendigo commented Feb 20, 2023

Description

It makes it easier to track the origin of BQ DDL and DML queries.

Release notes

(x) This is not user-visible or docs only and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Feb 20, 2023
Member

@kokosing kokosing left a comment

Second commit lgtm

@wendigo wendigo requested a review from ebyhr February 21, 2023 09:54
@wendigo wendigo force-pushed the serafin/label-bq-queries branch from 06045c6 to 915be3f Compare February 21, 2023 09:54
@wendigo wendigo requested a review from kokosing February 21, 2023 09:55
@wendigo
Contributor Author

wendigo commented Feb 21, 2023

@ebyhr can you run BQ tests?

@ebyhr
Member

ebyhr commented Feb 21, 2023

/test-with-secrets sha=915be3f3e808f27aa0c2b684f342147a5a9a7173

@github-actions

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4233150724

Member

@ebyhr ebyhr left a comment

The commit order looks a little weird. For instance, 1st commit adds ConnectorSession to BigQueryClient.query, and 2nd commit removes it. I'm fine with squashing commits into one.

@ebyhr ebyhr force-pushed the serafin/label-bq-queries branch from 915be3f to ebe9ec2 Compare March 22, 2023 04:33
@github-actions github-actions bot added the bigquery BigQuery connector label Mar 22, 2023
@wendigo wendigo force-pushed the serafin/label-bq-queries branch from ebe9ec2 to 9055938 Compare March 28, 2023 09:28
@wendigo
Contributor Author

wendigo commented Mar 28, 2023

@ebyhr PTAL

@wendigo wendigo requested a review from ebyhr March 28, 2023 09:28
@wendigo wendigo force-pushed the serafin/label-bq-queries branch from 9055938 to 7c1ee90 Compare May 4, 2023 14:52
@wendigo
Contributor Author

wendigo commented May 4, 2023

@ebyhr @hashhar ptal and run with secrets

@hashhar
Member

hashhar commented May 8, 2023

/test-with-secrets sha=7c1ee905b13cecf03ab56cf27b5405f0b53f4a21

@github-actions

github-actions bot commented May 8, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4912226543

@wendigo wendigo force-pushed the serafin/label-bq-queries branch from 7c1ee90 to 32618b7 Compare May 8, 2023 09:26
@wendigo
Contributor Author

wendigo commented May 8, 2023

@ebyhr ptal. I needed to revert to the previous approach. Please see the rationale in the commit message for why passing ConnectorSession is actually needed.

Member

@kokosing kokosing left a comment

Is it possible to have a test for it? To ask BQ what labels were set when Trino did a query?

@wendigo
Contributor Author

wendigo commented May 8, 2023

@kokosing it's not possible to have this test with the current set of privileges that the credential key has.

@hashhar
Member

hashhar commented May 8, 2023

Also, I don't think it really needs elevated permissions. I can see and query the labels from the information schema: https://cloud.google.com/bigquery/docs/viewing-labels
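
As a rough sketch of what such a lookup could look like, the helper below builds a query against the `INFORMATION_SCHEMA.JOBS` view and filters on the `trino_query` label key mentioned in the config description. The class name, the region argument, and the choice of selected columns are assumptions made for illustration, not code from this PR.

```java
// Hypothetical helper for illustration; not part of the Trino code base.
final class JobLabelQuery
{
    private JobLabelQuery() {}

    // Builds a query that lists jobs carrying the given trino_query label value.
    // The region qualifier and the JOBS view are assumptions; adjust them to the project under test.
    static String jobsWithLabel(String region, String labelValue)
    {
        if (!region.matches("[a-z0-9-]+")) {
            throw new IllegalArgumentException("Unexpected region: " + region);
        }
        // Label values use a small character set, so reject anything else
        // instead of trying to escape it inside the SQL string literal.
        if (!labelValue.matches("[a-z0-9_-]{0,63}")) {
            throw new IllegalArgumentException("Unexpected label value: " + labelValue);
        }
        return "SELECT job_id, query " +
                "FROM `region-" + region + "`.INFORMATION_SCHEMA.JOBS, UNNEST(labels) AS label " +
                "WHERE label.key = 'trino_query' AND label.value = '" + labelValue + "'";
    }
}
```

Whether the test credentials are allowed to read this view is exactly the open question in this thread, so treat the query as something to try rather than something known to work here.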

@wendigo wendigo force-pushed the serafin/label-bq-queries branch 2 times, most recently from 81ff158 to 1fb60a5 Compare May 8, 2023 14:10
@ebyhr ebyhr dismissed their stale review May 8, 2023 22:53

Another commit was pushed since the last review

@wendigo wendigo force-pushed the serafin/label-bq-queries branch from 1fb60a5 to 00b04af Compare May 9, 2023 09:18
@wendigo
Contributor Author

wendigo commented May 9, 2023

@hashhar @kokosing test added. I hope that permissions will be working here.

Member

@hashhar hashhar left a comment

LGTM % format-interpolation feature


@Config("bigquery.job.label-format")
@ConfigDescription("Adds `trino_query` label to the BigQuery job with provided value format")
public BigQueryConfig setQueryLabelFormat(String queryLabelFormat)
Member

While it's useful to use a common format across connectors, for BigQuery it's probably less powerful in the way we use it, since all the information gets stuffed into a single BigQuery label. I feel this prematurely adds a feature that would be harder to evolve over time as people request changes.

What do you think @kokosing @ebyhr - should we not add it at the moment and just label the query id for now?

Alternatively we can convert the config into a boolean toggle and if enabled add one label per each pre-defined placeholder.

Member

I think we should allow defining the label key, so people can organize it the way they like.

I am under the impression that the current setup makes it very customizable: one user can label with just the query id, while another can format the label as JSON and parse it later.

Contributor Author

@Test
public void testQueryLabel()
{
String materializedView = "test_query_label" + randomNameSuffix();
Member

is there a specific reason to use an MV? Add a comment or simplify.

Contributor Author

Yes, labels are added to jobs and jobs are used only when querying non-storage-backed relations (i.e. views)

@hashhar
Member

hashhar commented May 9, 2023

/test-with-secrets sha=00b04af367fe3febd59ae472a4f9fa5447f87652


@wendigo
Contributor Author

wendigo commented May 9, 2023

@kokosing @hashhar airlift/airlift#1066 would allow this feature to be generic, i.e. defining labels as:

bigquery.job.labels.label1 = $QUERY_ID
bigquery.job.labels.another-label = $TRACE_TOKEN

Asking the user to input JSON and then parsing it adds complexity here rather than hiding it from the user.
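
To make the idea concrete, here is a minimal sketch of how per-label templates like the two above could be resolved once the properties are parsed into a map. The class and method names are hypothetical, and only the `$QUERY_ID` and `$TRACE_TOKEN` placeholders from the example are handled; this is not how the connector is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed bigquery.job.labels.* idea; not connector code.
final class LabelTemplates
{
    private LabelTemplates() {}

    // Replaces the $QUERY_ID and $TRACE_TOKEN placeholders in every configured label value.
    static Map<String, String> resolve(Map<String, String> templates, String queryId, String traceToken)
    {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : templates.entrySet()) {
            String value = entry.getValue()
                    .replace("$QUERY_ID", queryId)
                    .replace("$TRACE_TOKEN", traceToken);
            resolved.put(entry.getKey(), value);
        }
        return resolved;
    }
}
```

With this shape each piece of information lands in its own label, so nothing has to be packed into a single value and parsed back out.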

@github-actions

github-actions bot commented May 9, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4926624071

@wendigo wendigo force-pushed the serafin/label-bq-queries branch from 00b04af to a68ab2e Compare May 15, 2023 13:44
@wendigo wendigo requested review from hashhar, kokosing and ebyhr May 15, 2023 13:45
@wendigo
Contributor Author

wendigo commented May 15, 2023

@hashhar @kokosing it won't be possible to put much information into this label value, as it is limited to 63 characters and may contain only lowercase letters, digits, underscores and hyphens. No JSON is possible.
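
For illustration, a formatted value could be forced into those limits along the following lines. This is a sketch under the constraints stated in this comment (63 characters; lowercase letters, digits, underscore, hyphen); the helper name and the choice to replace disallowed characters with an underscore are assumptions, not what the PR does.

```java
import java.util.Locale;

// Hypothetical helper, not the connector's actual code.
final class JobLabelSanitizer
{
    private static final int MAX_LABEL_LENGTH = 63;

    private JobLabelSanitizer() {}

    // Lowercases the input, replaces every disallowed character with '_',
    // and truncates the result to the maximum label length.
    static String sanitize(String raw)
    {
        String lowered = raw.toLowerCase(Locale.ENGLISH);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < lowered.length() && result.length() < MAX_LABEL_LENGTH; i++) {
            char c = lowered.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_' || c == '-';
            result.append(allowed ? c : '_');
        }
        return result.toString();
    }
}
```

A JSON fragment such as `{"id":1}` loses all of its punctuation under these rules, which is why a structured value cannot survive in a single label.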

wendigo added 2 commits May 15, 2023 21:16
ConnectorSession needs to be passed to the query/update methods because BigQueryClient is cached using
identityCacheMapping.getRemoteUserCacheKey(), which does not take session properties into account.

We also need access to the queryId in order to label queries properly, but we don't want to cache a client per query id.
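
The trade-off described in this commit message can be sketched roughly as follows: the client is cached per user, and the session is handed to each call so the query id is available for labelling without becoming part of the cache key. All types below are simplified, hypothetical stand-ins, not Trino's real ConnectorSession or BigQueryClient.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for illustration only; none of these are the real Trino classes.
final class SessionScopedLabels
{
    private SessionScopedLabels() {}

    static final class Session
    {
        final String user;
        final String queryId;

        Session(String user, String queryId)
        {
            this.user = user;
            this.queryId = queryId;
        }
    }

    static final class Client
    {
        final String user;

        Client(String user)
        {
            this.user = user;
        }

        // The session is a method argument, so the per-query id never becomes client state.
        Map<String, String> jobLabels(Session session)
        {
            Map<String, String> labels = new HashMap<>();
            labels.put("trino_query", session.queryId);
            return labels;
        }
    }

    static final class ClientFactory
    {
        private final Map<String, Client> cache = new ConcurrentHashMap<>();

        // Cached per user only; two queries from the same user share one client.
        Client clientFor(Session session)
        {
            return cache.computeIfAbsent(session.user, Client::new);
        }
    }
}
```

Keying the cache by query id instead would create a new client for every query, which is the outcome the commit message wants to avoid.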
@wendigo wendigo force-pushed the serafin/label-bq-queries branch from a68ab2e to 56af278 Compare May 15, 2023 19:20
@wendigo
Contributor Author

wendigo commented May 16, 2023

PTAL @hashhar @kokosing

Member

@hashhar hashhar left a comment

LGTM.

Eventually we should plan to have multiple labels instead of adding all information in a single label.

@hashhar
Member

hashhar commented May 16, 2023

/test-with-secrets sha=56af27891d3001a0358bfd5b1bb4e3f76dee4c6c

@github-actions

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4988294291

@hashhar hashhar merged commit c999795 into trinodb:master May 16, 2023
@hashhar
Member

hashhar commented May 16, 2023

Do we want to hold off on documenting this or mentioning it in the release notes until we have the ability to add separate labels?

@kokosing @wendigo @ebyhr ?

@kokosing
Member

I would prefer to document this as it is today.

@hashhar hashhar mentioned this pull request May 16, 2023
@wendigo wendigo deleted the serafin/label-bq-queries branch May 16, 2023 10:08
@github-actions github-actions bot added this to the 418 milestone May 16, 2023
Labels
bigquery BigQuery connector cla-signed

5 participants