BigQuery adapter should record in a label the dbt invocation_id #2808
Comments
Hey @mescanne, I can do you one better: this is totally possible today with just a little bit of configuration, using BigQuery labels:
models:
  my_project:
    +labels:
      invocation_id: "{{ invocation_id }}"
I'm going to close the issue, though I'd encourage you to comment here if that approach isn't quite what you're after.
Ah, no, that labels the tables and views. My proposal is to label the jobs themselves, so that the SQL jobs can be traced back to the invocation that ran them. It makes sense to always track SQL jobs by invocation, which is why I proposed doing it always.
Oh, I see! Sorry for my misunderstanding earlier. In that case, I believe this issue is a duplicate of #2483, which is to add support for user-configurable labels and tags on jobs (similar to how it works for tables and views). That said, I think it would be reasonable to include, by default, the same information in the
Describe the feature
Performance analysis of BigQuery dbt invocations would be much easier with a small change in adapters/bigquery/connections.py: record the dbt invocation_id as a label on each job.
Describe alternatives you've considered
Capturing statistics out of dbt is one alternative, but there is no straightforward path to do so.
Additional context
In adapters/bigquery/connections.py, raw_execute initializes a job_params dictionary. The job_params should gain a labels dictionary with, for example, "dbt_invocation_id" set to dbt.tracking.active_user.invocation_id.
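A minimal sketch of the proposed change, written as a standalone helper rather than as the adapter's actual code. The function name build_job_params and the validation step are hypothetical; the label key "dbt_invocation_id" is the example from the proposal above. The validation assumes BigQuery's documented limits on label values (lowercase letters, digits, underscores and dashes, at most 63 characters), which a lowercase UUID satisfies.

```python
import re
from typing import Any, Dict

# Assumed BigQuery label-value constraint: lowercase letters, digits,
# underscores and dashes, at most 63 characters.
_LABEL_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def build_job_params(invocation_id: str, use_legacy_sql: bool = False) -> Dict[str, Any]:
    """Build job parameters that carry the dbt invocation_id as a job label.

    Hypothetical helper: in dbt the value would come from
    dbt.tracking.active_user.invocation_id.
    """
    value = invocation_id.lower()
    if not _LABEL_VALUE_RE.fullmatch(value):
        raise ValueError(f"not a valid BigQuery label value: {invocation_id!r}")
    return {
        "use_legacy_sql": use_legacy_sql,
        "labels": {"dbt_invocation_id": value},
    }


# With google-cloud-bigquery installed, the dictionary could be passed on as:
#   from google.cloud import bigquery
#   job_config = bigquery.QueryJobConfig(**build_job_params(invocation_id))
```

Keeping the label under a single fixed key means every job from one dbt run can later be found with one filter, regardless of which model produced it.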
The API being used is here:
https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJobConfig.html#google.cloud.bigquery.job.QueryJobConfig
The labels field in the INFORMATION_SCHEMA jobs views can then be used to select the jobs belonging to a unique invocation_id. This gives detailed information on the performance of each query:
https://cloud.google.com/bigquery/docs/information-schema-jobs
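As a rough illustration of that lookup, the sketch below builds a query against the JOBS_BY_PROJECT view that filters on the job label. The helper name jobs_for_invocation_sql, the default region qualifier `region-us`, and the choice of output columns are assumptions; the label key matches the "dbt_invocation_id" example above, and @invocation_id is a named query parameter that the caller would have to supply.

```python
def jobs_for_invocation_sql(region: str = "region-us") -> str:
    """Return SQL listing the jobs labelled with a given dbt invocation_id.

    Hypothetical helper: the region qualifier and the selected columns are
    illustrative; adjust them to the project being profiled.
    """
    return f"""
SELECT
  job_id,
  creation_time,
  total_slot_ms,
  total_bytes_processed
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(labels) AS lbl
  WHERE lbl.key = 'dbt_invocation_id'
    AND lbl.value = @invocation_id
)
ORDER BY creation_time
""".strip()


if __name__ == "__main__":
    print(jobs_for_invocation_sql())
```

With the google-cloud-bigquery client, the parameter could be bound through a QueryJobConfig whose query_parameters contains bigquery.ScalarQueryParameter("invocation_id", "STRING", invocation_id).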
The invocation_id could also be written to a logging table if logical information about the run should be recorded.
Who will this benefit?
This will benefit any organization using BigQuery with dbt that wants to do systematic performance profiling.
Are you interested in contributing this feature?
Yes, but I am in the process of getting the CLA reviewed on my side.