Provide jobIds behind model creation in run_results.json #92
Comments
I also think this would be an interesting enhancement. Any plans to pick this up for a future release?
This would be our biggest challenge with implementing this today. In the meantime, we do include:
Both of those should allow you to tie dbt's
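For context on how an `invocation_id` job label could be used once available: it can filter BigQuery's `INFORMATION_SCHEMA` jobs views. Below is a minimal, hypothetical sketch of building such a query; the label key `dbt_invocation_id` is an assumption (the actual key dbt applies may differ), and the helper name is illustrative.

```python
def jobs_for_invocation_query(project: str, invocation_id: str) -> str:
    """Build a JOBS_BY_PROJECT query filtering on a dbt invocation-id
    job label. The label key 'dbt_invocation_id' is an assumption."""
    return f"""
        SELECT job_id, creation_time, query
        FROM `{project}`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
        WHERE EXISTS (
            SELECT 1 FROM UNNEST(labels) AS label
            WHERE label.key = 'dbt_invocation_id'
              AND label.value = '{invocation_id}'
        )
    """

sql = jobs_for_invocation_query("my-project", "abc-123")
```

In practice you would parameterize the query rather than interpolating strings, and choose the `JOBS_BY_*` view that matches your access level.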
Thanks for the info, @jtcohen6! This is indeed the primary goal: to link to the JOBS_BY_* views. Having the invocation_id in the job labels will solve this, so there is no need for me to have the job_id in the logs or run_results.json file. In which dbt release will the invocation_id label be included?
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Missed this when I transferred BigQuery-specific issues a few months ago. This is analogous to dbt-labs/dbt-snowflake#7, and many of the same implementation details apply. I'm going to mark this a
@jtcohen6 I'd love to take a crack at implementing this. Do you have any tips on where to get started / what an implementation would look like?
@akshaan Sorry for the delayed response here! If you're still interested in contributing, we'd still love to have the change. The implementation here should look very, very similar to the change proposed in dbt-labs/dbt-snowflake#40. It's just a matter of adding:

```python
@dataclass
class BigQueryAdapterResponse(AdapterResponse):
    bytes_processed: Optional[int] = None
    job_id: Optional[str] = None  # should this be Optional, or str = '' ?
```

And then accessing the `job_id`:

```python
# for each statement type
job_id = query_job.job_id

# and then
response = BigQueryAdapterResponse(
    _message=message,
    rows_affected=num_rows,
    code=code,
    bytes_processed=bytes_processed,
    job_id=job_id,
)
```
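If the adapter response gains a `job_id` field, it would presumably be serialized under each result's `adapter_response` in `run_results.json`. A hypothetical consumer is sketched below; the field name `job_id` and the layout (`results[*].unique_id`, `results[*].adapter_response`) are assumptions based on the proposal, and the sample payload is fabricated for illustration only.

```python
import json

def job_ids_by_model(run_results_text: str) -> dict:
    """Map each node's unique_id to the BigQuery job_id recorded in its
    adapter_response, skipping results that have no job_id."""
    results = json.loads(run_results_text)["results"]
    return {
        r["unique_id"]: r["adapter_response"]["job_id"]
        for r in results
        if r.get("adapter_response", {}).get("job_id")
    }

# Hypothetical run_results.json payload, for illustration only
sample = json.dumps({
    "results": [
        {
            "unique_id": "model.my_project.orders",
            "adapter_response": {
                "code": "CREATE TABLE",
                "rows_affected": 10,
                "job_id": "job_9JVepH9O1bzemNfB6xZzxz_mfFkY",
            },
        }
    ]
})
mapping = job_ids_by_model(sample)
```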
In BigQuery, each time SQL is executed, a job is created and assigned a unique ID (for example: `job_9JVepH9O1bzemNfB6xZzxz_mfFkY`). When dbt runs, it checks the results/status of these jobs. We've found the `run_results.json` file extremely useful, and being able to tie these results (errors, compiled SQL, variables, tags, etc.) back to the specific job that executed in BigQuery, for additional information and troubleshooting, would make `run_results.json` even more useful.

Describe alternatives you've considered
You could try to look at a combination of dbt logs and BigQuery logs in Stackdriver Logging and approximate which job corresponded to the model creation by timestamp proximity or by query text (compiled SQL), but those methods are not deterministic.
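For illustration, the timestamp-proximity heuristic amounts to something like the sketch below: pick the job whose start time is closest to the model's execution time. All names here are hypothetical; the point is that concurrent jobs with nearby timestamps make this guesswork, which is exactly why it is not deterministic.

```python
from datetime import datetime

def nearest_job(model_started_at: datetime, jobs: list) -> dict:
    """Guess which BigQuery job corresponds to a model run by choosing
    the job with the closest start time. Concurrent or retried jobs
    with similar timestamps make this unreliable."""
    return min(
        jobs,
        key=lambda j: abs((j["start_time"] - model_started_at).total_seconds()),
    )

# Fabricated example data
jobs = [
    {"job_id": "job_A", "start_time": datetime(2021, 1, 1, 12, 0, 0)},
    {"job_id": "job_B", "start_time": datetime(2021, 1, 1, 12, 0, 5)},
]
guess = nearest_job(datetime(2021, 1, 1, 12, 0, 4), jobs)
# guess["job_id"] == "job_B" (1s away, vs. 4s for job_A)
```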
Additional context
This is specific to BigQuery for now, but I am sure there are parallels in other databases (transaction ID? query ID?).
Another aspect to consider would be whether to provide multiple `jobId`s if a macro called in the model executes additional queries against the database.

Who will this benefit?
Anyone that wants to tie run results back to specific jobs for additional debugging would benefit from this feature.