-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse query comment and use as bigquery job labels. #3145
Parse query comment and use as bigquery job labels. #3145
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really cool @jmcarp!
Thinking about how to test this:
- A unit test could inspect
job_params[labels]
and confirms that they contain the keys from the default query comment (app
,profile_name
,target_name
,dbt_version
,node_id
) and thedbt_invocation_id
. Maybe the query comment-to-job label conversion should be a standalone function? - An integration test could run a model and check that the labels appropriately registered, by querying
INFORMATION_SCHEMA.JOBS_BY_USER
@@ -215,6 +215,7 @@ def to_target_dict(self): | |||
class QueryComment(dbtClassMixin): | |||
comment: str = DEFAULT_QUERY_COMMENT | |||
append: bool = False | |||
job_label: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is query-comment
, rather than query_comment
(docs). Not exactly sure why we did this, but I guess kebab casing is common in dbt_project.yml
/ project-level configs. I think this should be job-label
(instead of job_label
) for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I switched the base class to HyphenatedDbtClassMixin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I may not understand what this base class does, since that change broke tests that I thought were unrelated. Reverted for now, but let me know if you have thoughts about the right way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look at this, and I think I found a subtle bug in the casing logic. Should be fixed now.
except (TypeError, ValueError): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the comment isn't valid json, should we still include it as a single job label (called query_comment
), truncated to 128 characters if need be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I changed it.
7b4b728
to
8d73ae2
Compare
if cls._hyphenated: | ||
# `data` might not be a dict, e.g. for `query_comment`, which accepts | ||
# a dict or a string; only snake-case for dict values. | ||
if cls._hyphenated and isinstance(data, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make sense @gshank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at all the existing calls to this class method, it seems to be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm not sure exactly how this code gets reached, but I dropped into a debugger at this point on the unit test suite and found non-dictionary values--specifically, there's a test that sets query-comment
to ""
. You should be able to reproduce the issue by dropping the isinstance
check that I added and running something like pytest -s test/unit/test_query_headers.py -k test_disable_query_comment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also run that test with the following added to pre_deserialize:
if cls._hyphenated and not isinstance(data, dict):
import ipdb; ipdb.set_trace()
Looking at the stack, I'm inside a function called __unpack_union_Project_query_comment__9f5a8d0e89384cc286d2b99acef5623d
that I'm guessing is dynamically generated and eval-ed by mashumaro 😱.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good from my vantage! One small comment from my testing. I think it'd be good to add a few unit tests of the various inputs/outputs of query comment sanitizing and structuring. Beyond that, I think this is ready for a changelog entry (v0.20.0 Feature).
I also tagged a member of the Core team for python code review.
_SANITIZE_LABEL_PATTERN = re.compile(r"[^a-z0-9_-]") | ||
|
||
|
||
def _sanitize_label(value: str, max_length: int = 63) -> str: | ||
"""Return a legal value for a BigQuery label.""" | ||
value = value.lower() | ||
value = _SANITIZE_LABEL_PATTERN.sub("_", value) | ||
value = value[: max_length - 1] | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anecdotally, this has the effect of appending an extra underscore at the end of my non-dict comment, e.g. comment: whatever
in dbt_project.yml
becomes query_comment: whatever_
in BigQuery, and comment: something else
becomes query_comment: something_else_
. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this is happening--there may be something upstream of this code adding trailing whitespace, because I don't get that behavior testing the function directly. I tried adding a strip()
. Does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f3f2f72
to
af3c3f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome @jmcarp! Thanks for adding those unit tests.
Functional review is good by me: I've confirmed this works as expected when job-label: True
(with default, custom JSON, and custom string comments), and just as crucially that it works as expected when job-label: False
or not supplied (the only label is dbt_invocation_id
= status quo).
I'll leave final code/style review to @nathaniel-may. After that, this is good to go.
…rp/bigquery-job-labels
No rush @jtcohen6, but it would be awesome to get this into the next release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was waiting on me @jmcarp. Thank you for your patience. Once we address my relatively minor questions here, this looks good to merge.
I do want an integration test so that we can detect regressions on this behavior from both dbt and bigquery, but I'm ok ticketing that in a separate PR if you are too @jtcohen6.
if cls._hyphenated: | ||
# `data` might not be a dict, e.g. for `query_comment`, which accepts | ||
# a dict or a string; only snake-case for dict values. | ||
if cls._hyphenated and isinstance(data, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at all the existing calls to this class method, it seems to be true.
@@ -931,3 +941,16 @@ def test_convert_time_type(self): | |||
expected = ['time', 'time', 'time'] | |||
for col_idx, expect in enumerate(expected): | |||
assert BigQueryAdapter.convert_time_type(agate_table, col_idx) == expect | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to use this style of test very often, but I actually really like it so I'm happy to introduce it to this file too.
test/unit/test_bigquery_adapter.py
Outdated
from dbt.adapters.base.query_headers import MacroQueryStringSetter | ||
from dbt.clients import agate_helper | ||
import dbt.exceptions | ||
from dbt.logger import GLOBAL_LOGGER as logger # noqa | ||
from dbt.context.providers import RuntimeConfigObject | ||
|
||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block the merge on this, but I'd rather put this import up just above import unittest
.
"""Return a legal value for a BigQuery label.""" | ||
value = value.strip().lower() | ||
value = _SANITIZE_LABEL_PATTERN.sub("_", value) | ||
value = value[: max_length] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the behavior we want for long labels? Alternatives include raising our own error, or sending something we know is too long to big query and raising that db exception. As much as this will not often, if ever, be attempted on purpose, a typo could cause this which might be a slightly annoying user experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing through the input and letting bigquery crash makes sense.
b0016ee
to
044a6c6
Compare
Thanks for the review @nathaniel-may, and for the quick cleanup @jmcarp! I'll open a separate issue for the integration test (#3184). |
resolves #2483
Description
Try to parse query comment and pass values to bigquery job labels. Just a draft to see if I'm heading in the right direction. @jtcohen6
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.