-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
structured logging: add data attributes to json log output #4301
Conversation
Example output:
|
core/dbt/events/functions.py
Outdated
@@ -129,6 +129,7 @@ def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str: | |||
def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str: | |||
values = e.to_dict(scrub_secrets(msg_fn(e), env_secrets())) | |||
values['ts'] = e.ts.isoformat() | |||
values['data'] = {k: scrub_secrets(str(v), env_secrets()) for (k, v) in e.__dict__.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where my python knowledge is a little weak. I know there are lots of ways to access class attributes and I'm not sure what the tradeoffs of e.__dict__.items()
is compared to the others. Do you have thoughts on this choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should use vars()
here instead of __dict__
. My understanding is there's isn't much of a real difference between the two in terms of what they're doing. Were you thinking of another way to solve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in particular no. I am curious what @iknox-fa thinks though since he might know more of the alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh there be dragons there, maybe.
vars()
and _dict_()
both do the same thing, but I'm not sure it's what you want there. They contain all writeable attributes of an object which means it will be scrubbing some stuff you might not want to scrub (log levels, tags, anything else we hang on an event, etc).
If the scrub method handles that gracefully there's no problems though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to ignore everything on the superclasses. Those should be translated into the top level object via the rest of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iknox-fa that looks like exactly what we want. Although right now not every class is a dataclass so we might get an AttributeError
on x.__dataclass_fields__
. I'm fine making everything a dataclass that might not be currently, as long as it's not abstract because of that mypy bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nathaniel-may Oh good point. You can't use dataclass features without... dataclasses. This may also run afoul of our 3.6 dataclass library. I'm def getting into the nitty-gritty of how DCs work and they may not have implemented the full 3.7+ api on those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iknox-fa and @nathaniel-may What about just a simple
if hasattr(e, '__dataclass_fields'):
values['data2'] = {x:str(getattr(e, x)) for x,y in e.__dataclass_fields__.items() if type(y._field_type) == _FIELD_BASE}
We officially deprecated 3.6 (right?) for 1.0.0 so that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that pattern looks totally reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that works too!
c7e0103
to
f716d3c
Compare
core/dbt/events/functions.py
Outdated
values['data'] = None | ||
log_line = json.dumps( | ||
{k: scrub_secrets(v, env_secrets()) for (k, v) in values.items()}, | ||
default=lambda x: set_default(x), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iknox-fa thoughts on this? Since our attributes are a bit all over the place in terms of type, many are not serializable. I don't like what I did but I'm not sure what the better alternative is. Any advice would be appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I would probably go for a try/except on json.JSONDecodeError
. You could also catch TypeError
exceptions if we thought someone would use non-json types in the keys of the dict (this seems highly unlikely, but it's technically possible)
) | ||
except TypeError: | ||
# the only key currently throwing errors is 'data'. Expand this list | ||
# as needed if new issues pop up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have some indicator that part of the event record is missing from the json log. Can we add a
values["event_data_failed_serailize"] = True
or something similar?
core/dbt/events/functions.py
Outdated
except TypeError: | ||
# the only key currently throwing errors is 'data'. Expand this list | ||
# as needed if new issues pop up | ||
safe_values = {k: v for (k, v) in values.items() if k not in ('data')} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're not using values
after this point you can just pop the data
key.
except...
values.pop("data")
log_line = blahblah(in values.items())
1cd84ab
to
282261c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
282261c
to
1454150
Compare
Description
Add the dataclass attributes to a data key in the json log output.
Checklist
CHANGELOG.md
and added information about my change