Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

structured logging: add data attributes to json log output #4301

Merged
merged 7 commits into from
Nov 19, 2021

Conversation

emmyoop
Copy link
Member

@emmyoop emmyoop commented Nov 17, 2021

Description

Add the dataclass attributes to a data key in the json log output.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change

@cla-bot cla-bot bot added the cla:yes label Nov 17, 2021
@emmyoop emmyoop mentioned this pull request Nov 17, 2021
26 tasks
@emmyoop
Copy link
Member Author

emmyoop commented Nov 17, 2021

Example output:

{"data": {"v": "=1.0.0-rc1"}, "level": "info ", "msg": "Running with dbt=1.0.0-rc1", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}
{"data": {"stat_line": "5 models, 4 tests, 0 snapshots, 0 analyses, 166 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics"}, "level": "info ", "msg": "Found 5 models, 4 tests, 0 snapshots, 0 analyses, 166 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}
{"data": {}, "level": "info ", "msg": "", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}

@@ -129,6 +129,7 @@ def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
values = e.to_dict(scrub_secrets(msg_fn(e), env_secrets()))
values['ts'] = e.ts.isoformat()
values['data'] = {k: scrub_secrets(str(v), env_secrets()) for (k, v) in e.__dict__.items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where my python knowledge is a little weak. I know there are lots of ways to access class attributes and I'm not sure what the tradeoffs of e.__dict__.items() is compared to the others. Do you have thoughts on this choice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should use vars() here instead of __dict__. My understanding is there's isn't much of a real difference between the two in terms of what they're doing. Were you thinking of another way to solve this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in particular no. I am curious what @iknox-fa thinks though since he might know more of the alternatives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh there be dragons there, maybe.

vars() and _dict_() both do the same thing, but I'm not sure it's what you want there. They contain all writeable attributes of an object which means it will be scrubbing some stuff you might not want to scrub (log levels, tags, anything else we hang on an event, etc).

If the scrub method handles that gracefully there's no problems though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to ignore everything on the superclasses. Those should be translated into the top level object via the rest of this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iknox-fa that looks like exactly what we want. Although right now not every class is a dataclass so we might get an AttributeError on x.__dataclass_fields__. I'm fine making everything a dataclass that might not be currently, as long as it's not abstract because of that mypy bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathaniel-may Oh good point. You can't use dataclass features without... dataclasses. This may also run afoul of our 3.6 dataclass library. I'm def getting into the nitty-gritty of how DCs work and they may not have implemented the full 3.7+ api on those.

Copy link
Member Author

@emmyoop emmyoop Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iknox-fa and @nathaniel-may What about just a simple

if hasattr(e, '__dataclass_fields'):
        values['data2'] = {x:str(getattr(e, x)) for x,y in e.__dataclass_fields__.items() if type(y._field_type) == _FIELD_BASE}

We officially deprecated 3.6 (right?) for 1.0.0 so that should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that pattern looks totally reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that works too!

@emmyoop emmyoop force-pushed the er/sl-dataclass-attributes branch 3 times, most recently from c7e0103 to f716d3c Compare November 19, 2021 00:17
values['data'] = None
log_line = json.dumps(
{k: scrub_secrets(v, env_secrets()) for (k, v) in values.items()},
default=lambda x: set_default(x),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iknox-fa thoughts on this? Since our attributes are a bit all over the place in terms of type, many are not serializable. I don't like what I did but I'm not sure what the better alternative is. Any advice would be appreciated!

Copy link
Contributor

@iknox-fa iknox-fa Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! I would probably go for a try/except on json.JSONDecodeError. You could also catch TypeError exceptions if we thought someone would use non-json types in the keys of the dict (this seems highly unlikely, but it's technically possible)

)
except TypeError:
# the only key currently throwing errors is 'data'. Expand this list
# as needed if new issues pop up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have some indicator that part of the event record is missing from the json log. Can we add a
values["event_data_failed_serailize"] = True or something similar?

except TypeError:
# the only key currently throwing errors is 'data'. Expand this list
# as needed if new issues pop up
safe_values = {k: v for (k, v) in values.items() if k not in ('data')}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're not using values after this point you can just pop the data key.

except...
    values.pop("data")
    log_line = blahblah(in values.items())

@emmyoop emmyoop force-pushed the er/sl-dataclass-attributes branch from 1cd84ab to 282261c Compare November 19, 2021 20:04
Copy link
Contributor

@iknox-fa iknox-fa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emmyoop emmyoop force-pushed the er/sl-dataclass-attributes branch from 282261c to 1454150 Compare November 19, 2021 21:35
@emmyoop emmyoop merged commit c541eca into main Nov 19, 2021
@emmyoop emmyoop deleted the er/sl-dataclass-attributes branch November 19, 2021 21:43
@emmyoop emmyoop restored the er/sl-dataclass-attributes branch November 19, 2021 22:29
@emmyoop emmyoop deleted the er/sl-dataclass-attributes branch November 19, 2021 22:31
@emmyoop emmyoop restored the er/sl-dataclass-attributes branch November 22, 2021 17:09
@emmyoop emmyoop deleted the er/sl-dataclass-attributes branch February 14, 2022 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants