structured logging: add data attributes to json log output #4301

emmyoop · 2021-11-17T17:40:27Z

Description

Add the dataclass attributes to a data key in the json log output.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change

emmyoop · 2021-11-17T17:48:06Z

Example output:

{"data": {"v": "=1.0.0-rc1"}, "level": "info ", "msg": "Running with dbt=1.0.0-rc1", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}
{"data": {"stat_line": "5 models, 4 tests, 0 snapshots, 0 analyses, 166 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics"}, "level": "info ", "msg": "Found 5 models, 4 tests, 0 snapshots, 0 analyses, 166 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}
{"data": {}, "level": "info ", "msg": "", "pid": 23222, "ts": "2021-11-17T11:35:28.328580"}
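The lines above can be reproduced in spirit with a small standalone sketch. It assumes a hypothetical event dataclass (only the field name `v` comes from the sample output) and a hypothetical `json_log_line` helper. It also assumes `sort_keys=True`, because the keys in the sample appear in alphabetical order. This is not the dbt implementation, and it leaves out secret scrubbing.

```python
import json
import os
from dataclasses import dataclass, fields
from datetime import datetime


# Hypothetical event type; only the field name `v` is taken from the sample output.
@dataclass
class MainReportVersion:
    v: str


def json_log_line(e, level: str, msg: str) -> str:
    # Collect the event's dataclass fields under a single "data" key.
    data = {f.name: getattr(e, f.name) for f in fields(e)}
    values = {
        "data": data,
        "level": level,
        "msg": msg,
        "pid": os.getpid(),
        "ts": datetime.now().isoformat(),
    }
    # Assumption: sort_keys=True, matching the alphabetical key order above.
    return json.dumps(values, sort_keys=True)


line = json_log_line(MainReportVersion(v="=1.0.0-rc1"), "info", "Running with dbt=1.0.0-rc1")
print(line)
```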

nathaniel-may · 2021-11-17T18:05:50Z

core/dbt/events/functions.py

@@ -129,6 +129,7 @@ def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
 def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
    values = e.to_dict(scrub_secrets(msg_fn(e), env_secrets()))
    values['ts'] = e.ts.isoformat()
+    values['data'] = {k: scrub_secrets(str(v), env_secrets()) for (k, v) in e.__dict__.items()}


This is where my python knowledge is a little weak. I know there are lots of ways to access class attributes and I'm not sure what the tradeoffs of e.__dict__.items() are compared to the others. Do you have thoughts on this choice?

I should use vars() here instead of __dict__. My understanding is there isn't much of a real difference between the two in terms of what they're doing. Were you thinking of another way to solve this?
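Here is a quick check in plain Python (not dbt code) that vars() and __dict__ really are interchangeable for ordinary instances. The class names are hypothetical. The one behavioral difference is which exception you get when an object has no `__dict__` at all, for example a class that uses `__slots__`.

```python
class Event:
    def __init__(self, msg: str) -> None:
        self.msg = msg


class SlottedEvent:
    # __slots__ removes the per-instance __dict__.
    __slots__ = ("msg",)

    def __init__(self, msg: str) -> None:
        self.msg = msg


e = Event("hello")
# vars(obj) hands back the very same mapping object as obj.__dict__.
print(vars(e) is e.__dict__)  # True
print(vars(e))                # {'msg': 'hello'}

s = SlottedEvent("hi")
# Without a __dict__, vars() raises TypeError while .__dict__ raises AttributeError.
try:
    vars(s)
except TypeError:
    print("vars() -> TypeError")
try:
    s.__dict__
except AttributeError:
    print("__dict__ -> AttributeError")
```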

Not in particular, no. I am curious what @iknox-fa thinks though, since he might know more of the alternatives.

Oh there be dragons there, maybe.

vars() and __dict__ both do the same thing, but I'm not sure it's what you want there. They contain all writable attributes of an object, which means it will be scrubbing some stuff you might not want to scrub (log levels, tags, anything else we hang on an event, etc.).

If the scrub method handles that gracefully there's no problems though.

I think it's fine to ignore everything on the superclasses. Those should be translated into the top level object via the rest of this function.
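The concern above can be demonstrated with a sketch. It assumes a hypothetical `BaseEvent` that sets bookkeeping attributes on every event, standing in for log levels or tags, and a hypothetical `ManifestLoaded` dataclass. vars() returns every instance attribute, including the ones the base class adds. `dataclasses.fields()` returns only the fields the dataclass itself declares.

```python
from dataclasses import dataclass, fields


class BaseEvent:
    # Hypothetical shared base class that hangs extra state on each event.
    # The dataclass-generated __init__ of a subclass calls this hook.
    def __post_init__(self) -> None:
        self.level = "info"
        self.tags = ["core"]


@dataclass
class ManifestLoaded(BaseEvent):  # hypothetical event
    stat_line: str


e = ManifestLoaded(stat_line="5 models, 4 tests")
print(vars(e))
# {'stat_line': '5 models, 4 tests', 'level': 'info', 'tags': ['core']}
print({f.name: getattr(e, f.name) for f in fields(e)})
# {'stat_line': '5 models, 4 tests'}
```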

@iknox-fa that looks like exactly what we want. Although right now not every class is a dataclass, so we might get an AttributeError on x.__dataclass_fields__. I'm fine turning every class that isn't currently a dataclass into one, as long as it's not abstract (because of that mypy bug).

@nathaniel-may Oh good point. You can't use dataclass features without... dataclasses. This may also run afoul of our 3.6 dataclass library. I'm def getting into the nitty-gritty of how DCs work and they may not have implemented the full 3.7+ api on those.

@iknox-fa and @nathaniel-may What about just a simple

if hasattr(e, '__dataclass_fields__'): values['data2'] = {x:str(getattr(e, x)) for x,y in e.__dataclass_fields__.items() if type(y._field_type) == _FIELD_BASE}

We officially deprecated 3.6 (right?) for 1.0.0 so that should be fine.

yeah that pattern looks totally reasonable.

Yep that works too!
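The next sketch is a variant of the agreed pattern that swaps in the public `dataclasses.fields()` helper for the private `_field_type` / `_FIELD_BASE` check. It keeps the same `hasattr(e, '__dataclass_fields__')` guard, so events that are not dataclasses produce None instead of raising AttributeError. One detail to be aware of is that `__dataclass_fields__` also holds `ClassVar` and `InitVar` pseudo-fields, and `fields()` leaves them out. The event classes and the `event_data` name are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Dict, Optional


@dataclass
class SeedLoaded:  # hypothetical dataclass event
    path: str
    code: ClassVar[str] = "Q001"  # class-level constant, not a regular field


class LegacyEvent:  # hypothetical event that is not a dataclass
    def __init__(self) -> None:
        self.path = "seeds/x.csv"


def event_data(e: Any) -> Optional[Dict[str, str]]:
    # Same guard as the hasattr check above: non-dataclass events get None
    # instead of an AttributeError on __dataclass_fields__.
    if not hasattr(e, "__dataclass_fields__"):
        return None
    # fields() returns only regular fields, skipping ClassVar/InitVar entries.
    return {f.name: str(getattr(e, f.name)) for f in fields(e)}


print(event_data(SeedLoaded(path="seeds/a.csv")))  # {'path': 'seeds/a.csv'}
print(event_data(LegacyEvent()))                   # None
```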

emmyoop · 2021-11-19T01:00:53Z

core/dbt/events/functions.py

+        values['data'] = None
+    log_line = json.dumps(
+        {k: scrub_secrets(v, env_secrets()) for (k, v) in values.items()},
+        default=lambda x: set_default(x),


@iknox-fa thoughts on this? Since our attributes are a bit all over the place in terms of type, many are not serializable. I don't like what I did but I'm not sure what the better alternative is. Any advice would be appreciated!
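For comparison, the `default=` hook that `json.dumps` accepts gets called only for objects the encoder cannot handle natively. The function it calls can return something encodable. The sketch below uses a hypothetical `fallback_default` that turns sets into sorted lists and stringifies everything else. It is not the `set_default` from the diff, which isn't shown. The tradeoff is that nothing gets dropped, but some values end up as lossy strings. The try/except approach the thread settles on drops data instead.

```python
import json
from datetime import datetime


class NodeInfo:  # hypothetical attribute type with no JSON encoding
    def __str__(self) -> str:
        return "NodeInfo(model.a)"


def fallback_default(obj):
    # json.dumps calls this only for objects it cannot encode natively.
    # Sets become sorted lists; everything else falls back to str().
    if isinstance(obj, set):
        return sorted(obj, key=str)
    return str(obj)


data = {"when": datetime(2021, 11, 17, 11, 35), "tags": {"b", "a"}, "node": NodeInfo()}
out = json.dumps(data, default=fallback_default)
print(out)
# {"when": "2021-11-17 11:35:00", "tags": ["a", "b"], "node": "NodeInfo(model.a)"}
```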

Good question! I would probably go for a try/except on json.JSONDecodeError. You could also catch TypeError exceptions if we thought someone would use non-json types in the keys of the dict (this seems highly unlikely, but it's technically possible)
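Some background on which exception to catch. With the standard library `json` module, `json.dumps` raises `TypeError` in two cases: when it meets a value it cannot encode, and when a dict key is not a str, int, float, bool or None. `json.JSONDecodeError` is a `ValueError` subclass that only comes from decoding, for example with `json.loads`. The sketch below checks both encoding failures. `Opaque` and `dumps_error` are hypothetical names.

```python
import json


class Opaque:
    """Hypothetical attribute type with no JSON encoding."""


def dumps_error(obj):
    # Returns None on success, or the name of the exception json.dumps raised.
    try:
        json.dumps(obj)
        return None
    except TypeError as exc:
        return type(exc).__name__


print(dumps_error({"data": {"x": 1}}))         # None
print(dumps_error({"data": {"x": Opaque()}}))  # TypeError (unencodable value)
print(dumps_error({("a", "b"): 1}))            # TypeError (tuple key)

# JSONDecodeError, a ValueError subclass, only comes from decoding.
print(issubclass(json.JSONDecodeError, ValueError))  # True
```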

iknox-fa · 2021-11-19T17:43:38Z

core/dbt/events/functions.py

+        )
+    except TypeError:
+        # the only key currently throwing errors is 'data'.  Expand this list
+        # as needed if new issues pop up


We should probably have some indicator that part of the event record is missing from the json log. Can we add a values["event_data_failed_serialize"] = True or something similar?
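This sketch shows the suggested indicator. It uses a hypothetical `dumps_with_flag` helper, leaves out dbt's secret scrubbing, and `event_data_failed_serialize` is only an illustrative key name. As in the diff's comment, it assumes `data` is the only key that can fail to serialize. If another key were unencodable, the second `json.dumps` would raise as well.

```python
import json
from typing import Any, Dict


def dumps_with_flag(values: Dict[str, Any]) -> str:
    try:
        return json.dumps(values, sort_keys=True)
    except TypeError:
        # Drop the unserializable payload but leave a marker so readers of the
        # JSON log can tell that part of the event record is missing.
        values["data"] = None
        values["event_data_failed_serialize"] = True  # illustrative key name
        return json.dumps(values, sort_keys=True)


print(dumps_with_flag({"msg": "ok", "data": {"x": 1}}))
# {"data": {"x": 1}, "msg": "ok"}
print(dumps_with_flag({"msg": "hi", "data": {"x": object()}}))
# {"data": null, "event_data_failed_serialize": true, "msg": "hi"}
```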

iknox-fa · 2021-11-19T17:49:16Z

core/dbt/events/functions.py

+    except TypeError:
+        # the only key currently throwing errors is 'data'.  Expand this list
+        # as needed if new issues pop up
+        safe_values = {k: v for (k, v) in values.items() if k not in ('data',)}
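A note on the filter in this line. In Python, `('data')` is just the string `'data'`, because the parentheses alone do not create a tuple. That turns `k not in ('data')` into a substring test instead of a membership test. With the keys in the sample output (data, level, msg, pid, ts) both forms give the same result. Any key that happens to be a substring of "data", such as the hypothetical key `"a"` below, would also be filtered out. `('data',)` is a one-element tuple.

```python
# ('data') is just the string 'data'; only the trailing comma makes a tuple.
print(type(("data")).__name__)   # str
print(type(("data",)).__name__)  # tuple

# With a string, `not in` becomes a substring test, so the hypothetical key
# "a" is dropped along with "data".
values = {"a": 1, "data": {}, "msg": "hi"}
print({k: v for (k, v) in values.items() if k not in ("data")})   # {'msg': 'hi'}
print({k: v for (k, v) in values.items() if k not in ("data",)})  # {'a': 1, 'msg': 'hi'}
```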


Since you're not using values after this point, you can just pop the data key.

except TypeError:
    values.pop("data")
    log_line = blahblah(in values.items())
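The reviewer's suggestion depends on `values` not being read again. `dict.pop` mutates the caller's dict in place, which saves building a filtered copy, but the change is visible to anyone who still holds a reference to that dict. The sketch below uses a hypothetical `serialize` helper (without dbt's secret scrubbing) to show this. It passes a `None` default to `pop` so that a missing `data` key cannot raise `KeyError`.

```python
import json
from typing import Any, Dict


def serialize(values: Dict[str, Any]) -> str:
    try:
        return json.dumps(values, sort_keys=True)
    except TypeError:
        # Mutate values in place instead of building a filtered copy; the None
        # default avoids a KeyError if "data" was never set.
        values.pop("data", None)
        return json.dumps(values, sort_keys=True)


values = {"msg": "hi", "data": {"x": object()}}
print(serialize(values))  # {"msg": "hi"}
# The caller's dict changed too, which is only safe because values is not
# read again after serialization.
print("data" in values)   # False
```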


iknox-fa

LGTM

cla-bot bot added the cla:yes label Nov 17, 2021

emmyoop mentioned this pull request Nov 17, 2021

Structured Logging Phase 2 #4260

Closed

26 tasks

emmyoop requested review from nathaniel-may and iknox-fa November 17, 2021 17:59

nathaniel-may reviewed Nov 17, 2021


emmyoop force-pushed the er/sl-dataclass-attributes branch 3 times, most recently from c7e0103 to f716d3c November 19, 2021 00:17

emmyoop commented Nov 19, 2021


iknox-fa reviewed Nov 19, 2021


emmyoop force-pushed the er/sl-dataclass-attributes branch from 1cd84ab to 282261c November 19, 2021 20:04

emmyoop added 7 commits November 19, 2021 15:00

simplified data construction

0af9e3a

fixed missed scrubbing of secrets

406c172

switched to vars()

13a4946

scrub entire log line, update how attributes get pulled

a5b53e4

get ahead of serialization errors

d2821ee

store if data is serialized and modify values instead of a copy of values

cd2abec

fixed unused import from merge

1454150

iknox-fa approved these changes Nov 19, 2021


emmyoop force-pushed the er/sl-dataclass-attributes branch from 282261c to 1454150 November 19, 2021 21:35

emmyoop merged commit c541eca into main Nov 19, 2021

emmyoop deleted the er/sl-dataclass-attributes branch November 19, 2021 21:43

emmyoop restored the er/sl-dataclass-attributes branch November 19, 2021 22:29

emmyoop deleted the er/sl-dataclass-attributes branch November 19, 2021 22:31

emmyoop restored the er/sl-dataclass-attributes branch November 22, 2021 17:09

emmyoop deleted the er/sl-dataclass-attributes branch February 14, 2022 16:44
