-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use mashumaro for serializing logging events #4505
Conversation
Mashumaro required that the Event class be a dataclass. Changing the Event class to a dataclass with attributes with defaults means that the attributes of subclasses all have to have defaults too (because non-default attributes can only come before attributes with defaults). In some cases Event subclasses were being constructed without keyword names, which caused the passed in value to be used for the first attribute (i.e. log_version) which didn't work, so all constructors needed to use keyword arguments. Mashumaro also expects to serialize everything in the class, so it didn't work to have various node objects in the event classes. I switched to putting the node_info construction code in the nodes themselves and passing the node_info the event classes instead of passing in the various node objects. I would have liked to make _event_status and node_info into a separate class, but mypy really didn't like that. The stubs.py file was removed. I created a separate core/dbt/events/serialization.py file for handling the mashumaro serialization, since these classes didn't need some of the complications in core/dbt/dataclass_schema.py. It's also a good place to put other special serialization code if we need it. |
Integration tests with json serialization on are hanging in the interop tests. I suspect there's an exception being raised in threads again. |
I get a few failures too when I run |
83a2f95
to
6da0dec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of concerns with this approach:
- mypy can no longer catch malformed events such as events that have no
code
field. - mypy can no longer catch when we fail to provide event attributes even when mypy runs everywhere
- it looks like when an event takes a type that isn't serializable we are forced to convert it to a serializable type outside the event module which leaks a core abstraction of the event module back out to the rest of the code base
Since these warrant a larger discussion, I think we may way to make a new PR with the bug fixes you included here for the patch release, and consider the larger serialization change for 1.1.0.
"it looks like when an event takes a type that isn't serializable we are forced to convert it to a serializable type outside the event module which leaks a core abstraction of the event module back out to the rest of the code base" I'm not sure that I agree that it is the event module's job to enable serialization for outside classes. In any case, serializing classes that are not needed for the event message feels unnecessary, since it gets thrown away anyway. I think that we're better off providing only the information that is needed for the event/message. |
"mypy can no longer catch when we fail to provide event attributes even when mypy runs everywhere" Providing defaults for attributes of subclasses is something that the Python dataclasses module forces you to do. I think it's probably more important to consistently make the classes dataclasses than make a couple of methods abstractmethods, when executing those methods without an override will throw an error just like the abstract base class. |
The way I see it, the event module is tasked with translating dbt's internal representations into a structured output that is versioned as an API so that downstream applications can rely on it. Serialization and schema conformity are both major parts of creating this API to an enterprise quality. I want this code to be written in such a way that if you are not modifying the event module you do not need to know anything about how it was built or what it needs to continue to conform to the logging API version. If using dataclasses forces us to weaken our development-time guarantees for this API, we should switch tactics and do something else. |
6da0dec
to
1e5d2dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These recent commits address all 3 of my major concerns from my last review. Mypy is still able to be helpful in the ways it was planned to and there's a way to override serialization in serialization.py
. The following remarks are much smaller issues likely with much simpler solutions. Thanks for all the hard work here!
24c770d
to
fdef893
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some additional comments. There are still 5 unaddressed comments from my previous review.
6572b66
to
775959f
Compare
3fe8523
to
6468c96
Compare
Just finished a quick sync with @gshank to go over the remaining comments. Here are the outcomes from that conversation:
|
9b3a17b
to
a3f91e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's just that one value test for the log version that still needs to be updated, but this looks good to me!
in the event dictionary and leave it in the event data dictionary. Bump log version to 2.
a3f91e5
to
b85c113
Compare
resolves #4504
Description
Use mashumaro to serialize json logging.
Checklist
CHANGELOG.md
and added information about my change