Guard against unnecessarily calling dump_graph in logging #4619
core/dbt/events/README.md
# Logging
When events are processed via `fire_event`, nearly everything is logged. Whether or not the user has enabled the debug flag, all debug messages are still logged to the file. However, some events are particularly time-consuming to construct because they return a huge amount of data. Today, the only messages in this category are cache events, which are only logged if the `--log-cache-events` flag is on. This matters because these messages cause a noticeable performance degradation and should not be created unless they are going to be logged. We achieve this by making the event class explicitly use lazy values for the expensive fields so they are not computed until the moment they are required. This is done with the data type `core/dbt/lazy.py::Lazy`, which includes usage documentation.

Example:
```
@dataclass
class DumpBeforeAddGraph(DebugLevel, Cache):
    dump: Lazy[Dict[str, List[str]]]
    code: str = "E031"

    def message(self) -> str:
        return f"before adding : {self.dump.force()}"
```
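To make the idea concrete, here is a minimal, self-contained sketch of how a lazy wrapper can keep an expensive dump from being computed unless the message is actually rendered. It is not the code in `core/dbt/lazy.py`: the `defer` constructor, the `log_cache_events` parameter, the `dump_graph` stand-in, and the simplified `fire_event` are all assumptions for illustration. Only the `force()` method name and the event shape are taken from the example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Lazy(Generic[T]):
    """Hypothetical stand-in for dbt's Lazy type: wraps a zero-argument function."""

    _f: Callable[[], T]
    _memo: Optional[T] = None
    _forced: bool = False

    @classmethod
    def defer(cls, f: Callable[[], T]) -> "Lazy[T]":
        # Store the computation without running it.
        return cls(f)

    def force(self) -> T:
        # Run the computation at most once and cache the result.
        if not self._forced:
            self._memo = self._f()
            self._forced = True
        return self._memo  # type: ignore[return-value]


@dataclass
class DumpBeforeAddGraph:
    dump: Lazy[Dict[str, List[str]]]
    code: str = "E031"

    def message(self) -> str:
        return f"before adding : {self.dump.force()}"


calls = 0


def dump_graph() -> Dict[str, List[str]]:
    # Stand-in for an expensive graph dump; counts how often it runs.
    global calls
    calls += 1
    return {"model_a": ["model_b"]}


def fire_event(event: DumpBeforeAddGraph, log_cache_events: bool) -> Optional[str]:
    # Simplified: only render (and therefore force) when the flag is on.
    return event.message() if log_cache_events else None


skipped = fire_event(DumpBeforeAddGraph(Lazy.defer(dump_graph)), log_cache_events=False)
logged = fire_event(DumpBeforeAddGraph(Lazy.defer(dump_graph)), log_cache_events=True)
```

In this sketch the first call never touches `dump_graph`, because constructing the event only stores the function; the work happens inside `message()`.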
🎉
# This is an explicit deserializer for the type Lazy[Dict[str, List[str]]]
# mashumaro does not support composing serialization strategies, so all
# future uses of Lazy will need to register a unique serialization class like this one.
class LazySerialization1(SerializationStrategy):
I really don't like this name with just an integer on the end. But I think it's as good as you get without being overly verbose while still communicating that there can eventually be multiple associated with Lazy. No action needed, just needed to scratch the itch.
As the author of mashumaro, I would be happy to help you sort out the problem if you would explain what you want to achieve. If it's not difficult, could you please write some sample code explaining the problem? You can do it here or create a new issue in the mashumaro repo. As far as I understand, you want to serialize generic non-dataclass types. Can GenericSerializableType help with it? You can also have a generic dataclass like your
Hi @Fatal1ty, that's very kind of you to offer to help here. Mashumaro does have several options for serializing generic types, but I haven't managed to figure out how to defer specifying the serialization for the concrete inner type to mashumaro. Using the code in this PR, I currently have:

    class LazySerialization1(SerializationStrategy):
        def serialize(self, value) -> Dict[str, List[str]]:
            return value.force()

    class Config(MashBaseConfig):
        serialization_strategy = {
            ...
            Lazy[Dict[str, List[str]]]: LazySerialization1()
        }

And this works great, but when I swap out the class to instead use a value of
What I want is a way for the above to work for any type

    class LazySerialization(SerializationStrategy):
        def serialize(self, value):
            return value.force()

    class Config(MashBaseConfig):
        serialization_strategy = {
            ...
            Lazy: LazySerialization()
        }
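The behavior being asked for can be sketched without mashumaro at all. The snippet below is a plain-Python illustration that assumes a hypothetical `SERIALIZERS` registry and `serialize` helper (neither is a mashumaro API). A single rule for `Lazy` forces the value and then delegates to whatever serializer is registered for the inner value, so no per-type `LazySerialization1`-style classes are needed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Generic, Type, TypeVar

T = TypeVar("T")


@dataclass
class Lazy(Generic[T]):
    """Hypothetical lazy wrapper; force() runs the deferred function."""

    f: Callable[[], T]

    def force(self) -> T:
        return self.f()


# Hypothetical registry mapping a runtime type to its serializer.
SERIALIZERS: Dict[Type[Any], Callable[[Any], Any]] = {}


def serialize(value: Any) -> Any:
    # Look up a serializer by the value's runtime type; pass through otherwise.
    handler = SERIALIZERS.get(type(value))
    return handler(value) if handler else value


# One rule per inner type, plus a single rule for Lazy that composes with them.
SERIALIZERS[date] = lambda d: d.isoformat()
SERIALIZERS[dict] = lambda d: {k: serialize(v) for k, v in d.items()}
SERIALIZERS[list] = lambda xs: [serialize(x) for x in xs]
SERIALIZERS[Lazy] = lambda lz: serialize(lz.force())

lazy_date = serialize(Lazy(lambda: date(2022, 1, 31)))
lazy_graph = serialize(Lazy(lambda: {"a": ["b", "c"]}))
```

Dispatching on the runtime value sidesteps the need to know the inner type ahead of time, at the cost of ignoring the declared field annotations.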
@nathaniel-may thank you for the details! I came up with the idea of using

    from dataclasses import dataclass
    from datetime import date
    from typing import Generic, TypeVar

    from mashumaro import DataClassDictMixin
    from mashumaro.types import SerializableType

    T = TypeVar("T")

    @dataclass
    class Lazy(Generic[T], DataClassDictMixin, SerializableType):
        inner: T

        def _serialize(self) -> T:
            return self.to_dict()['inner']

        @classmethod
        def _deserialize(cls, value):
            return cls.from_dict({'inner': value})

        class Config:
            serialization_strategy = {
                # ...
            }

    @dataclass
    class LazyTest(DataClassDictMixin):
        lazy: Lazy[date]

    obj = LazyTest.from_dict({'lazy': '2022-01-31'})
    assert obj.lazy == Lazy[date](date(2022, 1, 31))
    print(obj.to_dict())
    assert obj.to_dict() == {'lazy': '2022-01-31'}

But unfortunately it won't work as expected. I think it's a bug that could be fixed. I'll dig deeper and try to fix it.
@gshank suggested
@Fatal1ty, since I've closed this PR, if you'd like to continue our conversation in an issue or PR in the mashumaro repo, I'm happy to do so.
Nice work here!! Is this a fix we're going to be able to backport to
@jtcohen6 this definitely depends on #4505. If it's important to backport, I can make a different commit to 1.0.latest, but the user-facing behavior won't be exactly the same. If I exclude the
@nathaniel-may I'd be supportive of fixing the performance regression, even at the cost of losing that information in the structured logging output. To that end:
This feels like the right balance of effort + impact
@nathaniel-may actually I missed a much simpler solution that works right now. You can wrap a value in the dataclass and use serialization hooks:

    from dataclasses import dataclass
    from datetime import date
    from typing import Generic, TypeVar

    from mashumaro import DataClassDictMixin

    T = TypeVar("T")

    @dataclass
    class Lazy(Generic[T], DataClassDictMixin):
        inner: T

        def __post_serialize__(self: T, d):
            return d.pop('inner')

        @classmethod
        def __pre_deserialize__(cls, d):
            return {'inner': d}

        # @classmethod
        # def __post_deserialize__(cls, obj: T):
        #     # if you want to get LazyTest(lazy=datetime.date(2022, 1, 31))
        #     # instead of LazyTest(lazy=Lazy(inner=datetime.date(2022, 1, 31)))
        #     # but serialization will be broken because date doesn't have to_dict
        #     return obj.inner

        class Config:
            serialization_strategy = {
                # ...
            }

    @dataclass
    class LazyTest(DataClassDictMixin):
        lazy: Lazy[date]

    obj = LazyTest.from_dict({'lazy': '2022-01-31'})
    print(obj)
    assert obj.lazy == Lazy[date](date(2022, 1, 31))
    print(obj.to_dict())
    assert obj.to_dict() == {'lazy': '2022-01-31'}

Also, the UnserializableField exception on generic classes that implement the SerializableType interface is a bug that will be fixed in the next version.
* add lazy type and apply to cache events automatic commit by git-black, original commits: 13b1865
Hey @Fatal1ty, thanks for the update! If we have to make any modifications to this area of the code, I'll definitely look into using this strategy 💪. However, since it seems like it's been easy to upgrade the last few versions, I'm looking forward to using your fix when it comes out.
resolves #4569
Description
`Lazy` data type that can be reused anywhere in the code base (code quality passes `mypy --strict` without any ignores)
`fire_event` function will work with this new lazy construction
`Cache` apply this lazy evaluation strategy to catch similar issues in the future

The Challenge
Mashumaro does not support composing serialization strategies, which makes serializing generic types like `Lazy` tedious and error-prone. Even if the wrapped type `T` has a mashumaro serialization strategy, it must be explicitly called by the serializer for `Lazy` itself. To be explicit, this means we would have to manually code mashumaro serialization strategies for all combinations of types that `Lazy` wraps in our code base, e.g. `Lazy[str]`, `Lazy[int]`, `Lazy[Dict[str, List[str]]]`.
I have attempted to dig into Mashumaro's metaprogramming library to see if I can write a truly generic serialization for the `Lazy` type, but I haven't been able to figure it out.

Questions
`Lazy` type to more easily communicate intention, and get good mypy type checking but introduce room for runtime errors when we forget to add a serialization strategy when it's going to be serialized like in the events module.
`Lazy` so we don't have this gap to introduce new errors in serialization and instead pass unapplied methods with `# type: ignore` because mypy doesn't handle them well.

Suggested Review Flow
`dbt.lazy.py::Lazy`.

Checklist
`CHANGELOG.md` and added information about my change
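
As a rough illustration of the second option under Questions, the sketch below drops the wrapper type and stores a plain zero-argument callable on the event instead. The class name `DumpBeforeAddGraphCallable` and the `make_dump` helper are hypothetical and not part of this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DumpBeforeAddGraphCallable:
    # Hypothetical variant of the event: holds an unapplied function, not a Lazy.
    dump: Callable[[], Dict[str, List[str]]]
    code: str = "E031"

    def message(self) -> str:
        # The dump is computed only here, when the message is rendered.
        return f"before adding : {self.dump()}"


ran: List[str] = []


def make_dump() -> Dict[str, List[str]]:
    # Stand-in for an expensive dump; records each invocation.
    ran.append("dump")
    return {"a": ["b"]}


event = DumpBeforeAddGraphCallable(dump=make_dump)
before = list(ran)      # snapshot taken before the message is rendered
text = event.message()  # the callable runs now
```

The deferral works the same way as with `Lazy`, but the intent is carried only by the field's `Callable` annotation, and nothing caches the result if `message()` is called more than once.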