[FR] warn_on_unsupported_args a significant bottleneck when adding many series #4237
Comments
Iterating over the DefaultsDict is very expensive. The first iteration, which extracts the Plot (the first entry obtained), consumes 359 allocations totaling 21 KiB. (Why so much just to iterate?) (UPDATE: Ah, I was missing a bit: the issue is that the first iteration has a lot of overhead, unrelated to the item.) Each subsequent member of the dictionary iterated over uses 3 allocations of ~100 bytes each. Every subsequent time the Plot is iterated (on a new series), it consumes 186 allocations totaling 8 KiB. This |
Why is this function in RecipesPipeline defined this way, rather than just taking a union?
Seems to be equivalent. |
Is there any reason Because if I restrict iteration to the explicit keys, I can cut the vast majority of the remaining allocations in this method. (2.19 M allocations: 134.719 MiB) |
Where? |
Line 1614 in 539e5da
|
Those aren't the default ones, but the modified attributes, no? |
The DefaultsDict contains two dictionaries, and provides a unified view of the default and explicit keys, just with the explicit keys taking precedence. |
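To make that structure concrete, here is a minimal sketch of a two-layer dictionary of this kind. The names `LayeredDict`, `all_keys`, and `explicit_keys` are hypothetical and chosen for illustration; this is not the actual RecipesPipeline type or its API, only a model of "explicit keys take precedence over defaults".

```julia
# Hypothetical stand-in for a defaults-plus-overrides dictionary.
# This is NOT the real RecipesPipeline.DefaultsDict.
struct LayeredDict
    explicit::Dict{Symbol,Any}   # attributes the user set
    defaults::Dict{Symbol,Any}   # fallback values
end

# Lookup: explicit entries take precedence over defaults.
function Base.getindex(d::LayeredDict, k::Symbol)
    haskey(d.explicit, k) ? d.explicit[k] : d.defaults[k]
end

# Unified view: builds a new Set holding the union of both key sets.
all_keys(d::LayeredDict) = union(keys(d.explicit), keys(d.defaults))

# Cheap view: only the keys the user actually set, with no new collection.
explicit_keys(d::LayeredDict) = keys(d.explicit)

d = LayeredDict(
    Dict{Symbol,Any}(:linewidth => 3),
    Dict{Symbol,Any}(:linewidth => 1, :color => :auto),
)

d[:linewidth]             # 3, the explicit value wins
d[:color]                 # :auto, falls back to the default
length(all_keys(d))       # 2
length(explicit_keys(d))  # 1
```

In this toy model, a check that only needs to look at user-supplied attributes can walk `explicit_keys(d)` and avoid building the unified key set on every call.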
Ah, I see. I see no valid reason to iterate over default keys, which are by definition supported. Honestly, just by the look of the implementation of |
Thanks! The headache is really all due to the implementation of DefaultsDict iteration, which should probably be revised but that seems to be a bigger challenge. |
Just by changing to |
Yep! But I'll add a method in RecipesPipeline so it's not accessing an internal directly. |
PR, CI, merge & release at will ;) |
Maybe another justification for #4359, please weigh in there if you concur. |
In profiling the creation of a plot with many series, after #4235 and #4236, a significant bottleneck is warn_on_unsupported_args, with two lines taking up a combined 23% of samples: 1614 and 1623, which I believe are just the two ends of the loop iterating over plot attributes. That's a lot of work just to inform people about deprecations. This function is run every time a series is added.
Current (after the two prior PRs): 365.065 ms (2814346 allocations: 124.53 MiB)
Short-circuiting warn_on_unsupported_args: 252.194 ms (1170214 allocations: 69.42 MiB)
For perspective, when I started this profiling work, this plot was taking 8 seconds to generate. Once this bottleneck is resolved (along with the other two PRs), it'll take 250 ms, 32x faster.
There's a switch to turn warnings off, warn_on_unsupported, but it only prevents the warnings from being displayed; it does not skip the work of checking the attributes in the first place.
Inspection with Cthulhu shows some potential for type instability. The iteration also allocates an array for the keys each time, rather than performing a lazy iteration.
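The difference between silencing a warning and skipping the check can be sketched as follows. This is a toy model under stated assumptions: `SUPPORTED`, `check_args_silenced`, and `check_args_short_circuit` are hypothetical names, and the code does not reproduce the Plots.jl implementation of warn_on_unsupported_args.

```julia
# Hypothetical set of supported attribute names, for illustration only.
const SUPPORTED = Set([:color, :linewidth, :label])

# Variant A: the flag only suppresses the message; the scan still runs.
function check_args_silenced(attrs::Dict{Symbol,Any}; warn::Bool = true)
    unsupported = Symbol[]
    for k in keys(attrs)              # this loop runs regardless of `warn`
        k in SUPPORTED || push!(unsupported, k)
    end
    warn && !isempty(unsupported) && @warn "Unsupported: $unsupported"
    return unsupported
end

# Variant B: return before scanning when warnings are disabled.
function check_args_short_circuit(attrs::Dict{Symbol,Any}; warn::Bool = true)
    warn || return Symbol[]           # skip the scan entirely
    return check_args_silenced(attrs; warn = true)
end

attrs = Dict{Symbol,Any}(:color => :red, :foo => 1)
check_args_silenced(attrs; warn = false)       # scans and finds [:foo], prints nothing
check_args_short_circuit(attrs; warn = false)  # returns an empty vector without scanning
```

Variant B is the behaviour one might expect from a "warnings off" switch: when nothing will be reported, no per-series work is done.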
Potential solutions
plot call, regardless of how many series are added