Refactor Python and Cython internals for groupby aggregation #7818
…nt of aggregators.
…ther than the underlying C++ class.
Cython looks good.
LGTM! Very nice cleanup, @vyasr!
rerun tests
Why are there only 4 checks showing in this PR?
rerun tests
@karthikeyann all CI was blocked yesterday due to connectivity issues. The queue is slowly being emptied, I'll merge in #7968 once that's finalized then trigger tests again.
25794bc to affc9c9
I can't merge the changes from #7968 without triggering C++ reviews, so I'll just wait for that to get merged into
@gpucibot merge
This PR makes some improvements to the groupby/aggregation code that I identified while working on #7731. The main purpose is to make the code logic easier to follow and reduce some unnecessary complexity; I see minor but measurable performance improvements (2-5% for small datasets) as well, but those are mostly just side effects here. Specifically, it makes the following changes:

- Merges `_AggregationFactory` into `Aggregation`, and removes the constructor for `Aggregation`. The one downside here is that the Cython `Aggregation` object's constructor no longer places it in a valid state; however, in practice the object is always constructed via either the `make_aggregation` function or its various factories, and the object's constructor was only ever used in `_drop_unsupported_aggs` anyway. The benefit is we remove the fragmentation between these two classes, making the code much more readable, and the `Aggregation` class actually serves a purpose now beyond just providing a single property `kind` that is only used once: it is now the primary way that other Cython files interact with aggregations. This also means that in most places other Cython modules don't need to work with `unique_ptr[aggregation]` as much anymore (although they do still have to move `Aggregation.c_obj` for performance reasons). `make_aggregation` now returns the Cython class instead of the underlying C++ one.
- … `idxmin` vs `argmin`, now `ARGMIN`).
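
For readers without the diff at hand, the shape of the change described above can be sketched in plain Python. This is only an illustration, not the actual cuDF code: a dict stands in for the C++ `unique_ptr[aggregation]`, the `_from_kind` helper, the `_ALIASES`, `_FACTORIES`, and `_ALLOWED` tables, and the simplified `make_aggregation` signature are all assumptions made for the example. Only the names `Aggregation`, `make_aggregation`, `c_obj`, `kind`, and the `idxmin`/`argmin`/`ARGMIN` relationship come from the description.

```python
class Aggregation:
    """Pure-Python stand-in for the Cython wrapper class (illustrative only)."""

    def __init__(self):
        # Mirrors the trade-off described above: the bare constructor leaves
        # the object in an invalid state until a factory fills in c_obj.
        self.c_obj = None

    @property
    def kind(self):
        if self.c_obj is None:
            raise ValueError("Aggregation must be built through a factory")
        return self.c_obj["kind"]

    @classmethod
    def _from_kind(cls, kind):
        # Hypothetical helper; a dict stands in for unique_ptr[aggregation].
        agg = cls()
        agg.c_obj = {"kind": kind}
        return agg

    @classmethod
    def sum(cls):
        return cls._from_kind("SUM")

    @classmethod
    def argmin(cls):
        return cls._from_kind("ARGMIN")


# Assumed alias table: only the idxmin/argmin pair comes from the description.
_ALIASES = {"idxmin": "argmin"}
_FACTORIES = {"sum": Aggregation.sum, "argmin": Aggregation.argmin}


def make_aggregation(op):
    """Return an Aggregation wrapper for a name or an existing wrapper."""
    if isinstance(op, Aggregation):
        return op
    if isinstance(op, str):
        name = _ALIASES.get(op.lower(), op.lower())
        if name in _FACTORIES:
            return _FACTORIES[name]()
        raise ValueError(f"Unknown aggregation {op!r}")
    raise TypeError(f"Unsupported aggregation type {type(op).__name__}")


# Hypothetical allow-list keyed by uppercase kind, so aliases need no entries.
_ALLOWED = {"SUM", "ARGMIN"}
aggs = [make_aggregation(name) for name in ("sum", "idxmin", "argmin")]
kinds = [agg.kind for agg in aggs if agg.kind in _ALLOWED]
```

In this sketch callers only ever hold the wrapper object, and both `idxmin` and `argmin` resolve to the same uppercase kind, so a single membership check against the allow-list covers every alias in one pass.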