Handle case of scan aggregation in groupby-transform #15450
When performing a groupby-transform with a scan aggregation, the intermediate result obtained from calling groupby-agg is already the correct shape and does not need to be broadcast to align with the grouping keys.
To handle this, make sure that if the requested transform is a scan, we don't try to broadcast.
While here, tighten up the input checking: transform only applies to a single aggregation, rather than the more general interface offered by agg.
- Closes rapidsai#12621
- Closes rapidsai#15448
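As a rough illustration of the behaviour described above, here is a pure-Python sketch, not the cudf implementation. The names `groupby_transform`, `SCANS`, and `REDUCTIONS` are hypothetical, and the choice of `TypeError` for non-string input versus `ValueError` for unknown names is an assumption of this sketch rather than something the PR settles.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical tables for illustration only; not the cudf API.
SCANS = {"cumsum": lambda vals: list(accumulate(vals))}
REDUCTIONS = {"sum": sum, "max": max}


def groupby_transform(keys, values, func):
    """Apply one named aggregation per group, returning one value per input row."""
    # transform takes a single aggregation, not the dict/list forms agg accepts.
    if not isinstance(func, str):
        raise TypeError("transform accepts a single aggregation name")
    if func not in SCANS and func not in REDUCTIONS:
        raise ValueError(f"unknown aggregation: {func!r}")

    # Record the row positions belonging to each group.
    rows = defaultdict(list)
    for i, k in enumerate(keys):
        rows[k].append(i)

    out = [None] * len(values)
    for idx in rows.values():
        group = [values[i] for i in idx]
        if func in SCANS:
            # A scan already yields one value per row: no broadcast needed.
            result = SCANS[func](group)
        else:
            # A reduction yields one value per group: broadcast it to every row.
            result = [REDUCTIONS[func](group)] * len(idx)
        for i, r in zip(idx, result):
            out[i] = r
    return out


keys = [1, 2, 1, 2]
values = [4, 5, 6, 7]
print(groupby_transform(keys, values, "cumsum"))  # [4, 5, 10, 12]
print(groupby_transform(keys, values, "sum"))     # [10, 12, 10, 12]
```

The scan branch only scatters the per-group result back to the original row positions, while the reduction branch repeats the single group value across the group's rows; treating a scan like a reduction is the mix-up this change avoids.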
@wence- Thanks for this fix. Do you think this is worth considering for 24.04? It seems like a pretty significant bugfix.
def test_transform_invalid():
    df = cudf.DataFrame({"key": [1, 1], "values": [4, 5]})
    with pytest.raises(ValueError):
Do we care to raise the same error as pandas? This ends up hitting a TypeError in pandas for a few of the "bad" inputs that I tried, like {"values": "cumprod"} or the tuple ("cumprod",) or the integer 3. Only invalid function names (strings) raise ValueError from what I can tell.
Oh probably, I didn't check exhaustively what pandas produces.
It's been here forever, so arguably 24.06 is OK.
/merge