Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle case of scan aggregation in groupby-transform #15450

Merged
merged 3 commits into from
Apr 16, 2024

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Apr 3, 2024

Description

When performing a groupby-transform with a scan aggregation, the intermediate result obtained from calling groupby-agg is already the correct shape and does not need to be broadcast to align with the grouping keys.

To handle this, make sure that if the requested transform is a scan that we don't try and broadcast.

While here, tighten up the input checking: transform only applies to a single aggregation, rather than the more general interface offered by agg.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

When performing a groupby-transform with a scan aggregation, the
intermediate result obtained from calling groupby-agg is already the
correct shape and does not need to be broadcast to align with the
grouping keys.

To handle this, make sure that if the requested transform is a scan
that we don't try and broadcast.

While here, tighten up the input checking: transform only applies to a
single aggregation, rather than the more general interface offered by
agg.

- Closes rapidsai#12621
- Closes rapidsai#15448
@wence- wence- requested a review from a team as a code owner April 3, 2024 12:02
@wence- wence- requested review from shwina and mroeschke April 3, 2024 12:02
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 3, 2024
@wence- wence- added bug Something isn't working non-breaking Non-breaking change and removed Python Affects Python cuDF API. labels Apr 3, 2024
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wence- Thanks for this fix. Do you think this is worth considering for 24.04? It seems like a pretty significant bugfix.


def test_transform_invalid():
df = cudf.DataFrame({"key": [1, 1], "values": [4, 5]})
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care to raise the same error as pandas? This ends up hitting a TypeError in pandas for a few of the "bad" inputs that I tried like {"values": "cumprod"} or the tuple ("cumprod",) or the integer 3. Only invalid function names (strings) raise ValueError from what I can tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh probably, I didn't check exhaustively what pandas produces.

@wence-
Copy link
Contributor Author

wence- commented Apr 4, 2024

Do you think this is worth considering for 24.04? It seems like a pretty significant bugfix.

It's been here forever, so arguably 24.06 is OK.

@wence-
Copy link
Contributor Author

wence- commented Apr 15, 2024

/merge

@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 15, 2024
@rapids-bot rapids-bot bot merged commit c1dcc31 into rapidsai:branch-24.06 Apr 16, 2024
69 checks passed
@wence- wence- deleted the wence/fix/15448 branch June 14, 2024 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
2 participants