Add groupby scan aggregation to cudf #7759

Merged on Apr 22, 2021 (72 commits)

Conversation

karthikeyann
Contributor

@karthikeyann karthikeyann commented Mar 30, 2021

closes #1296 Groupby cumulative count
closes #1298 Groupby cumulative sum

  • Add Cython code for groupby scan (reduce and scan aggregations cannot be mixed in one call)
  • Add Python code for the groupby scan functions: cumsum, cummin, cummax, cumcount, and groupby.agg()
  • Add unit tests
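
To illustrate the semantics these scan aggregations are meant to provide, here is a minimal pure-Python sketch. It is not the cuDF implementation, and the helper name `groupby_scan` is hypothetical. Unlike a reduction, which yields one value per group, a scan yields one value per input row, in the original row order.

```python
SCAN_OPS = {"cumsum", "cummin", "cummax", "cumcount"}


def groupby_scan(keys, values, op):
    """Hypothetical helper: per-group running aggregate, one output per row."""
    if op not in SCAN_OPS:
        raise ValueError(f"unsupported scan aggregation: {op}")
    running = {}  # group key -> running state for that group
    out = []
    for key, value in zip(keys, values):
        if op == "cumcount":
            # Number of earlier rows in the same group (0-based).
            running[key] = running.get(key, -1) + 1
        elif key not in running:
            # First row of a group starts the running value.
            running[key] = value
        elif op == "cumsum":
            running[key] += value
        elif op == "cummin":
            running[key] = min(running[key], value)
        else:  # op == "cummax"
            running[key] = max(running[key], value)
        out.append(running[key])
    return out


keys = ["a", "b", "a", "b", "a"]
values = [3, 1, 2, 5, 4]
print(groupby_scan(keys, values, "cumsum"))    # [3, 1, 5, 6, 9]
print(groupby_scan(keys, values, "cumcount"))  # [0, 0, 1, 1, 2]
```

The output has the same length as the input, which is why the result cannot simply be combined with reduce-style aggregations that return one row per group.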

Contributor

@vyasr vyasr left a comment

Sorry for the delay in reviewing this; I was waiting until the changes from #7818 could be merged in. I think we can simplify this code a little and make it match the C++ internals by mapping cumulative operations to their non-cumulative counterparts earlier, but I've left a couple of comments to confirm that this won't cause significant problems once we start supporting mixed scan/aggregate operations.
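
One way to picture the suggested simplification is a lookup that resolves each cumulative name to its non-cumulative counterpart, plus a scan flag, as early as possible, and that rejects requests mixing the two kinds. The sketch below is an assumption about the shape of such a mapping, not the code in this PR; `SCAN_TO_BASE`, `resolve_aggs`, and the chosen base names are hypothetical.

```python
# Hypothetical mapping from cumulative names to non-cumulative counterparts.
SCAN_TO_BASE = {
    "cumsum": "sum",
    "cummin": "min",
    "cummax": "max",
    "cumcount": "count",
}


def resolve_aggs(names):
    """Return (base aggregation names, is_scan) for a list of requested names."""
    resolved = [(SCAN_TO_BASE.get(name, name), name in SCAN_TO_BASE) for name in names]
    kinds = {is_scan for _, is_scan in resolved}
    if len(kinds) > 1:
        # Mirrors the restriction stated in the PR description.
        raise NotImplementedError("cannot mix reduce and scan aggregations")
    return [base for base, _ in resolved], kinds == {True}


print(resolve_aggs(["cumsum", "cummax"]))  # (['sum', 'max'], True)
print(resolve_aggs(["sum", "min"]))        # (['sum', 'min'], False)
```

Keeping the scan flag separate from the base name means that later support for mixed requests would only need to relax the check, under the assumptions of this sketch.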

python/cudf/cudf/_lib/groupby.pyx (resolved)
python/cudf/cudf/_lib/groupby.pyx (outdated, resolved)
python/cudf/cudf/_lib/aggregation.pyx (resolved)
python/cudf/cudf/_lib/groupby.pyx (resolved)
Comment on lines +1576 to +1578
# pd.groupby.cumcount returns a series.
if isinstance(expect_df, pd.Series):
    expect_df = expect_df.to_frame("val")
Contributor

@shwina did you already write up the issue? If so, this conversation can be resolved.

@karthikeyann karthikeyann requested a review from vyasr April 16, 2021 20:24
Contributor

@vyasr vyasr left a comment

One or two minor comments from me, otherwise looks ready.

python/cudf/cudf/_lib/groupby.pyx (outdated, resolved)
@kkraus14
Collaborator

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8dae31c into rapidsai:branch-0.20 Apr 22, 2021
@kkraus14 kkraus14 added the "5 - Ready to Merge" label (Testing and reviews complete, ready to merge) and removed the "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer" labels Apr 22, 2021
Labels
5 - Ready to Merge (Testing and reviews complete, ready to merge)
feature request (New feature or request)
non-breaking (Non-breaking change)
Python (Affects Python cuDF API)
Development

Successfully merging this pull request may close these issues.

[FEA] Groupby cumulative sum
[FEA] Groupby cumulative count
5 participants