Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Groupby.shift c++ API refactor and python binding #8131

Merged
merged 26 commits into from
May 26, 2021

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented May 1, 2021

Closes #7183 , follow up of #7910

This PR:

  • refactors existing libcudf groupby::shift API, which only takes a single column, to accept multiple columns.
  • adds cython and python bindings for groupby.shift. Example python usage:
df = cudf.DataFrame({"a":[1,2,1,2,2], "b":["x", "y", "z", "42", "7"]})
>>> df.groupby("a").shift(1)
      b
a      
1  <NA>
1     x
2  <NA>
2     y
2    42

Minor refactors:

  • adds use_thread parameter to dataset_generator.rand_dataframe to expose thread pool config.

@isVoid isVoid added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. breaking Breaking change labels May 1, 2021
@isVoid isVoid self-assigned this May 1, 2021
@isVoid isVoid requested review from a team as code owners May 1, 2021 00:14
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 1, 2021
@codecov
Copy link

codecov bot commented May 1, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@dd5eecd). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head e62c825 differs from pull request most recent head eef9a25. Consider uploading reports for the commit eef9a25 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #8131   +/-   ##
===============================================
  Coverage                ?   82.88%           
===============================================
  Files                   ?      105           
  Lines                   ?    17888           
  Branches                ?        0           
===============================================
  Hits                    ?    14826           
  Misses                  ?     3062           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd5eecd...eef9a25. Read the comment docs.

Co-authored-by: GALI PREM SAGAR <[email protected]>
@isVoid
Copy link
Contributor Author

isVoid commented May 20, 2021

Failed tests should be fixed by #8272

@isVoid isVoid requested review from cwharris and karthikeyann May 20, 2021 15:59
@isVoid isVoid added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels May 21, 2021
Copy link
Contributor

@cwharris cwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

python/cudf/cudf/core/groupby/groupby.py Outdated Show resolved Hide resolved
# Pandas returns shifted column in original row order. We set its index
# to be the key columns, so that `assert_groupby_results_equal` can sort
# rows by key columns to make sure cudf and pandas results matches.
expected.index = gdf["0"].to_pandas()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we change this index in the test, or should we be returning the same order as pandas in the public api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently groupby.agg already diverts from pandas in row orders, in that cudf does not alter the key order returned from libcudf to match pandas order. Here we follow the same convention.

@kkraus14
Copy link
Collaborator

@gpucibot merge

@isVoid
Copy link
Contributor Author

isVoid commented May 25, 2021

rerun tests

2 similar comments
@isVoid
Copy link
Contributor Author

isVoid commented May 26, 2021

rerun tests

@kkraus14
Copy link
Collaborator

rerun tests

@kkraus14
Copy link
Collaborator

@gpucibot merge

@galipremsagar
Copy link
Contributor

rerun tests

@rapids-bot rapids-bot bot merged commit cd7fe6f into rapidsai:branch-21.06 May 26, 2021
@quasiben quasiben mentioned this pull request Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Groupby shift
8 participants