Groupby.shift c++ API refactor and python binding #8131
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #8131 +/- ##
===============================================
Coverage ? 82.88%
===============================================
Files ? 105
Lines ? 17888
Branches ? 0
===============================================
Hits ? 14826
Misses ? 3062
Partials ? 0
Co-authored-by: GALI PREM SAGAR <[email protected]>
…roupshift-python
…groupshift-python
Failed tests should be fixed by #8272
lgtm
# Pandas returns shifted column in original row order. We set its index
# to be the key columns, so that `assert_groupby_results_equal` can sort
# rows by key columns to make sure cudf and pandas results matches.
expected.index = gdf["0"].to_pandas()
Should we change this index in the test, or should we be returning the same order as pandas in the public API?
Currently `groupby.agg` already diverges from pandas in row order: cudf does not reorder the keys returned from libcudf to match the pandas order. Here we follow the same convention.
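The order-insensitive comparison discussed in this thread can be sketched in plain Python. This is only an illustration of the idea, not the cudf test code: `sort_by_key` and the sample data are hypothetical, and the "grouped order" layout is an assumed example of a result whose rows are not in input order.

```python
def sort_by_key(keys, values):
    """Pair each value with its key and sort by key only.

    sorted() is stable, so rows within one group keep their relative order.
    """
    return sorted(zip(keys, values), key=lambda pair: pair[0])


# Shift-by-one result laid out in the original input row order.
input_order_keys = ["b", "a", "b", "a"]
input_order_values = [None, None, 1, 2]

# The same result laid out group by group (assumed layout for illustration).
grouped_keys = ["a", "a", "b", "b"]
grouped_values = [None, 2, None, 1]

# After sorting both by key, the two layouts compare equal.
same = sort_by_key(input_order_keys, input_order_values) == sort_by_key(
    grouped_keys, grouped_values
)
```

Sorting on the key alone avoids comparing `None` with integers, and the stable sort keeps the within-group sequence that a shift depends on.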
@gpucibot merge
rerun tests
@gpucibot merge
rerun tests
Closes #7183, follow-up of #7910.
This PR:
- Refactors the C++ `groupby::shift` API, which only takes a single column, to accept multiple columns.
- Adds a Python binding, `groupby.shift`. Example Python usage:
Minor refactors:
- Adds a `use_thread` parameter to `dataset_generator.rand_dataframe` to expose thread pool config.
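To illustrate what a group-wise shift over several value columns computes, here is a pure-Python sketch. It is not the libcudf or cudf implementation: the function `groupby_shift`, its `periods` and `fill_value` parameters, and the sample data are hypothetical names chosen for this example, and the output is laid out in input row order for readability.

```python
from collections import defaultdict


def groupby_shift(keys, columns, periods=1, fill_value=None):
    """Shift every value column by `periods` rows within each key group.

    `keys` is a list of group keys, `columns` maps column name -> list of values.
    Positions with no source row inside the group receive `fill_value`.
    """
    # Record the row positions of each group in order of appearance.
    groups = defaultdict(list)
    for row, key in enumerate(keys):
        groups[key].append(row)

    shifted = {name: [fill_value] * len(keys) for name in columns}
    for rows in groups.values():
        for i, row in enumerate(rows):
            src = i - periods  # position inside the group to copy from
            if 0 <= src < len(rows):
                for name, values in columns.items():
                    shifted[name][row] = values[rows[src]]
    return shifted


keys = ["a", "b", "a", "b", "a"]
columns = {"x": [1, 2, 3, 4, 5], "y": [10.0, 20.0, 30.0, 40.0, 50.0]}

forward = groupby_shift(keys, columns, periods=1)
backward = groupby_shift(keys, columns, periods=-1)
```

With `periods=1`, the first row of each group has nothing to copy from and receives the fill value; a negative `periods` shifts in the opposite direction, leaving the last row of each group filled instead.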