[ENH] Implement groupby.sample
#12882
Merged: rapids-bot merged 21 commits into rapidsai:branch-23.04 from wence-:wence/fea/groupby-sample on Mar 23, 2023
Changes from 16 commits
Commits
008cfe1 Implement sketch of groupby.sample (wence-)
0972339 Implement fast path for sample (wence-)
30629ac Pacify type-checker and some more implementation (wence-)
114a164 Faster paths in most cases, better documentation (wence-)
0179867 Merge branch 'branch-23.04' into wence/fea/groupby-sample (wence-)
2f9a8c1 Fix bugs in fast-path code (wence-)
af5036a Add tests of groupby.sample (wence-)
cf5fc32 Expose segmented_sort_by_key to Python (wence-)
a97ba8d No more pathological slow cases in groupby sample (wence-)
cffe605 Slightly faster masking of the shuffled indices (wence-)
aa0fd8c Minor fixes (wence-)
74cc64b Fix sample for non-range index (wence-)
6113e8e Test groupby.sample with non rangeindex (wence-)
85acba9 Add groupby.sample to pytest benchmarks (wence-)
e90a214 Use numpy group_offsets (wence-)
4f6d796 Merge branch-23.04 into wence/fea/groupby-sample (wence-)
b57b068 Merge remote-tracking branch 'upstream/branch-23.04' into wence/fea/g… (wence-)
adb7a0b Minor fixes in review (wence-)
7998d90 Dtypes aren't callable (wence-)
558541d Trailing comma is bad (wence-)
206d07a Correct string index construction (wence-)
@@ -1,4 +1,4 @@
-# Copyright (c) 2020-2022, NVIDIA CORPORATION.
+# Copyright (c) 2020-2023, NVIDIA CORPORATION.

 from cudf.core.buffer import acquire_spill_lock

@@ -18,11 +18,13 @@ from cudf._lib.cpp.search cimport lower_bound, upper_bound
 from cudf._lib.cpp.sorting cimport (
     is_sorted as cpp_is_sorted,
     rank,
+    segmented_sort_by_key as cpp_segmented_sort_by_key,
     sorted_order,
 )
 from cudf._lib.cpp.table.table cimport table
 from cudf._lib.cpp.table.table_view cimport table_view
 from cudf._lib.cpp.types cimport null_order, null_policy, order
-from cudf._lib.utils cimport table_view_from_columns
+from cudf._lib.utils cimport columns_from_unique_ptr, table_view_from_columns


 @acquire_spill_lock()

@@ -143,6 +145,70 @@ def order_by(list columns_from_table, object ascending, str na_position):
     return Column.from_unique_ptr(move(c_result))


+def segmented_sort_by_key(
+    list values,
+    list keys,
+    Column segment_offsets,
+    list column_order=None,
+    list null_precedence=None,
+):
+    """
+    Sort segments of a table by given keys
+
+    Parameters
+    ----------
+    values : list[Column]
+        Columns of the table which will be sorted
+    keys : list[Column]
+        Columns making up the sort key
+    offsets : Column
+        Segment offsets
+    column_order : list[bool], optional
+        Sequence of boolean values which correspond to each column in
+        keys providing the sort order (default all True).
+        With True <=> ascending; False <=> descending.
+    null_precedence : list[str], optional
+        Sequence of "first" or "last" values (default "first")
+        indicating the position of null values when sorting the keys.
+
+    Returns
+    -------
+    list[Column]
+        list of value columns sorted by keys
+    """
+    cdef table_view values_view = table_view_from_columns(values)
+    cdef table_view keys_view = table_view_from_columns(keys)
+    cdef column_view offsets_view = segment_offsets.view()
+    cdef vector[order] c_column_order
+    cdef vector[null_order] c_null_precedence
+    cdef unique_ptr[table] result
+    ncol = len(values)
+    column_order = column_order or [True] * ncol,
+    null_precedence = null_precedence or ["first"] * ncol,
+    for asc, null in zip(column_order, null_precedence):
+        if asc:
+            c_column_order.push_back(order.ASCENDING)
+        else:
+            c_column_order.push_back(order.DESCENDING)
Suggested change
I guess I was just copying the previous
+        if asc ^ (null == "first"):
+            c_null_precedence.push_back(null_order.AFTER)
+        elif asc ^ (null == "last"):
+            c_null_precedence.push_back(null_order.BEFORE)
+        else:
+            raise ValueError(f"Invalid null precedence {null}")
+    with nogil:
+        result = move(
+            cpp_segmented_sort_by_key(
+                values_view,
+                keys_view,
+                offsets_view,
+                c_column_order,
+                c_null_precedence,
+            )
+        )
+    return columns_from_unique_ptr(move(result))
+
+
 @acquire_spill_lock()
 def digitize(list source_columns, list bins, bool right=False):
     """

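For readers unfamiliar with segmented sorting, the following is a minimal pure-Python model of what a segmented sort by key computes. It is a sketch under stated assumptions, not the cudf or libcudf implementation: the offsets are assumed to start at 0 and end at the row count, only ascending order without nulls is handled, and `segmented_sort_by_key_reference` is a hypothetical name used for illustration.

```python
from typing import List, Sequence


def segmented_sort_by_key_reference(
    values: List[list],
    keys: List[list],
    segment_offsets: Sequence[int],
) -> List[list]:
    """Reorder each value column so that, within every segment, rows
    follow the ascending lexicographic order of the key columns.

    Assumption: segment_offsets[0] == 0 and segment_offsets[-1] == nrows.
    """
    nrows = len(keys[0])
    order = list(range(nrows))
    for start, end in zip(segment_offsets[:-1], segment_offsets[1:]):
        # Sort the row indices of this segment only; rows never cross
        # a segment boundary. Python's sorted() is stable, which is a
        # property of this model rather than a claim about libcudf.
        order[start:end] = sorted(
            range(start, end),
            key=lambda i: tuple(col[i] for col in keys),
        )
    # Gather every value column through the per-segment permutation.
    return [[col[i] for i in order] for col in values]


values = [["a", "b", "c", "d", "e"]]
keys = [[3, 1, 2, 9, 7]]
offsets = [0, 3, 5]  # two segments: rows 0..2 and rows 3..4
result = segmented_sort_by_key_reference(values, keys, offsets)
print(result)  # [['b', 'c', 'a', 'e', 'd']]
```

With random keys, the same routine would shuffle each segment independently. That is one plausible reason a groupby sample would want this primitive, but the hunk shown here does not confirm how the PR uses it.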
Does this work?
Oh, probably yes...
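The argument handling in the `segmented_sort_by_key` hunk can be replayed in plain Python to see what the two defaulting lines and the XOR branch do as written in this 16-commit view. This is a sketch only: `normalise` and `null_order_for` are hypothetical helpers, and the strings "AFTER" and "BEFORE" stand in for the `null_order` enum values. It suggests why a later commit is titled "Trailing comma is bad".

```python
def normalise(column_order, null_precedence, ncol, trailing_comma):
    """Mimic the defaulting lines from the hunk, with or without the comma."""
    if trailing_comma:
        # A trailing comma makes the right-hand side a 1-tuple.
        column_order = column_order or [True] * ncol,
        null_precedence = null_precedence or ["first"] * ncol,
    else:
        column_order = column_order or [True] * ncol
        null_precedence = null_precedence or ["first"] * ncol
    return list(zip(column_order, null_precedence))


def null_order_for(asc, null):
    """Plain-Python copy of the XOR branch, returning strings
    instead of enum values."""
    if asc ^ (null == "first"):
        return "AFTER"
    elif asc ^ (null == "last"):
        return "BEFORE"
    else:
        raise ValueError(f"Invalid null precedence {null}")


with_comma = normalise(None, None, 2, trailing_comma=True)
without_comma = normalise(None, None, 2, trailing_comma=False)
print(with_comma)     # [([True, True], ['first', 'first'])]
print(without_comma)  # [(True, 'first'), (True, 'first')]

# Mapping from (ascending, null position) to the stand-in null order.
table = {
    (asc, null): null_order_for(asc, null)
    for asc in (True, False)
    for null in ("first", "last")
}
print(table)
```

With the comma, the loop sees a single pair of whole lists rather than one `(bool, str)` pair per column. In this plain-Python copy, an unrecognised string such as "middle" only reaches the `ValueError` when `asc` is False. When `asc` is True the first branch is taken and "AFTER" is returned.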