[FEA] Segmented Sorting #4603
Comments
Sorry, can you clarify: is the present use case just for unit testing?
Yep, not high priority. I imagine it will be useful outside of unit testing since it's a somewhat common operation, but I don't have any other immediate need. I just wanted to capture the idea while I was thinking of it.
I believe there's a desire to sort as a groupby operation in Python, i.e., something like group by columns 0 and 1 of the input Table and then sort by columns 2 and 4 within each group.
But isn't that case just a lexicographic sort by columns 0, 1, 2, 4?
Feels like a generalization of |
Yes, it is, but there's no easy way to compose that with the current APIs. I may be asking for additional aggregations as well, and currently I'd have to redo the work of sorting by columns 0 and 1 in order to merge the two.
addresses part of #6541 Segment sort of lists
- [x] lists_column_view segmented_sort
- [x] numerical types (cub segmented sort limitation)
- [x] sort_lists(table_view)
- [x] unit tests

closes #4603 Segmented sort
- [x] segmented_sort
- [x] unit tests

Authors:
- Karthikeyan (@karthikeyann)

Approvers:
- AJ Schmidt (@ajschmidt8)
- Keith Kraus (@kkraus14)
- Jake Hemstad (@jrhemstad)
- Conor Hoekstra (@codereport)

URL: #7122
Is your feature request related to a problem? Please describe.
Given a table and a set of offsets that demarcate "segments" of the table, I would like to be able to sort each segment.
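To make the requested semantics concrete, here is a minimal host-side sketch of what "sort each segment" means. It is only an illustration under stated assumptions: a single `std::vector<int>` stands in for the table, the offsets are assumed to be non-decreasing and within bounds, and `sort_each_segment` is a hypothetical name, not a libcudf function. It sorts one segment at a time, so it serves as a reference for the expected output rather than as an efficient implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not a libcudf API).
// Sorts each segment [offsets[i], offsets[i + 1]) of `values` independently
// and returns the result. Rows outside every segment are left untouched.
std::vector<int> sort_each_segment(std::vector<int> values,
                                   std::vector<std::size_t> const& offsets)
{
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    // Assumption: offsets are non-decreasing and do not exceed the row count.
    assert(offsets[i] <= offsets[i + 1] && offsets[i + 1] <= values.size());
    auto const first = values.begin() + static_cast<std::ptrdiff_t>(offsets[i]);
    auto const last  = values.begin() + static_cast<std::ptrdiff_t>(offsets[i + 1]);
    std::sort(first, last);  // one independent sort per segment
  }
  return values;
}
```

For example, with values `{3, 1, 2, 9, 7, 8}` and offsets `{0, 3, 6}`, the two segments are sorted separately and the rows never cross a segment boundary.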
Describe the solution you'd like
Implement a `segmented_sort` API like:
Describe alternatives you've considered
Alternatively, you can use `slice` to get a set of views and sort each view separately, but that's inefficient.
Additional context
I thought of this while testing `partition` in #4472. Since the ordering per partition is non-deterministic, it is difficult to unit test when there are multiple values per partition.
Unfortunately, CUB's segmented sort can't be used because we need a comparison sort for the lexicographic row ordering.
Instead, a straightforward way to implement this:
table_to_sort
segment_offsets
table_to_sort
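One possible reading of this approach, consistent with the lexicographic-sort observation in the comments, is to derive a segment ID for every row from `segment_offsets` and then run a single lexicographic sort on (segment ID, row value). The sketch below is host-side C++ under assumptions of my own: a single `std::vector<int>` stands in for `table_to_sort`, the offsets are assumed to start at 0 and end at the row count, and the names `segment_ids` and `segmented_sorted_order` are hypothetical rather than libcudf functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: label each row with the index of its segment.
// Assumption: offsets.front() == 0 and offsets.back() == num_rows,
// e.g. {0, 3, 6} describes two segments of three rows each.
std::vector<std::size_t> segment_ids(std::vector<std::size_t> const& offsets,
                                     std::size_t num_rows)
{
  std::vector<std::size_t> ids(num_rows, 0);
  for (std::size_t seg = 0; seg + 1 < offsets.size(); ++seg) {
    assert(offsets[seg] <= offsets[seg + 1] && offsets[seg + 1] <= num_rows);
    for (std::size_t row = offsets[seg]; row < offsets[seg + 1]; ++row) {
      ids[row] = seg;
    }
  }
  return ids;
}

// Returns the row order produced by one lexicographic sort on
// (segment ID, value). Gathering rows in this order yields a table
// in which every segment is sorted and segments stay in place.
std::vector<std::size_t> segmented_sorted_order(std::vector<int> const& values,
                                                std::vector<std::size_t> const& offsets)
{
  auto const ids = segment_ids(offsets, values.size());
  std::vector<std::size_t> order(values.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    if (ids[a] != ids[b]) { return ids[a] < ids[b]; }  // segment ID is the leading key
    return values[a] < values[b];                      // then the row value
  });
  return order;
}
```

Because the segment ID is the leading sort key, rows can only be reordered within their own segment, and the whole operation is one comparison sort instead of one sort per segment.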