Add Segmented sort #7122
cpp/src/sort/segmented_sort.cu
Outdated
CUDF_EXPECTS(std::all_of(keys.begin(),
                         keys.end(),
                         [](column_view const& col) { return col.type().id() == type_id::LIST; }),
             "segmented_sort only supports lists columns");
Why? I'd think I should be able to do a segmented sort of a normal column. In fact, I filed an issue asking for exactly that ages ago: #4603
Implemented segmented_sort as well.
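To make the request concrete, here is a minimal host-side sketch of a segmented sort over a plain (non-list) column. It is not the libcudf implementation or API: the name `segmented_sort_host`, the use of `std::vector`, and the ascending order are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not the libcudf API): sorts each segment
// [offsets[i], offsets[i + 1]) of `values` independently, in ascending order.
// Elements outside every segment are left untouched.
inline void segmented_sort_host(std::vector<int>& values,
                                std::vector<std::size_t> const& offsets)
{
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    // Offsets must be non-decreasing and stay within the column.
    assert(offsets[i] <= offsets[i + 1] && offsets[i + 1] <= values.size());
    auto const first = values.begin() + static_cast<std::ptrdiff_t>(offsets[i]);
    auto const last  = values.begin() + static_cast<std::ptrdiff_t>(offsets[i + 1]);
    std::sort(first, last);
  }
}
```

The offsets play the same role as the offsets child of a lists column, which is why the same routine could serve both a flat column and a list column.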
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #7122 +/- ##
===============================================
+ Coverage 82.09% 82.20% +0.11%
===============================================
Files 97 100 +3
Lines 16474 16952 +478
===============================================
+ Hits 13524 13936 +412
- Misses 2950 3016 +66
cpp/include/cudf/sorting.hpp
Outdated
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
 * @brief Performs a lexicographic segmented sort of the list in each row of a table of list columns
I don't understand this function. How do you do a lexicographic sort of a table of list columns?
Is this clear?
Performs a lexicographic sort of each list in a table.
So is it the same as calling sort independently on each column?
No. It will be similar to calling sort on each row of the table. Each row is a list.
The relative row order in the table remains the same. In each row, the elements in the list are sorted.
So it's the same as calling sort_lists on each column in the table?
No. It's similar to treating each row as a table, sorting it, and then repeating that for all rows.
Copying the example from #6541
row# | a | b | c |
---|---|---|---|
0 | [21, 22, 23, 22] | [13, 14, 12, 11] | ["a", "b", "c", "d"] |
1 | [22, 21, 23, 22] | [14, 13, 12, 11] | ["a", "b", "c", "d"] |
Here [...] is a list.
sort_lists(values={a,b,c}, keys={a,b})
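output will be (assuming ascending order for both keys)
row# | a | b | c |
---|---|---|---|
0 | [21, 22, 22, 23] | [13, 11, 14, 12] | ["a", "d", "b", "c"] |
1 | [21, 22, 22, 23] | [13, 11, 14, 12] | ["b", "d", "a", "c"] |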
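The "treat each row as a table and sort it" semantics can be modelled on the host as follows. This is only a sketch under stated assumptions: `Row`, `gather`, and `sort_row_by_a_then_b` are hypothetical names, the keys are fixed to (a, b), and ascending order is assumed. It is not how the PR implements `sort_lists`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

// One row of the example table: three lists of equal length.
struct Row {
  std::vector<int> a;
  std::vector<int> b;
  std::vector<std::string> c;
};

// Reorders `in` according to the permutation `order` (a gather).
template <typename T>
std::vector<T> gather(std::vector<T> const& in, std::vector<std::size_t> const& order)
{
  std::vector<T> out;
  out.reserve(order.size());
  for (auto i : order) { out.push_back(in[i]); }
  return out;
}

// Hypothetical model of sort_lists(values={a,b,c}, keys={a,b}) for one row:
// compute the permutation that sorts (a, b) lexicographically, ascending,
// then apply that same permutation to a, b, and c.
inline Row sort_row_by_a_then_b(Row const& row)
{
  assert(row.a.size() == row.b.size() && row.a.size() == row.c.size());
  std::vector<std::size_t> order(row.a.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t l, std::size_t r) {
    return std::tie(row.a[l], row.b[l]) < std::tie(row.a[r], row.b[r]);
  });
  return Row{gather(row.a, order), gather(row.b, order), gather(row.c, order)};
}
```

Applying `sort_row_by_a_then_b` to each row in turn leaves the row order of the table unchanged and only permutes elements inside each row's lists.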
Does that require all of the lists in a given row to have the same number of elements? What would the result be for this input?
row# | a | b | c |
---|---|---|---|
0 | [21, 22, 23, 22] | [13, 14] | ["a", "c", "d"] |
1 | [22, 21, 23, 22] | [14] | ["c", "d"] |
Yes.
For this input, it will throw a logic error.
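The kind of precondition described here could look like the sketch below. It is an assumption-laden illustration, not the PR's code: each list column is modelled only by its offsets, `expect_matching_list_sizes` is a hypothetical name, and `std::logic_error` stands in for whatever exception type the library actually throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Each list column is modelled only by its offsets: the list in row i spans
// [offsets[i], offsets[i + 1]) of the child data.
using offsets_t = std::vector<std::size_t>;

// Hypothetical check: every column must have the same number of rows, and in
// each row all columns must hold lists of the same length.
inline void expect_matching_list_sizes(std::vector<offsets_t> const& columns)
{
  if (columns.empty()) { return; }
  auto const& first = columns.front();
  for (auto const& col : columns) {
    assert(!col.empty());  // offsets for n rows always have n + 1 entries
    if (col.size() != first.size()) { throw std::logic_error("row counts differ"); }
    for (std::size_t i = 0; i + 1 < col.size(); ++i) {
      if (col[i + 1] - col[i] != first[i + 1] - first[i]) {
        throw std::logic_error("list sizes differ within a row");
      }
    }
  }
}
```

For the ragged input above, the offsets would be {0, 4, 8} for a, {0, 2, 3} for b, and {0, 3, 5} for c, so the check fails on the first row.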
Unrelated style check failures.
Being fixed in #7279
High level questions:
- Why is `sort_lists` different from `segmented_sort`? Couldn't `sort_lists` just be implemented by calling `segmented_sort` on the data/offsets children of a list column?
- This brings up the question, do we even need a `sort_lists` API? Couldn't a caller trivially just do `segmented_sort(list.data, list.offsets)`? This has the advantage of implicitly enforcing the "lists of depth 1" requirement.
- The `sort_lists` table API still sticks out as very odd to me. I think from [FEA] Segmented sort / sorted_order (argsort) #6541 what @kkraus14 really wanted is just a table level segmented sort, not a lexicographic sort of lists of all the same length.
In summary, all I would think we need is:
segmented_sort_by_key(table_view values, table_view keys, column_view segment_offsets, ...)
segmented_sorted_order(table_view keys, column_view segment_offsets)
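As a rough host-side model of the two proposed entry points: the sketch below computes a gather map per segment and then applies it to the value columns. Everything here is an assumption for illustration, including the `_host` suffixes, the `column`/`table` aliases over `std::vector<int>`, ascending order, and the use of `std::stable_sort`; the real functions would take `table_view`/`column_view` arguments and run on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using column = std::vector<int>;
using table  = std::vector<column>;  // columns are assumed to have equal length

// Hypothetical model of segmented_sorted_order: returns a gather map that
// orders rows lexicographically by `keys` within each segment.
inline std::vector<std::size_t> segmented_sorted_order_host(
  table const& keys, std::vector<std::size_t> const& segment_offsets)
{
  std::size_t const num_rows = keys.empty() ? 0 : keys.front().size();
  std::vector<std::size_t> order(num_rows);
  std::iota(order.begin(), order.end(), std::size_t{0});
  // Lexicographic row comparison across all key columns.
  auto const row_less = [&](std::size_t l, std::size_t r) {
    for (auto const& col : keys) {
      if (col[l] != col[r]) { return col[l] < col[r]; }
    }
    return false;
  };
  for (std::size_t s = 0; s + 1 < segment_offsets.size(); ++s) {
    assert(segment_offsets[s] <= segment_offsets[s + 1] && segment_offsets[s + 1] <= num_rows);
    auto const first = order.begin() + static_cast<std::ptrdiff_t>(segment_offsets[s]);
    auto const last  = order.begin() + static_cast<std::ptrdiff_t>(segment_offsets[s + 1]);
    std::stable_sort(first, last, row_less);
  }
  return order;
}

// Hypothetical model of segmented_sort_by_key: gathers every value column
// through the order computed from the keys.
inline table segmented_sort_by_key_host(table const& values,
                                        table const& keys,
                                        std::vector<std::size_t> const& segment_offsets)
{
  auto const order = segmented_sorted_order_host(keys, segment_offsets);
  table out;
  for (auto const& col : values) {
    column sorted;
    sorted.reserve(order.size());
    for (auto i : order) { sorted.push_back(col[i]); }
    out.push_back(std::move(sorted));
  }
  return out;
}
```

With this shape, a list-specific API reduces to passing a list column's child data as the keys and its offsets as `segment_offsets`.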
Had a discussion with @jrhemstad.
@karthikeyann can you get the requested changes done by tomorrow? If not we should push to 0.19.
Yes. It will be done by today. By tomorrow, I will address re-review comments as well.
👍
rerun tests
@gpucibot merge
addresses part of #6541 Segment sort of lists
closes #4603 Segmented sort