-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve distinct_count
with cuco::static_set
#13343
Improve distinct_count
with cuco::static_set
#13343
Conversation
The new implementation can be as much as 3x faster by using half of the memory: distinct_count[0] Quadro RTX 8000
|
@PointKernel These speedups are amazing! Great work here. 😍 |
This is a follow-up PR of #407. It fetches the latest cuco that fixes several bugs uncovered during libcudf integration. rapidsai/cudf#13343 verified this won't break cudf. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Bradley Dice (https://github.com/bdice) URL: #412
Yeah, the speedup is fantastic! |
@@ -34,6 +34,8 @@ | |||
#include <rmm/cuda_stream_view.hpp> | |||
#include <rmm/exec_policy.hpp> | |||
|
|||
#include <cuco/static_set.cuh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect to see some header removed from this but don't see any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now, some common utilities will no longer be needed but I will leave it to the distinct
refactoring with cuco::static_reduction_map
to avoid back-and-forth effort.
auto key_set = cuco::experimental::static_set{ | ||
cuco::experimental::extent{static_cast<cudf::size_type>(compute_hash_table_size(num_rows))}, | ||
cuco::empty_key<cudf::size_type>{COMPACTION_EMPTY_KEY_SENTINEL}, | ||
row_equal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the static_set
is now constructed with a comparator? Is this a requirement? And insert_if
doesn't require a comparator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the static_set is now constructed with a comparator?
Yes, we are trying to align with std::unordered_set
when possible. Similar to allocator and hasher, equality should be passed only once during construction (see reference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see potential issue with adopting static_set
having such requirement into cudf contains
APIs. In such places:
- The
haystack
table is firstly fed into astatic_set
, when we need to useself_comparator
forkey_equal
. - Then, a
needles
table is used to check if its rows exist in the set. This time, we need to usetwo_table_comparator
forkey_equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to steal the show. Here is my suggestion to overcome the aforementioned issue: Combine both comparators into a thin proxy wrapper:
template<typename SelfComparator, typename TwoTableComparator>
struct comparator {
SelfComparator self_comp;
TwoTableComparator two_table_comp;
bool operator()(lhs_index_type i, lhs_index_type j) const { return self_comp(static_cast<size_type>(i), static_cast<size_type>(j)); }
bool operator()(lhs_index_type i, rhs_index_type j) const { return two_table_comp(i, j); }
bool operator()(rhs_index_type i, lhs_index_type j) const { return two_table_comp(i, j); }
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we make sure the desired operator will be used with the equal wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When inserting, we should insert all keys of type lhs_index_type
while when checking contains we will check keys of type rhs_index_type
against the lhs_index_type
keys in the set.
Theoretically, my proposal should work but that depends on the detail implementation of static_set
. If there is compile error with it then we may also need additional modification to static_set
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, my proposal should work but that depends on the detail implementation of static_set.
Good point. cuco has an additional indirection layer to facilitate sentinel comparison thus the wrapper may not work as it is. To be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a fantastic PR!
/merge |
Description
This PR improves
distinct_count
with the newcuco::static_set
data structure.Related to #12261
Checklist