-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use cuco::static_set in the hash-based groupby #14813
Use cuco::static_set in the hash-based groupby #14813
Conversation
6213d8e
to
166ed49
Compare
…428) This PR fixes a bug in the OA base class where the erased key sentinel value should have been initialized by the empty key sentinel if not specified. Tests are updated to exercise this issue. Needed by rapidsai/cudf#14813
Question to reviewers: is it worth doing runtime dispatching based on whether nested types are involved or not? Benchmark results are shown below, TLDR:
Based on our past tuning experience, e.g.: cudf/cpp/src/search/contains_table.cu Lines 158 to 165 in 3ba63c3
CGSize == 1 for all types.
groupby_max[0] Quadro RTX 8000
groupby_struct_keys[0] Quadro RTX 8000
|
@vyasr Thanks for the review, have you reviewed both cpp and python or do I need another cpp/python approval to merge the PR? |
I reviewed both. I think the main question on the Python side would be whether we need to update the docs in https://github.com/rapidsai/cudf/blob/branch-24.04/docs/cudf/source/user_guide/pandas-comparison.md#result-ordering to also mention |
I think we should do that, yes (this PR is then a breaking change) |
using probing_scheme_type = cuco::linear_probing< | ||
1, ///< Number of threads used to handle each input key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PointKernel You mentioned:
Based on our past tuning experience, e.g.:
cudf/cpp/src/search/contains_table.cu
using a larger CG size (like 2 or 4) for nested types can bring significant speedups compared to the current CGSize == 1 for all types.
Do we need to adopt that here? Do we need an issue or TODO describing that optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to adopt that here?
Probably not. That requires nontrivial changes to the current code and the benefit is unclear, i.e., no users actually complained about the groupby performance with nested types. I'm inclined to look into it until we have the bandwidth or someone raises an issue about it. Does it sound good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d like some kind of note in the code or an issue to make sure that we are aware of this optimization potential for the future. Otherwise, no action needed in terms of implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Done in 56a2229
Also CC @mroeschke @galipremsagar we may have to introduce an extra sorting in pandas compatibility mode due to the ordering change in this PR for cudf.pandas to behave as expected. No need to make any changes yet, just something to keep on your radars. |
/merge |
Description
Depends on #14849
Contributes to #12261
This PR migrates hash groupby to use the new
cuco::static_set
data structure. It doesn't change any existing libcudf behavior but uncovers the fact that the cudf pythonvalue_counts
doesn't guarantee output orders thus the PR becomes a breaking change.Checklist