[REVIEW] Add COLLECT groupby aggregation #5874
Please update the changelog in order to start CI tests. View the gpuCI docs here.
@shwina moving to 0.16 since still WIP.
C++ LGTM
// Always use list for COLLECT
template <typename Source>
struct target_type_impl<Source, aggregation::COLLECT> {
  using type = cudf::list_view;
};
Is this actually needed? The target_type logic is for determining the type to use for an accumulator for ops like sum/min/max. I wouldn't think it would be needed for COLLECT.
Yes -- otherwise we fail this check: https://github.com/rapidsai/cudf/blob/branch-0.16/cpp/src/groupby/groupby.cu#L102
which eventually calls:
return (not std::is_void<target_type_t<Source, k>>::value);
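For anyone following along, here is a minimal sketch of why the specialization matters for that check. It is not the libcudf source: `agg_kind`, `list_view_tag`, `UNSUPPORTED`, and `is_valid_aggregation` are hypothetical stand-ins, and the only assumption carried over from the discussion above is that a kind with no target type resolves to `void` and is rejected by the `std::is_void` test.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-ins for the libcudf types discussed above.
enum class agg_kind { SUM, COLLECT, UNSUPPORTED };
struct list_view_tag {};  // placeholder for cudf::list_view

// Primary template: a (Source, kind) pair with no specialization has no
// target type, which is modelled here as void.
template <typename Source, agg_kind k>
struct target_type_impl {
  using type = void;
};

// Simplified: SUM accumulates into the source type.
template <typename Source>
struct target_type_impl<Source, agg_kind::SUM> {
  using type = Source;
};

// Always use a list for COLLECT, mirroring the specialization in this PR.
template <typename Source>
struct target_type_impl<Source, agg_kind::COLLECT> {
  using type = list_view_tag;
};

template <typename Source, agg_kind k>
using target_type_t = typename target_type_impl<Source, k>::type;

// Same shape as the check quoted above: a combination is considered valid
// only when its target type is not void.
template <typename Source, agg_kind k>
constexpr bool is_valid_aggregation()
{
  return !std::is_void<target_type_t<Source, k>>::value;
}

// With the COLLECT specialization present the check passes; a kind that
// falls through to the primary template is rejected.
static_assert(is_valid_aggregation<int, agg_kind::COLLECT>(), "COLLECT has a target type");
static_assert(!is_valid_aggregation<int, agg_kind::UNSUPPORTED>(), "no target type, so invalid");
```

In this sketch, deleting the `COLLECT` specialization would make the first `static_assert` fail, which is the same kind of failure described in the reply above.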
Sorry for not having looked at this PR sooner. Thanks for working on this. This is something the Spark team would be interested in as well. :] @shwina, perhaps at a later date, we might consider a test for collecting
rerun tests
Closes #5620
COLLECT agg to libcudf