Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Add COLLECT groupby aggregation #5874

Merged
merged 32 commits into from
Sep 1, 2020

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Aug 6, 2020

Closes #5620

  • Adds groupby COLLECT agg to libcudf
  • Python bindings

@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@shwina shwina mentioned this pull request Aug 7, 2020
@harrism harrism added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Aug 18, 2020
@harrism harrism changed the base branch from branch-0.15 to branch-0.16 August 18, 2020 01:15
@harrism
Copy link
Member

harrism commented Aug 18, 2020

@shwina moving to 0.16 since still WIP.

@shwina shwina changed the title [WIP] Add COLLECT groupby aggregation [REVIEW] Add COLLECT groupby aggregation Aug 19, 2020
@shwina shwina marked this pull request as ready for review August 19, 2020 14:03
@shwina shwina requested review from a team as code owners August 19, 2020 14:03
@shwina shwina requested a review from devavret August 19, 2020 14:03
Copy link
Contributor

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ LGTM

cpp/include/cudf/aggregation.hpp Outdated Show resolved Hide resolved
@kkraus14 kkraus14 removed the request for review from trevorsm7 August 28, 2020 21:33
Comment on lines +358 to +363
// Always use list for COLLECT
template <typename Source>
struct target_type_impl<Source, aggregation::COLLECT> {
using type = cudf::list_view;
};

Copy link
Contributor

@jrhemstad jrhemstad Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed? The target_type logic is for determining the type to use for an accumulator for ops like sum/min/max. I wouldn't think it would be needed for COLLECT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- otherwise we fail this check: https://github.com/rapidsai/cudf/blob/branch-0.16/cpp/src/groupby/groupby.cu#L102

which eventually calls:

return (not std::is_void<target_type_t<Source, k>>::value);

cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
@kkraus14 kkraus14 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 31, 2020
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Aug 31, 2020
@mythrocks
Copy link
Contributor

Sorry for not having looked at this PR sooner. Thanks for working on this. This is something the Spark team would be interested in as well. :]

@shwina, perhaps at a later date, we might consider a test for collecting struct columns. We can try adding one when we attempt collect as a rolling-window aggregation.

@shwina
Copy link
Contributor Author

shwina commented Sep 1, 2020

rerun tests

@kkraus14 kkraus14 merged commit da6c03a into rapidsai:branch-0.16 Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] cudf groupby aggregate into list
9 participants