Add groupby product support #7763

Merged: 21 commits into rapidsai:branch-0.20 on Apr 21, 2021

Conversation

karthikeyann
Contributor

closes #4882

Added groupby.product support in both hash and sort groupby.
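
For readers unfamiliar with the two code paths, here is a minimal host-side sketch of the difference between a hash-based and a sort-based grouped product. It is plain C++ rather than libcudf or CUDA code, and the function names `hash_group_product` and `sort_group_product` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical helpers: plain host C++ standing in for the two groupby strategies.

// Hash style: keep one accumulator per key and update it as rows are visited.
std::map<int, std::int64_t> hash_group_product(std::vector<int> const& keys,
                                               std::vector<std::int64_t> const& values)
{
  assert(keys.size() == values.size());
  std::unordered_map<int, std::int64_t> acc;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    // Start every new group at 1, the multiplicative identity.
    auto it = acc.try_emplace(keys[i], 1).first;
    it->second *= values[i];
  }
  // Copy into an ordered map so both strategies can be compared directly.
  return std::map<int, std::int64_t>(acc.begin(), acc.end());
}

// Sort style: order the rows by key, then reduce each contiguous run of equal keys.
std::map<int, std::int64_t> sort_group_product(std::vector<int> const& keys,
                                               std::vector<std::int64_t> const& values)
{
  assert(keys.size() == values.size());
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  std::map<int, std::int64_t> result;
  std::size_t i = 0;
  while (i < order.size()) {
    int const key        = keys[order[i]];
    std::int64_t product = 1;
    for (; i < order.size() && keys[order[i]] == key; ++i) { product *= values[order[i]]; }
    result.emplace(key, product);
  }
  return result;
}
```

With keys `{1, 2, 1, 3, 2}` and values `{2, 3, 4, 5, 6}`, both functions produce the products 8, 18, and 5 for keys 1, 2, and 3.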

github-actions bot added the CMake, Python, and libcudf labels on Mar 30, 2021
karthikeyann added the "2 - In Progress", "4 - Needs Review", and "feature request" labels and removed the CMake, Python, and libcudf labels on Mar 30, 2021
karthikeyann added the non-breaking label on Mar 30, 2021
Contributor

@jrhemstad jrhemstad left a comment

Doesn't target_type_impl need to be updated?

struct target_type_impl<SourceType, aggregation::VARIANCE> {

@karthikeyann
Contributor Author

> Doesn't target_type_impl need to be updated?

It's enabled with is_sum_product_agg.

std::enable_if_t<std::is_integral<Source>::value && is_sum_product_agg(k)>> {

std::enable_if_t<std::is_floating_point<Source>::value && is_sum_product_agg(k)>> {
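
A self-contained sketch of how a `constexpr` predicate such as `is_sum_product_agg` can gate partial specializations is shown below. The enum `agg_kind`, the alias `target_type_t`, and the specific accumulator types chosen here are assumptions made for illustration; they are not copied from libcudf.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the aggregation kind enum.
enum class agg_kind { SUM, PRODUCT, MIN, MAX };

// One predicate covers both kinds, so a single pair of specializations serves SUM and PRODUCT.
constexpr bool is_sum_product_agg(agg_kind k)
{
  return k == agg_kind::SUM || k == agg_kind::PRODUCT;
}

// Primary template: in this sketch, void marks a combination with no accumulator type.
template <typename Source, agg_kind k, typename Enable = void>
struct target_type_impl {
  using type = void;
};

// Integral sources accumulate into a wide integer (an assumed choice for this sketch).
template <typename Source, agg_kind k>
struct target_type_impl<
  Source,
  k,
  std::enable_if_t<std::is_integral<Source>::value && is_sum_product_agg(k)>> {
  using type = std::int64_t;
};

// Floating-point sources keep their own type (also an assumed choice).
template <typename Source, agg_kind k>
struct target_type_impl<
  Source,
  k,
  std::enable_if_t<std::is_floating_point<Source>::value && is_sum_product_agg(k)>> {
  using type = Source;
};

template <typename Source, agg_kind k>
using target_type_t = typename target_type_impl<Source, k>::type;

// PRODUCT is picked up by the same specializations as SUM, with no PRODUCT-specific code.
static_assert(std::is_same<target_type_t<std::int32_t, agg_kind::PRODUCT>, std::int64_t>::value,
              "integral PRODUCT widens");
static_assert(std::is_same<target_type_t<float, agg_kind::PRODUCT>, float>::value,
              "floating-point PRODUCT keeps its type");
```

Because the predicate is evaluated inside `enable_if_t`, adding a new kind to `is_sum_product_agg` is enough to enable both specializations for it.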

karthikeyann added the "3 - Ready for Review" label and removed the "2 - In Progress" label on Mar 31, 2021
github-actions bot added the CMake, Python, and libcudf labels on Mar 31, 2021
karthikeyann marked this pull request as ready for review March 31, 2021 15:14
karthikeyann requested review from a team as code owners March 31, 2021 15:14
karthikeyann requested a review from trxcllnt March 31, 2021 15:14
@codecov

codecov bot commented Apr 12, 2021

Codecov Report

Merging #7763 (98b2218) into branch-0.20 (51336df) will increase coverage by 0.03%.
The diff coverage is 88.67%.

@@               Coverage Diff               @@
##           branch-0.20    #7763      +/-   ##
===============================================
+ Coverage        82.88%   82.92%   +0.03%     
===============================================
  Files              103      103              
  Lines            17668    17664       -4     
===============================================
+ Hits             14645    14648       +3     
+ Misses            3023     3016       -7     
| Impacted Files | Coverage Δ |
| --- | --- |
| python/cudf/cudf/core/column/__init__.py | 100.00% <ø> (ø) |
| python/cudf/cudf/utils/cudautils.py | 57.75% <25.00%> (ø) |
| python/cudf/cudf/core/column/column.py | 88.64% <71.42%> (ø) |
| python/cudf/cudf/core/column/numerical.py | 94.43% <72.72%> (ø) |
| python/dask_cudf/dask_cudf/backends.py | 89.51% <80.00%> (-0.08%) ⬇️ |
| python/cudf/cudf/core/dataframe.py | 90.87% <83.33%> (+0.01%) ⬆️ |
| python/cudf/cudf/utils/utils.py | 89.53% <91.66%> (+0.02%) ⬆️ |
| python/cudf/cudf/core/column/datetime.py | 89.91% <100.00%> (ø) |
| python/cudf/cudf/core/column/timedelta.py | 88.66% <100.00%> (ø) |
| python/cudf/cudf/core/groupby/groupby.py | 92.33% <100.00%> (+0.88%) ⬆️ |

... and 16 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c2f744...98b2218. Read the comment docs.

cpp/src/groupby/hash/groupby.cu (outdated, resolved)
cpp/src/groupby/sort/aggregate.cpp (resolved)
@karthikeyann
Contributor Author

rerun tests

Contributor

@vyasr vyasr left a comment

In addition to the inline comments, I had one additional question that isn't relevant to this exact PR but I'd appreciate an answer on (I don't know this part of the C++ code base very well yet): What is the purpose of the elementwise_aggregator? It seems like an unnecessary level of indirection between aggregate_row and update_target_element since the same template specializations could be applied without it.

cpp/src/groupby/hash/groupby.cu (outdated, resolved)
cpp/src/groupby/hash/groupby.cu (resolved)
cpp/src/groupby/sort/group_reductions.hpp (resolved)
cpp/src/groupby/sort/group_single_pass_reduction_util.cuh (outdated, resolved)
cpp/tests/groupby/group_product_test.cpp (outdated, resolved)
cpp/tests/groupby/group_product_test.cpp (resolved)
Comment on lines +127 to +129
// This test will not work until the following ptxas bug is fixed in 10.2
// https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317&cp=
TYPED_TEST(groupby_product_test, DISABLED_dictionary)
Contributor

Suggested change
- // This test will not work until the following ptxas bug is fixed in 10.2
- // https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317&cp=
- TYPED_TEST(groupby_product_test, DISABLED_dictionary)
+ TYPED_TEST(groupby_product_test, dictionary)

Contributor Author

This may need another cleanup. In its current state, the code needs a specialization for every key type.
Another PR, #7949, is also adding a few operations.

Contributor

OK, if you think it's best to wait to enable this test feel free to resolve.

python/cudf/cudf/_lib/aggregation.pyx (resolved)
python/cudf/cudf/core/groupby/groupby.py (resolved)
@jrhemstad
Contributor

> What is the purpose of the elementwise_aggregator? It seems like an unnecessary level of indirection between aggregate_row and update_target_element since the same template specializations could be applied without it.

elementwise_aggregator is the functor that dispatch_type_and_aggregation dispatches; it presents an operator() template with the appropriate template type interface. You could put the "specializations" on that operator(), but that would require SFINAE and enable_if. By delegating to the update_target_element type, we can use normal partial specializations of a type instead of SFINAE.

It also allows us to have the base case automatically handle any error paths that don't have specializations:

template <typename Source,
          aggregation::Kind k,
          bool target_has_nulls,
          bool source_has_nulls,
          typename Enable = void>
struct update_target_element {
  __device__ void operator()(mutable_column_device_view target,
                             size_type target_index,
                             column_device_view source,
                             size_type source_index) const noexcept
  {
    cudf_assert(false and "Invalid source type and aggregation combination.");
  }
};
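
The pattern described above can be sketched in standalone host C++ as follows. This is an illustration under stated assumptions, not the libcudf implementation: `agg_kind`, `update_target`, `dispatch_aggregation`, and `aggregate_or` are hypothetical names, and the base case returns `false` instead of asserting on the device.

```cpp
// Hypothetical stand-in for the aggregation kind enum.
enum class agg_kind { SUM, PRODUCT, MIN };

// Base case: any (Source, kind) pair without a specialization lands here.
template <typename Source, agg_kind k>
struct update_target {
  constexpr bool operator()(Source&, Source) const noexcept { return false; }
};

// Ordinary partial specializations on the kind, with no enable_if on a function template.
template <typename Source>
struct update_target<Source, agg_kind::SUM> {
  constexpr bool operator()(Source& target, Source value) const noexcept
  {
    target += value;
    return true;
  }
};

template <typename Source>
struct update_target<Source, agg_kind::PRODUCT> {
  constexpr bool operator()(Source& target, Source value) const noexcept
  {
    target *= value;
    return true;
  }
};

// The dispatched functor: its operator() template only forwards to the specialized type.
struct elementwise_aggregator {
  template <typename Source, agg_kind k>
  constexpr bool operator()(Source& target, Source value) const noexcept
  {
    return update_target<Source, k>{}(target, value);
  }
};

// Simplified dispatcher: turns a runtime kind into a compile-time template argument.
template <typename Source, typename F>
constexpr bool dispatch_aggregation(agg_kind k, F f, Source& target, Source value)
{
  switch (k) {
    case agg_kind::SUM: return f.template operator()<Source, agg_kind::SUM>(target, value);
    case agg_kind::PRODUCT: return f.template operator()<Source, agg_kind::PRODUCT>(target, value);
    case agg_kind::MIN: return f.template operator()<Source, agg_kind::MIN>(target, value);
  }
  return false;
}

// Convenience wrapper: the updated target, or `fallback` if the base case was hit.
constexpr int aggregate_or(agg_kind k, int target, int value, int fallback)
{
  return dispatch_aggregation(k, elementwise_aggregator{}, target, value) ? target : fallback;
}
```

In this sketch `aggregate_or(agg_kind::PRODUCT, 3, 4, -1)` yields 12, while `agg_kind::MIN`, which has no specialization, falls through to the base case and yields the fallback.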

@karthikeyann karthikeyann requested a review from vyasr April 20, 2021 17:23
Contributor

@vyasr vyasr left a comment

Looks good assuming no further action items on my one open thread.

@karthikeyann
Contributor Author

@gpucibot merge

@jrhemstad
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit c0cf5e1 into rapidsai:branch-0.20 Apr 21, 2021
Labels
3 - Ready for Review (Ready for review by team)
4 - Needs Review (Waiting for reviewer to review or respond)
CMake (CMake build issue)
feature request (New feature or request)
libcudf (Affects libcudf (C++/CUDA) code.)
non-breaking (Non-breaking change)
Python (Affects Python cuDF API.)
Development

Successfully merging this pull request may close these issues.

[FEA]need support 'product' method in cudf.Dataframe.groupby.agg
6 participants