Implement cudf::group_by (hash) for decimal32 and decimal64 #7190

Merged (13 commits) on Feb 5, 2021

codereport
Contributor

Follow up PR to #7169

This PR resolves a part of #3556.
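
For readers unfamiliar with the feature, here is a minimal sketch of what a hash-based group-by sum over fixed-point values amounts to. It is not libcudf's implementation. It assumes each decimal value is stored as a scaled integer and that all values in a column share one scale, so the real value is `stored * 10^scale`. The names `group_sums` and `hash_groupby_sum` are hypothetical, and widening the accumulator to 64 bits is an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical result type: one scaled-integer total per key.
// The real value of each total is total * 10^scale, where scale is shared by the column.
using group_sums = std::unordered_map<std::int32_t, std::int64_t>;

// Sums the scaled integers per key using a hash map (host-side sketch, not GPU code).
inline group_sums hash_groupby_sum(std::vector<std::int32_t> const& keys,
                                   std::vector<std::int32_t> const& scaled_values)
{
  assert(keys.size() == scaled_values.size());
  group_sums sums;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    // Accumulate in 64 bits to reduce the risk of overflow (an assumption of this sketch).
    sums[keys[i]] += scaled_values[i];
  }
  return sums;
}
```

Because the scale is shared, the sum needs no rescaling: with scale -2, the stored values 150 and 250 (1.50 and 2.50) add up to 400 (4.00).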

@codereport added the feature request, 2 - In Progress, libcudf, and non-breaking labels on Jan 22, 2021
@codereport self-assigned this on Jan 22, 2021
@codecov

codecov bot commented Jan 28, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@53d7ad2).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7190   +/-   ##
==============================================
  Coverage               ?   82.19%           
==============================================
  Files                  ?      100           
  Lines                  ?    16955           
  Branches               ?        0           
==============================================
  Hits                   ?    13937           
  Misses                 ?     3018           
  Partials               ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 53d7ad2...d6028c1.

@harrism
Member

harrism commented Feb 2, 2021

Moving to P0 for 0.19.

@codereport changed the base branch from branch-0.18 to branch-0.19 on February 4, 2021 01:31
@codereport added the 3 - Ready for Review and 4 - Needs Review labels and removed the 2 - In Progress label on Feb 4, 2021
@codereport marked this pull request as ready for review on February 4, 2021 01:34
@codereport requested a review from a team as a code owner on February 4, 2021 01:34
Contributor

@davidwendt davidwendt left a comment

Looks like you put all the fixed-point groupby tests into tests/groupby/group_count_test.cpp. There are individual test source files for count, min, max, sum, mean, etc. in tests/groupby/. I would recommend splitting the tests into the matching test source files.

Also, once that is done you can declare those tests that are not yet supported as disabled instead of commenting them out. Here are some examples I used for disabling some groupby dictionary tests:

// These tests will not work until the following ptxas bug is fixed in 10.2
// https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317&cp=
TYPED_TEST(groupby_sum_test, DISABLED_dictionary)

// These tests will not work until the following ptxas bug is fixed in 10.2
// https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317&cp=
TEST_F(groupby_dictionary_mean_test, DISABLED_basic)
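
As background on the convention used above: GoogleTest still compiles a test whose name begins with `DISABLED_` but does not run it unless `--gtest_also_run_disabled_tests` is passed. The sketch below is not GoogleTest. It is a hypothetical stand-in runner written with only the standard library to show the prefix rule in isolation; `test_case`, `run_summary`, `is_disabled`, and `run_all` are illustrative names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical registry entry; not a GoogleTest type.
struct test_case {
  std::string name;
  std::function<bool()> body;  // returns true when the test passes
};

struct run_summary {
  int passed  = 0;
  int failed  = 0;
  int skipped = 0;
};

// A test counts as disabled when its name starts with "DISABLED_".
inline bool is_disabled(std::string const& name) { return name.rfind("DISABLED_", 0) == 0; }

// Runs every enabled test. Disabled tests are only counted as skipped,
// unless also_run_disabled is set (mirroring --gtest_also_run_disabled_tests).
inline run_summary run_all(std::vector<test_case> const& tests, bool also_run_disabled = false)
{
  run_summary summary;
  for (auto const& t : tests) {
    if (is_disabled(t.name) && !also_run_disabled) {
      ++summary.skipped;
      continue;
    }
    if (t.body()) {
      ++summary.passed;
    } else {
      ++summary.failed;
    }
  }
  return summary;
}
```

Compared with commenting a test out, the disabled test keeps compiling, so it cannot silently rot while the underlying bug is open.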

@davidwendt
Contributor

Another place I added a check for dictionary columns is in the cudf::groupby::verify_valid_requests() internal function. I think that is because the dictionary code path would crash at runtime. The fixed-point aggs may just give incorrect results so it may not be necessary.

// The aggregations listed in the lambda below will not work with a values column of type
// dictionary if this is compiled with nvcc/ptxas 10.2.
// https://nvbugswb.nvidia.com/NvBugs5/SWBug.aspx?bugid=3186317&cp=
#if (__CUDACC_VER_MAJOR__ == 10) and (__CUDACC_VER_MINOR__ == 2)
  CUDF_EXPECTS(
    std::all_of(
      requests.begin(),
      requests.end(),
      [](auto const& request) {
        return std::all_of(
          request.aggregations.begin(), request.aggregations.end(), [&request](auto const& agg) {
            return (!cudf::is_dictionary(request.values.type()) ||
                    !(agg->kind == aggregation::SUM or agg->kind == aggregation::MEAN or
                      agg->kind == aggregation::STD or agg->kind == aggregation::VARIANCE));
          });
      }),
    "dictionary type not supported for this aggregation");
#endif
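
The nested `std::all_of` pattern in that check can be tried in isolation. The following is a simplified, standard-library-only sketch; `agg_kind`, `value_type`, `request`, and `all_requests_valid` are hypothetical stand-ins, not the libcudf types, and the enum values are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical simplified stand-ins; these are not the libcudf types.
enum class agg_kind { SUM, MIN, MAX, COUNT, MEAN, STD, VARIANCE };
enum class value_type { INT32, DICTIONARY32, DECIMAL32, DECIMAL64 };

struct request {
  value_type values_type;
  std::vector<agg_kind> aggregations;
};

// The aggregations the original check rejects for dictionary columns.
inline bool is_unsupported_for_dictionary(agg_kind kind)
{
  return kind == agg_kind::SUM || kind == agg_kind::MEAN || kind == agg_kind::STD ||
         kind == agg_kind::VARIANCE;
}

// True when no request pairs a dictionary values column with an affected aggregation.
// The outer all_of walks the requests; the inner all_of walks each request's aggregations.
inline bool all_requests_valid(std::vector<request> const& requests)
{
  return std::all_of(requests.begin(), requests.end(), [](request const& req) {
    return std::all_of(req.aggregations.begin(), req.aggregations.end(), [&req](agg_kind kind) {
      return req.values_type != value_type::DICTIONARY32 || !is_unsupported_for_dictionary(kind);
    });
  });
}
```

An empty request list, or a request with no aggregations, passes the check because `std::all_of` returns true on an empty range.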

@codereport
Contributor Author

> Another place I added a check for dictionary columns is in the cudf::groupby::verify_valid_requests() internal function. I think that is because the dictionary code path would crash at runtime. The fixed-point aggs may just give incorrect results so it may not be necessary.

I came across this while exploring the code, but I don't think it is necessary.

@codereport added the 5 - Ready to Merge label and removed the 3 - Ready for Review and 4 - Needs Review labels on Feb 4, 2021
Review thread on cpp/tests/groupby/group_max_test.cpp (outdated, resolved)
@codereport requested a review from devavret on February 4, 2021 21:41
@kkraus14
Collaborator

kkraus14 commented Feb 5, 2021

@gpucibot merge

@rapids-bot merged commit 5d151a7 into rapidsai:branch-0.19 on Feb 5, 2021
Labels: 5 - Ready to Merge, feature request, libcudf, non-breaking