Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scan_aggregation and reduce_aggregation derived types. #10357

Merged
merged 10 commits into from
Mar 11, 2022

Conversation

nvdbaranec
Copy link
Contributor

This PR adds the scan_aggregation and reduce_aggregation derived types. With it, all concrete aggregation types are now derived from algorithmic specific subtypes.

@nvdbaranec nvdbaranec added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Java Affects Java cuDF API. improvement Improvement / enhancement to an existing function breaking Breaking change labels Feb 25, 2022
@nvdbaranec nvdbaranec requested review from a team as code owners February 25, 2022 21:26
@nvdbaranec
Copy link
Contributor Author

Pinging @jrhemstad.

@karthikeyann, there are some comments in the existing python aggregations that I wasn't sure about:

# TODO: update this after adding per algorithm aggregation derived types

If there's anything I can do in this PR to address them, let me know.

@codecov
Copy link

codecov bot commented Feb 25, 2022

Codecov Report

Merging #10357 (6f940fd) into branch-22.04 (b613394) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10357      +/-   ##
================================================
+ Coverage         86.13%   86.16%   +0.02%     
================================================
  Files               139      139              
  Lines             22460    22457       -3     
================================================
+ Hits              19347    19351       +4     
+ Misses             3113     3106       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/column.py 89.16% <100.00%> (+0.17%) ⬆️
python/cudf/cudf/core/dataframe.py 93.57% <100.00%> (ø)
python/cudf/cudf/core/frame.py 91.72% <100.00%> (ø)
python/cudf/cudf/core/single_column_frame.py 97.01% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 88.39% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.92% <0.00%> (+0.43%) ⬆️
python/cudf/cudf/core/column/lists.py 90.56% <0.00%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0f7fe6...6f940fd. Read the comment docs.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments attached. Reading issue #7106 helped me understand a lot more of this PR. A cross-link to that related issue would be nice to have in the PR description for future readers.

cpp/src/aggregation/aggregation.cpp Show resolved Hide resolved
cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
java/src/main/native/src/ColumnViewJni.cpp Show resolved Hide resolved
python/cudf/cudf/_lib/aggregation.pyx Show resolved Hide resolved
@karthikeyann
Copy link
Contributor

@nvdbaranec
cumsum, cummin, cummax were aliases for scan operations for sum, min, and max respectively when single Aggregation class was present.
since per algorithm (reduction, scan, group_scan, rolling) is implemented, cumsum, cummin, cummax should not exist.
They should be removed.
For eg. wherever ScanAggregation::cumsum() is called, it should be replaced by ScanAggregation::sum()

@nvdbaranec
Copy link
Contributor Author

@nvdbaranec cumsum, cummin, cummax were aliases for scan operations for sum, min, and max respectively when single Aggregation class was present. since per algorithm (reduction, scan, group_scan, rolling) is implemented, cumsum, cummin, cummax should not exist. They should be removed. For eg. wherever ScanAggregation::cumsum() is called, it should be replaced by ScanAggregation::sum()

In the interest of keeping this PR down in size, I'll do this work as a second PR. This PR is already a prereq for other high priority work (implementing percentile_approx as a reduction).

@nvdbaranec
Copy link
Contributor Author

nvdbaranec commented Mar 7, 2022

Added an issue for followup (assigned to me)

#10394

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the java perspective this looks fine to me.

cpp/tests/reductions/reduction_tests.cpp Show resolved Hide resolved
cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
@nvdbaranec nvdbaranec requested a review from bdice March 9, 2022 16:09
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One fix for consistency -- otherwise LGTM. Thanks!

cpp/tests/reductions/reduction_tests.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. 👍
Just couple of cpp suggestions.

cpp/include/cudf/aggregation.hpp Show resolved Hide resolved
cpp/src/aggregation/aggregation.cpp Show resolved Hide resolved
@nvdbaranec nvdbaranec requested a review from karthikeyann March 11, 2022 20:48
@nvdbaranec
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b1ea304 into rapidsai:branch-22.04 Mar 11, 2022
rapids-bot bot pushed a commit that referenced this pull request Mar 14, 2022
Fixes benchmarks compile errors introduced by #10357 

Example:
```
/cudf/cpp/benchmarks/reduction/reduce.cpp: In function ‘void BM_reduction(benchmark::State&, const std::unique_ptr<cudf::aggregation>&)’:
/cudf/cpp/benchmarks/reduction/reduce.cpp:52:46: error: invalid initialization of reference of type ‘const std::unique_ptr<cudf::reduce_aggregation>&’ from expression of type ‘const std::unique_ptr<cudf::aggregation>’
   52 |     auto result = cudf::reduce(input_column, agg, output_dtype);
```

Aggregation types for reduce and scan were modified to include template types.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/nvdbaranec

URL: #10428
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants