JNI Aggregation Type Changes #8919
Conversation
LGTM
Overall looks good to me. Is there utility in having an interface that all aggregations derive from that allows applications to track aggregations regardless of type? The use of unconstrained generics in the corresponding RAPIDS Accelerator change made me wonder if there's any utility in applications being able to tell something is an aggregation even though there are no common, public methods across aggregations. I suppose we can add this later if it's needed.
We do not have that in the Spark code right now. Each aggregation is tracked separately. Some of this comes from how distributed group by aggregations work in two phases, which requires a mapping from one Spark aggregation to at least two separate cudf aggregations. But there are further issues complicated by cudf. Semantically, a single aggregation like min or max does not behave the same way from one API to another. A lot of this ends up being corner cases for null handling and the like. They are all things that can be worked around, but it makes it difficult to have a single mapping or API to say that spark.Max maps to cudf.Max. Also, because some aggregations are only available under some aggregation APIs, we still need a way to know which are and are not supported. Generally it is simplest and more flexible in this case to just have separate APIs, which is what we ended up with for Spark. I'm also not sure how you would use this. The only thing I can think of is giving the user the option of runtime checking for aggregations instead of compile-time checking. I can do that, but I want to make sure that I am designing the APIs for the desired use case instead of adding in flexibility without knowing why.
The use case I had in mind was primarily to allow constraining the generic in the RAPIDS Accelerator code so you couldn't just pass anything to it at compile time - it needs to be some kind of aggregation. It's a very minor point, and I'm fine with it as-is.
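The constrained-generic idea discussed above could look something like the sketch below. This is purely illustrative: `Aggregation`, `RollingAggregation`, and `AggregationTracker` are hypothetical names for this example, not part of the actual cudf Java API.

```java
// Hypothetical marker interface; the real cudf Java bindings do not expose one.
interface Aggregation {}

// A concrete aggregation type implements the marker.
final class RollingAggregation implements Aggregation {
    private final String name;
    RollingAggregation(String name) { this.name = name; }
    @Override public String toString() { return "rolling:" + name; }
}

// With the marker, application code can constrain its generic so only
// aggregation types compile, while still treating them uniformly.
final class AggregationTracker<T extends Aggregation> {
    private final java.util.List<T> tracked = new java.util.ArrayList<>();
    void track(T agg) { tracked.add(agg); }
    int count() { return tracked.size(); }
}

public class Demo {
    static int trackTwo() {
        AggregationTracker<RollingAggregation> t = new AggregationTracker<>();
        t.track(new RollingAggregation("max"));
        t.track(new RollingAggregation("min"));
        // t.track("not an aggregation"); // would be a compile error
        return t.count();
    }
    public static void main(String[] args) {
        System.out.println(trackTwo());
    }
}
```

The trade-off is exactly the one described above: the marker interface has no common, public methods, so its only value is the compile-time constraint on `T`.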
Codecov Report
@@            Coverage Diff             @@
##       branch-21.10    #8919     +/-  ##
================================================
- Coverage       10.67%   10.59%   -0.09%
================================================
  Files             110      116       +6
  Lines           18271    19043     +772
================================================
+ Hits             1951     2017      +66
- Misses          16320    17026     +706
================================================
Continue to review full report at Codecov.
@gpucibot merge
CUDF is in the process of tagging aggregations with different classes to make it a compile error to use the wrong aggregation with the wrong API. The first set of changes was for rolling windows, and when I did the corresponding Java changes I ended up using generics and interfaces to try to replicate the same thing. This didn't work out as well as I had hoped, and the code to use them ended up being more cumbersome than I wanted.
This patch adjusts it so we have truly separate classes for each type of aggregation. It adds more code here, but it makes the code that uses these cleaner.
This is a breaking change and I will be putting up a corresponding change in the Spark plugin to deal with this.
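The "truly separate classes" approach described above might be sketched as follows. The class and factory names here are illustrative assumptions for this example, not the actual cudf Java bindings: each aggregation API gets its own class with its own factory methods, so passing an aggregation to the wrong API becomes a compile-time error.

```java
// Illustrative sketch of one class per aggregation API; names are
// hypothetical, not the actual cudf Java API.
final class RollingAggregation {
    private final String kind;
    private RollingAggregation(String kind) { this.kind = kind; }
    static RollingAggregation max() { return new RollingAggregation("max"); }
    static RollingAggregation min() { return new RollingAggregation("min"); }
    String kind() { return kind; }
}

final class GroupByAggregation {
    private final String kind;
    private GroupByAggregation(String kind) { this.kind = kind; }
    static GroupByAggregation max() { return new GroupByAggregation("max"); }
    // Some aggregations exist only for one API, so they simply have no
    // factory on the other class.
    static GroupByAggregation collectList() { return new GroupByAggregation("collect_list"); }
    String kind() { return kind; }
}

public class Demo {
    // An API that only accepts rolling aggregations; passing a
    // GroupByAggregation here would not compile.
    static String rollingWindow(RollingAggregation agg) {
        return "rolling(" + agg.kind() + ")";
    }
    public static void main(String[] args) {
        System.out.println(rollingWindow(RollingAggregation.max()));
        // rollingWindow(GroupByAggregation.collectList()); // compile error
    }
}
```

This duplicates factory methods across classes, which is the extra code the description mentions, but callers no longer need generics or casts to stay type-safe.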