Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JNI Aggregation Type Changes #8919

Merged

Conversation

revans2
Copy link
Contributor

@revans2 revans2 commented Aug 2, 2021

CUDF is in the process of tagging aggregations with different classes to make it a compile error to use the wrong aggregation with the wrong API. The first set of changes were for rolling windows and when I did the corresponding java changes I ended up using generics and interfaces to try and replicate the same thing. This didn't work out as well as I had hoped and the code to use them ended up being more cumbersome thqn I wanted.

This patch adjusts it so we have truly separate classes for each type of aggregation. It adds more code here, but it makes the code that uses these cleaner.

This is a breaking change and I will be putting up a corresponding change in the Spark plugin to deal with this.

@revans2 revans2 added 3 - Ready for Review Ready for review by team Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS 4 - Needs cuDF (Java) Reviewer improvement Improvement / enhancement to an existing function breaking Breaking change labels Aug 2, 2021
@revans2 revans2 self-assigned this Aug 2, 2021
@revans2 revans2 requested a review from a team as a code owner August 2, 2021 13:41
Copy link
Contributor

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Is there utility in having an interface that all aggregations derive from that allows applications to track aggregations regardless of type? The use of unconstrained generics in the corresponding RAPIDS Accelerator change made me wonder if there's any utility in applications being able to tell something is an aggregation even though there are no common, public methods across aggregations. I suppose we can add this later if it's needed.

java/src/main/java/ai/rapids/cudf/GroupByAggregation.java Outdated Show resolved Hide resolved
@revans2
Copy link
Contributor Author

revans2 commented Aug 2, 2021

Is there utility in having an interface that all aggregations derive from that allows applications to track aggregations regardless of type?

We do not have that in the Spark code right now. Each aggregation is tracked separately. Some of this comes from how distributed group by aggregations work in two phases, which requires a mapping from one Spark aggregation to at least two separate cudf aggregations. But there are further issues complicated by cudf. Semantically a single aggregation like min or max does not behaved the same way from one API to another. A lot of this ends up being like corner cases for null handling and the like. They are all things that can be worked around. This makes it difficult to have a single mapping, or API to say that spark.Max maps to cudf.Max. Also because some aggregations are only available under some aggregation APIs, then we still need a way to know which are and are not supported. Generally it is simplest and more flexible in this case to just have separate APIs, which is what we ended up with for Spark.

I'm also not sure how you would use this. The only thing I can think of is giving the user the option for runtime checking for aggregations instead of compile time checking. I can do that, but I want to make sure that I am designing the APIs for the desired use case instead of adding in flexibility without knowing why.

@jlowe
Copy link
Member

jlowe commented Aug 2, 2021

The use case I had in mind was primarily to allow constraining the generic in the RAPIDS Accelerator code so you couldn't just pass anything to it at compile time - it needs to be some kind of aggregation. It's a very minor point, and I'm fine with it as-is.

@codecov
Copy link

codecov bot commented Aug 2, 2021

Codecov Report

Merging #8919 (b155e6d) into branch-21.10 (18f7c01) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head b155e6d differs from pull request most recent head bfe2338. Consider uploading reports for the commit bfe2338 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.10    #8919      +/-   ##
================================================
- Coverage         10.67%   10.59%   -0.09%     
================================================
  Files               110      116       +6     
  Lines             18271    19043     +772     
================================================
+ Hits               1951     2017      +66     
- Misses            16320    17026     +706     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/core/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/column.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/lists.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/numerical.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/string.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/struct.py 0.00% <ø> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <ø> (ø)
python/cudf/cudf/core/frame.py 0.00% <ø> (ø)
... and 75 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56c4a2e...bfe2338. Read the comment docs.

@revans2
Copy link
Contributor Author

revans2 commented Aug 4, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b9820f1 into rapidsai:branch-21.10 Aug 4, 2021
@revans2 revans2 deleted the jni_agg_type_changes_standalone branch August 4, 2021 12:40
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuDF (Java) Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants