Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DECIMAL order-by for RANGE window functions #11645

Merged
merged 16 commits into from
Sep 13, 2022

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Sep 2, 2022

Description

CUDF grouped RANGE window functions currently support only
integral types and timestamps as the ORDER BY (OBY) column.

This commit adds support for DECIMAL types (i.e. decimal32,
decimal64, and decimal128) to be used as the ORDER BY
column in RANGE window functions.

This feature allows spark-rapids to address NVIDIA/spark-rapids#6400.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

CUDF grouped RANGE window functions currently support only
integral types and timestamps as the ORDER BY (OBY) column.

This commit adds support for DECIMAL types (i.e. decimal32,
decimal64, and decimal128) columns to by used as the ORDER BY
column in RANGE window functions.
@mythrocks mythrocks added feature request New feature or request 2 - In Progress Currently a work in progress non-breaking Non-breaking change labels Sep 2, 2022
@mythrocks mythrocks self-assigned this Sep 2, 2022
@mythrocks mythrocks requested a review from a team as a code owner September 2, 2022 17:57
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Sep 2, 2022
@mythrocks mythrocks marked this pull request as draft September 2, 2022 17:58
@codecov
Copy link

codecov bot commented Sep 2, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@488c7ad). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 40238b8 differs from pull request most recent head 283e76b. Consider uploading reports for the commit 283e76b to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11645   +/-   ##
===============================================
  Coverage                ?   86.42%           
===============================================
  Files                   ?      145           
  Lines                   ?    23014           
  Branches                ?        0           
===============================================
  Hits                    ?    19890           
  Misses                  ?     3124           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@github-actions github-actions bot added the Java Affects Java cuDF API. label Sep 8, 2022
@mythrocks
Copy link
Contributor Author

With the benefit of hindsight, I see that I could have also supported decimal range bounds with scales lower than that of the order-by column, simply by scaling in the other direction.

The fear at the time was the loss of precision when scaling to a higher scale, but that should also apply in the other direction.

I am not rocking the boat now. I'll raise an issue to tackle that separately from this PR.

@mythrocks mythrocks added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 9, 2022
@mythrocks mythrocks changed the title WIP: Support DECIMAL order-by for RANGE window functions Support DECIMAL order-by for RANGE window functions Sep 9, 2022
@mythrocks mythrocks marked this pull request as ready for review September 9, 2022 22:10
@mythrocks mythrocks requested a review from a team as a code owner September 9, 2022 22:10
Also, fixed grammar error in a code comment.
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice and clean, thanks for making this change!

cpp/src/rolling/detail/range_window_bounds.hpp Outdated Show resolved Hide resolved
cpp/tests/rolling/grouped_rolling_range_test.cpp Outdated Show resolved Hide resolved
@mythrocks mythrocks removed the request for review from robertmaynard September 12, 2022 19:23
1. Better documentation of how decimals are compared.
2. enable_if_t<> => CUDF_ENABLE_IF().
3. Better names for test typedefs.
@ttnghia ttnghia self-requested a review September 13, 2022 05:07
1. Removed extra whitespaces, unnecessary comments.
2. Fixed copyright date.
3. Reused `big()` helper function.
4. Updated `Scalar.equals()` to include DECIMAL128.

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks requested a review from jlowe September 13, 2022 18:22
@mythrocks
Copy link
Contributor Author

Thank you for the reviews, all. I'll merge this change after CI is done cranking.

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 69cb31d into rapidsai:branch-22.10 Sep 13, 2022
rapids-bot bot pushed a commit that referenced this pull request Sep 19, 2022
…11710)

As a follow-up to #11645, this change includes `DECIMAL` among the list of data-types that may be used in the order-by column for `RANGE`-based window functions, listed in the exception message.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Gera Shegalov (https://github.com/gerashegalov)
  - Nghia Truong (https://github.com/ttnghia)

URL: #11710
rapids-bot bot pushed a commit that referenced this pull request Jun 20, 2023
This commit adds support for `FLOAT32` and `FLOAT64` order-by columns for RANGE-based window functions.

## Background
Up until this commit, order-by columns for RANGE window functions were allowed to be integral numerics, timestamps, or strings (for unbounded/current rows).

With this commit, window functions will be permitted to run on floating point value ranges. E.g. This supports windows defined with floating point deltas, like `rows with values exceeding the current row by no more than 3.14f`.

This is in the same vein as the support for `DECIMAL` (#11645) and `STRING` (#13143).

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)

URL: #13512
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants