Support `nth_element` for window functions #11158
This is to address spark-rapids/issues/4005 and spark-rapids/issues/5061. This change adds support for `NTH_ELEMENT` window aggregations, which should allow for the implementation of the `FIRST()`, `LAST()`, and `NTH_VALUE()` window functions in Spark SQL. As a window aggregation, `NTH_ELEMENT` returns, for each row in a column, the Nth element of that row's window. `N` is zero-based, so `NTH_ELEMENT(0)` translates to the first element in a window. Similarly, `NTH_ELEMENT(-1)` translates to the last. For a window of size `W`, if the specified `N` falls outside the range `[ -W, W-1 ]`, a null element is returned for that row.
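To make the indexing rule concrete, here is a minimal host-side sketch of the semantics described above. It is not the libcudf implementation or API: `resolve_nth` and `nth_element_window` are hypothetical names, the window for row `i` is assumed to span rows `[i - preceding, i + following]` clamped to the column bounds, and nulls in the input are ignored for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of libcudf): maps N to a zero-based offset
// inside a window of size w. Returns std::nullopt when N lies outside
// [-w, w-1], which corresponds to a null output row.
std::optional<std::ptrdiff_t> resolve_nth(std::ptrdiff_t n, std::ptrdiff_t w)
{
  assert(w >= 0);
  if (n < -w || n >= w) { return std::nullopt; }
  return n >= 0 ? n : w + n;  // negative N counts back from the window end
}

// Hypothetical reference loop: for each row i, the window is assumed to be
// rows [i - preceding, i + following], clamped to the column bounds.
std::vector<std::optional<int>> nth_element_window(std::vector<int> const& col,
                                                   std::ptrdiff_t preceding,
                                                   std::ptrdiff_t following,
                                                   std::ptrdiff_t n)
{
  assert(preceding >= 0 && following >= 0);
  auto const size = static_cast<std::ptrdiff_t>(col.size());
  std::vector<std::optional<int>> out(col.size());  // all rows start as null
  for (std::ptrdiff_t i = 0; i < size; ++i) {
    auto const begin  = std::max<std::ptrdiff_t>(0, i - preceding);
    auto const end    = std::min<std::ptrdiff_t>(size, i + following + 1);  // exclusive
    auto const offset = resolve_nth(n, end - begin);
    if (offset) { out[i] = col[begin + *offset]; }
  }
  return out;
}
```

With the column `{10, 20, 30, 40}` and one preceding and one following row, `N = 0` behaves like `FIRST()`, `N = -1` like `LAST()`, and `N = 2` yields nulls for the first and last rows, whose clamped windows hold only two elements.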
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #11158 +/- ##
===============================================
Coverage ? 86.30%
===============================================
Files ? 144
Lines ? 22698
Branches ? 0
===============================================
Hits ? 19589
Misses ? 3109
Partials ? 0
Looks like this more than doubles the compile time for |
The snag is that this is a function template whose type parameters are transform iterators. I'll take a crack at this.
More readable.
This commit adds the JNI bindings required to invoke `NTH_ELEMENT` aggregations as a window aggregation. Java tests are included. Depends on rapidsai#11158.
@davidwendt, @ttnghia, thank you for the prompt reviews. I think I've addressed your concerns here.
CMake LGTM.
I was a bit taken aback by the `_detail` named files, but it looks like that's not unique to this PR. We should probably be more consistent about using a `detail/` directory, even in `src/`, rather than putting it in the filename. But that is out of scope for this PR.
1. Using `#pragma once`.
2. Doxygen format.
3. Renamed gather map functor.
4. Materialized gather indices.
I started making this change in this PR, but I now agree that it is peripheral to the current change. I've filed #11211 to track this. To clarify: the changes required here would end up moving some of the files with substantial changes, and would distract from the main change. I'll address this after the current change has been merged.
Just one more minor issue. Otherwise it's good.
Also, switched to use `gathered.front()`.
Thank you for the reviews, @davidwendt, @ttnghia, @vyasr. I'll merge this now.
@gpucibot merge
Depends on #11158. This commit adds the JNI bindings required to invoke `NTH_ELEMENT` aggregations as a window aggregation. Java tests are included.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Ryan Lee (https://github.com/rwlee)
- Raza Jafri (https://github.com/razajafri)

URL: #11201
Fixes #9643.