Add in running window optimization using scan #2895
Merged
Signed-off-by: Robert (Bobby) Evans <[email protected]>
I tested this on Databricks and it works there too.
jlowe
reviewed
Jul 9, 2021
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuWindowExec.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuWindowExpression.scala
build
jlowe
approved these changes
Jul 9, 2021
During the review work I accidentally checked in a change that makes this PR depend on the fix from rapidsai/cudf#8705. I am inclined to wait for that fix to be merged, but if others want to merge this sooner I can revert the small change and do a follow-on PR once the cudf change is merged.
build
build
Spark optimizes running windows to use a linear-time algorithm. In the discussion with cudf in rapidsai/cudf#8440 it was decided to use scan and segmented_scan (group by scan) for this. This PR puts in a framework for that and adds a few initial implementations.
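To illustrate why a scan gives a linear-time running window, here is a minimal plain-Python sketch. It is not the plugin's or cudf's code: the function names `running_naive`, `running_scan`, and `running_segmented_scan` are hypothetical, and the segmented version assumes the rows are already sorted so that each partition is contiguous.

```python
from itertools import accumulate
from operator import add


def running_naive(values, op):
    """Recompute every frame from its first row: O(n^2) applications of op."""
    out = []
    for i in range(len(values)):
        acc = values[0]
        for v in values[1:i + 1]:
            acc = op(acc, v)
        out.append(acc)
    return out


def running_scan(values, op):
    """Inclusive scan: each output reuses the previous one, so one O(n) pass."""
    return list(accumulate(values, op))


def running_segmented_scan(keys, values, op):
    """Inclusive scan that restarts whenever the partition key changes.

    Assumes rows with the same key are adjacent (hypothetical precondition).
    """
    out = []
    first = True
    prev_key = None
    acc = None
    for k, v in zip(keys, values):
        if first or k != prev_key:
            acc = v            # new partition: restart the accumulator
        else:
            acc = op(acc, v)   # same partition: extend the running result
        out.append(acc)
        prev_key = k
        first = False
    return out


values = [3, 1, 4, 1, 5]
keys = ["a", "a", "b", "b", "b"]

running_max = running_scan(values, max)
running_sum = running_scan(values, add)
partitioned_sum = running_segmented_scan(keys, values, add)
```

The unpartitioned scan corresponds to a running window with no PARTITION BY, and the segmented scan to a window with one; both visit each row once, whereas `running_naive` revisits every earlier row of the frame.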
In performance tests on my local box, row_number and count are only slightly faster now than they were under window, but min, max, and sum all show significant performance gains, similar to those I showed were possible in rapidsai/cudf#8440.
In a large max running window with no partition by, I have seen performance improvements of 171x cold and 542x hot compared to the CPU.
This is a special case because when no partition is given all of the data goes to a single task, so only a single CPU core processes it. But as you can see, it would still take hundreds of CPU cores in the partitioned case to offset the performance gains.
On the previous GPU code I could not run this because I had to kill it before my GPU overheated.
This is a stepping stone toward supporting rank and dense_rank.
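As a rough illustration of why rank and dense_rank fit the same scan framework, the sketch below expresses both as inclusive scans over a single partition that is already sorted by the ORDER BY key. This is an assumption about one possible formulation, not what this PR or cudf implements, and the function names are hypothetical.

```python
from itertools import accumulate


def dense_rank(order_keys):
    """dense_rank as a running sum of 'key changed' flags."""
    # 1 where the key differs from the previous row (the first row counts as a change)
    changed = [
        1 if i == 0 or k != order_keys[i - 1] else 0
        for i, k in enumerate(order_keys)
    ]
    return list(accumulate(changed))


def rank(order_keys):
    """rank as a running max of the row number at each key change."""
    # row number (1-based) where the key changes, 0 elsewhere
    starts = [
        i + 1 if i == 0 or k != order_keys[i - 1] else 0
        for i, k in enumerate(order_keys)
    ]
    return list(accumulate(starts, max))


order_keys = [10, 10, 20, 30, 30, 30, 40]
dense = dense_rank(order_keys)
ranks = rank(order_keys)
```

With a PARTITION BY clause, the same idea would use a scan that restarts at each partition boundary instead of a single scan over all rows.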