Split batch by key for window operations #2458
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Switched to draft. I found a bug in cudf while adding the tests, and I want to fix it before adding them. I also have a new idea for getting the framework to insert the chunking only when needed, and I want to work on that too.
I added tests and fixed some issues, including #2473. But I found a bug in cudf as part of the tests, so this is also blocked by rapidsai/cudf#8314.
I am going to move the test fixes to another PR and then move this to the next release, because I don't think the benefit is worth the added risk right now.
Signed-off-by: Robert (Bobby) Evans <[email protected]>
build
This is actually waiting for a new branch because it is a bit too risky for 21.06.
build
build
sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java
sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVectorBase.java
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCoalesceBatches.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuKeyBatchingIterator.scala
build
@jlowe I have addressed your comments; please take another look.
Only a minor nit that's fine to leave, lgtm.
build
Signed-off-by: Robert (Bobby) Evans <[email protected]>
This fixes #1856.
It makes it so that window operations do not need a `RequireSingleBatch` for anything except when there is no partitioning. This allows us to do window operations where there are a lot of keys and more data than fits on the GPU. I have not updated the tests yet; I'll post them shortly. I have done manual testing on very large data sets, and it has essentially no performance impact on queries that do fit in GPU memory.
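To illustrate the general idea of splitting batches by key, here is a minimal sketch in plain Python. It is not the plugin's implementation: the function name `split_batches_by_key` and the `(key, value)` row shape are hypothetical, and it assumes the incoming batches are already sorted by the partition key. Each input batch is cut at the start of its last key, and the rows for that last key are held back until the next batch shows whether the key continues.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape for illustration: (partition key, value).
Row = Tuple[str, int]


def split_batches_by_key(batches: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Re-chunk key-sorted batches so that no key spans two output batches.

    Assumes rows arrive sorted by key across the whole stream.
    """
    pending: List[Row] = []  # rows of the last key seen; it may continue in the next batch
    for batch in batches:
        if not batch:
            continue
        rows = pending + batch
        last_key = rows[-1][0]
        # Because the input is sorted, the rows for last_key form a suffix.
        # Find where that suffix starts.
        cut = next(i for i, row in enumerate(rows) if row[0] == last_key)
        if cut > 0:
            # Every key before the cut is complete, so it is safe to emit.
            yield rows[:cut]
        pending = rows[cut:]
    if pending:
        # End of input: the held-back key is now known to be complete.
        yield pending


if __name__ == "__main__":
    incoming = [
        [("a", 1), ("a", 2), ("b", 3)],
        [("b", 4), ("c", 5)],
        [("c", 6)],
    ]
    for out in split_batches_by_key(incoming):
        print(out)
```

Because every output batch holds complete keys, a partitioned window function can process each batch on its own without seeing the whole partition set at once. In this sketch a single key that is larger than memory would still accumulate in `pending`, which matches the description's focus on many keys rather than one very large key.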