Update GpuRunningWindowExec to use OOM retry framework #8170
CheckpointRestore for BatchedRunningWindowFixer
This looks great. Next, we need to add a retry in GpuWindowIterator. That means the WindowRetrySuite tests will need to change, but I think it just means that we need to set up an iterator and call next on it.
Also, could you file a follow-up issue to fix GpuCachedDoublePassWindowIterator? I forgot that I added that in.
I filed #8217 for this.
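As a rough illustration of the "set up an iterator and call next on it" test shape suggested above, here is a minimal sketch. `InjectedOom` and `RetryingIterator` are hypothetical stand-ins invented for this example; they are not the spark-rapids classes or its retry framework, and a plain `Seq[Int]` stands in for a GPU batch.

```scala
// Hypothetical stand-ins for illustration only; not the spark-rapids API.
final class InjectedOom extends RuntimeException("injected OOM")

// Wraps an input iterator and retries the per-batch computation
// when it throws InjectedOom, up to maxAttempts attempts per batch.
final class RetryingIterator[A, B](input: Iterator[A], maxAttempts: Int)(compute: A => B)
    extends Iterator[B] {
  require(maxAttempts >= 1, "maxAttempts must be at least 1")

  override def hasNext: Boolean = input.hasNext

  override def next(): B = {
    // Pull the input once so that every retry sees the same batch.
    val batch = input.next()

    @annotation.tailrec
    def attempt(n: Int): B = {
      val result =
        try Some(compute(batch))
        catch { case _: InjectedOom if n < maxAttempts => None }
      result match {
        case Some(v) => v
        case None    => attempt(n + 1)
      }
    }
    attempt(1)
  }
}
```

A test in this style would build the iterator over a single input, make the computation fail once, call `next()`, and then check both the result and the number of attempts.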
Looking good. No new comments to add.
I've marked this PR as ready for review, but I am running into a segfault after the first unit test passes and I am still trying to track that down. The segfault occurs in
This looks good to me.
Just a nit
numOutputRows += ret.numRows()
ret
}
// TODO maybe should create spillable batch here before calling computeRunning
nit: Remove the TODO. What we have is fine, especially once the spill code changes are in, so that making something spillable is super cheap.
build
Closes #7809
This PR introduces a retry block in GpuRunningWindowIterator.computeRunning around the fixUp code. See comments in code for more information.
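To make the idea of a retry block around stateful fix-up code concrete, here is a minimal sketch under stated assumptions. The names `SimulatedOom`, `CheckpointRestore`, `RunningSumFixer`, and `withRestoreOnRetry` are hypothetical and defined here only for illustration; the sketch does not reproduce the plugin's actual retry framework or the real `BatchedRunningWindowFixer`. It shows why a fixer that carries state across batches needs to checkpoint before the attempt and restore before each retry.

```scala
// Hypothetical stand-ins for illustration only; not the spark-rapids API.
final class SimulatedOom extends RuntimeException("simulated GPU OOM")

trait CheckpointRestore {
  def checkpoint(): Unit // save the current state
  def restore(): Unit    // roll back to the last saved state
}

// A toy running-sum "fixer" that carries state from one batch to the next.
final class RunningSumFixer extends CheckpointRestore {
  private var carried: Long = 0L
  private var saved: Long = 0L

  def checkpoint(): Unit = saved = carried
  def restore(): Unit = carried = saved

  // Mutates the carried state while producing the running sums.
  def fixUp(batch: Seq[Long]): Seq[Long] =
    batch.map { v => carried += v; carried }

  def current: Long = carried
}

object RetrySketch {
  // Checkpoints once, then runs body; on SimulatedOom it restores the
  // state and tries again, up to maxAttempts attempts in total.
  def withRestoreOnRetry[T](state: CheckpointRestore, maxAttempts: Int)(body: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    state.checkpoint()

    @annotation.tailrec
    def loop(attempt: Int): T = {
      val result =
        try Some(body)
        catch {
          case _: SimulatedOom if attempt < maxAttempts =>
            state.restore()
            None
        }
      result match {
        case Some(v) => v
        case None    => loop(attempt + 1)
      }
    }
    loop(1)
  }
}
```

Without the `restore()` call, a failed attempt that had already advanced the carried state would cause the retry to double-count the batch, which is the hazard the checkpoint/restore pairing is meant to avoid.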