Retry with smaller split on CudfColumnSizeOverflowException
#9085
Force-pushed from 122370c to e64056a.
I had a very minor nit, otherwise this looks good to me.
(CI will continue to fail on this PR, until rapidsai/cudf#13911 is merged, and available in |
build
Examining the test failure now.
(I'm not sure how the whole world got added as reviewers for this PR. I'll re-request review from the original reviewers.)
Build
Look at git: it's counting a bunch of other commits here. My guess is that, via some sort of rebase, the PR ended up counting all of those in the diff, so it thinks there are changes to a pom file (which brings a whole group of reviewers with it). You could try rebasing your changes on top of branch-23.10 HEAD and force-pushing. I know that's a no-no, but it will clear up the history for this change.
Argh. Let me try doing a rebase and a force-push.
Depends on rapidsai/cudf#13911.

When a CUDF operation causes a column's size to exceed the valid range for CUDF columns (i.e. `cudf::size_type`), CUDF will throw an exception.

Prior to this commit, the `RmmRapidsRetryIterator` does not attempt retries with smaller splits in this case. Instead, the overflow is treated as a generic exception.

This commit allows the `RmmRapidsRetryIterator` to recognize the exception specific to the overflow case (i.e. `CudfColumnSizeOverflowException`), and attempt a split-retry.

Note: This error condition is difficult to reproduce. The catch/retry is a "best effort" attempt not to fail the entire task.

Signed-off-by: MithunR <[email protected]>
Force-pushed from db98518 to ad48d46.
Build
Build
Depends on rapidsai/cudf#13911.
When a CUDF operation causes a column's size to exceed the valid range for CUDF columns (i.e. `cudf::size_type`), CUDF will throw an exception.

Prior to this commit, the `RmmRapidsRetryIterator` does not attempt retries with smaller splits in this case. Instead, the overflow is treated as a generic exception.

This commit allows the `RmmRapidsRetryIterator` to recognize the exception specific to the overflow case (i.e. `CudfColumnSizeOverflowException`), and attempt a split-retry.

Note: This error condition is difficult to reproduce. The catch/retry is a best effort attempt not to fail the entire task.
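
For readers unfamiliar with the split-retry idea, here is a minimal, self-contained sketch of the general pattern. It is not the plugin's code: `ColumnSizeOverflow`, `withSplitRetry`, and `concatWithLimit` are hypothetical names made up for illustration, and the real `RmmRapidsRetryIterator` API and splitting policy may differ. The sketch only shows the shape of "catch the overflow-specific exception, halve the input, retry each half".

```scala
// Hypothetical stand-in for CudfColumnSizeOverflowException, so the sketch
// runs without the cudf jar on the classpath.
final class ColumnSizeOverflow(msg: String) extends RuntimeException(msg)

object SplitRetrySketch {
  // Runs `process` on the whole batch. If it fails with the overflow-specific
  // exception, splits the batch in half and retries each half recursively.
  // Any other exception propagates unchanged (no retry).
  def withSplitRetry[A, B](batch: Vector[A])(process: Vector[A] => B): Vector[B] = {
    try {
      Vector(process(batch))
    } catch {
      case e: ColumnSizeOverflow =>
        // A single element cannot be split further, so give up.
        if (batch.size <= 1) throw e
        val (left, right) = batch.splitAt(batch.size / 2)
        withSplitRetry(left)(process) ++ withSplitRetry(right)(process)
    }
  }

  // Toy operation (an assumption for the demo): concatenates strings and
  // "overflows" when the total length exceeds `limit`.
  def concatWithLimit(limit: Int): Vector[String] => String = { rows =>
    val total = rows.map(_.length.toLong).sum
    if (total > limit) throw new ColumnSizeOverflow(s"size $total exceeds $limit")
    rows.mkString
  }
}
```

With a limit of 8, `SplitRetrySketch.withSplitRetry(Vector("aaaa", "bbbb", "cccc", "dddd"))(SplitRetrySketch.concatWithLimit(8))` fails on the full batch, then succeeds on each half. A single oversized element still fails, which matches the "best effort" framing in the description.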