Translate column size overflow exception to JNI #13911

Merged
merged 4 commits into from
Aug 21, 2023

mythrocks
Contributor

When a CUDF operation causes a column's row count to exceed the size limit imposed by cudf::size_type, the operation throws a std::overflow_error exception. Prior to this commit, however, CUDF JNI did not translate this into a distinct Java exception. Because the condition surfaced as a generic exception, callers had no way to attempt overflow-specific recovery.

This commit translates std::overflow_error into a new Java exception (CudfColumnOverflowException) that may be caught in user space to attempt recovery/retry.

This is a non-breaking change. The user-facing impact is minimal: existing failure handling that catches CudfException continues to work as before. Users can now also catch CudfColumnOverflowException for finer-grained error handling.
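
A minimal Java sketch of how a caller might use the finer-grained exception. It assumes the new exception is a subclass of CudfException, which is what would let existing catch blocks keep working. The nested exception classes are local stand-ins for the cuDF Java types, and `concatenate`, `run`, and `runLegacy` are hypothetical names for illustration only. The limit check uses `Integer.MAX_VALUE` because cudf::size_type is a 32-bit signed integer.

```java
class OverflowHandlingSketch {
  // Local stand-in for the cuDF Java base exception (not the real class).
  static class CudfException extends RuntimeException {
    CudfException(String message) { super(message); }
  }

  // Local stand-in for the new exception; assumed to extend CudfException.
  static class CudfColumnOverflowException extends CudfException {
    CudfColumnOverflowException(String message) { super(message); }
  }

  // Hypothetical operation that simulates a native call failing on overflow.
  static void concatenate(long totalRows) {
    if (totalRows > Integer.MAX_VALUE) {
      throw new CudfColumnOverflowException("row count exceeds size_type limit: " + totalRows);
    }
  }

  // New-style handling: the specific catch must come before the generic one.
  static String run(long totalRows) {
    try {
      concatenate(totalRows);
      return "ok";
    } catch (CudfColumnOverflowException e) {
      return "overflow"; // a caller could split the input and retry here
    } catch (CudfException e) {
      return "generic";
    }
  }

  // Existing-style handling: still catches the overflow via the base type.
  static String runLegacy(long totalRows) {
    try {
      concatenate(totalRows);
      return "ok";
    } catch (CudfException e) {
      return "generic";
    }
  }

  public static void main(String[] args) {
    System.out.println(run(3_000_000_000L));       // prints "overflow"
    System.out.println(runLegacy(3_000_000_000L)); // prints "generic"
    System.out.println(run(10L));                  // prints "ok"
  }
}
```

Ordering matters in `run`: because the overflow type is a subclass, listing the generic catch first would be a compile error in Java.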

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: MithunR <[email protected]>
@mythrocks added the labels feature request, 3 - Ready for Review, Java, Spark, and non-breaking on Aug 17, 2023
@mythrocks self-assigned this on Aug 17, 2023
@mythrocks requested a review from a team as a code owner on August 17, 2023 22:56
@mythrocks requested a review from ttnghia on August 18, 2023 21:58
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Aug 21, 2023
Depends on rapidsai/cudf#13911.

When a CUDF operation causes a column's size to exceed the valid range
for CUDF columns (i.e. cudf::size_type), CUDF will throw an exception.

Prior to this commit, the `RmmRapidsRetryIterator` did not attempt retries
with smaller splits in this case. Instead, the overflow was treated as
a generic exception.

This commit allows the RmmRapidsRetryIterator to recognize the exception
specific to the overflow case (i.e. `CudfColumnSizeOverflowException`),
and attempt a split-retry.

Note: This error condition is difficult to reproduce. The catch/retry is
a "best effort" attempt not to fail the entire task.

Signed-off-by: MithunR <[email protected]>
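
The split-retry idea described in the commit message above can be sketched in Java as follows. This is not the `RmmRapidsRetryIterator` implementation; it is a simplified illustration under stated assumptions. The nested exception class is a local stand-in for the type named in the commit message, and `withSplitRetry` and `sumWithLimit` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SplitRetrySketch {
  // Local stand-in for the overflow exception named in the commit message.
  static class CudfColumnSizeOverflowException extends RuntimeException {
    CudfColumnSizeOverflowException(String message) { super(message); }
  }

  // Applies op to the whole input; on overflow, halves the input and
  // retries each half recursively. Other exceptions propagate unchanged.
  static <T, R> List<R> withSplitRetry(List<T> input, Function<List<T>, R> op) {
    List<R> results = new ArrayList<>();
    try {
      results.add(op.apply(input));
    } catch (CudfColumnSizeOverflowException e) {
      if (input.size() <= 1) {
        throw e; // cannot split further, so give up
      }
      int mid = input.size() / 2;
      results.addAll(withSplitRetry(input.subList(0, mid), op));
      results.addAll(withSplitRetry(input.subList(mid, input.size()), op));
    }
    return results;
  }

  // Hypothetical operation: totals row counts and fails above a limit,
  // simulating an output column that would exceed the size limit.
  static long sumWithLimit(List<Long> rowCounts, long limit) {
    long total = 0;
    for (long count : rowCounts) {
      total += count;
    }
    if (total > limit) {
      throw new CudfColumnSizeOverflowException("total " + total + " exceeds " + limit);
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> batches = List.of(4L, 3L, 5L, 2L);
    // The full input totals 14 (> 8), so it is split into [4, 3] and [5, 2].
    System.out.println(withSplitRetry(batches, b -> sumWithLimit(b, 8))); // prints [7, 7]
  }
}
```

Catching only the overflow type keeps the retry narrow: any other failure still fails immediately, which matches the "best effort" framing in the note above.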
@mythrocks
Contributor Author

/merge

@rapids-bot merged commit 55a4ecf into rapidsai:branch-23.10 on Aug 21, 2023
@mythrocks
Contributor Author

Thank you for the reviews, @ttnghia, @gerashegalov. This change is now merged.

mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Aug 24, 2023
Depends on rapidsai/cudf#13911.