[FEA][JNI] Throw a specific java exception for OOM #11970

Closed
Tracked by #6746
abellina opened this issue Oct 21, 2022 · 1 comment
Labels: 0 - Backlog, feature request, Java, Spark

Comments

@abellina
Contributor

Currently we catch rmm::out_of_memory here: https://github.com/rapidsai/cudf/blob/branch-22.12/java/src/main/native/include/jni_utils.hpp#L853, and then we throw java.lang.OutOfMemoryError.

The problem is that this exception type is normally used for Java heap exhaustion, but we are using it for GPU memory as well, as pointed out by @gerashegalov here: NVIDIA/spark-rapids#6810 (comment).

We'd like to fix this by throwing a cuDF-JNI specific exception for GPU OOM (e.g. RapidsGpuOutOfMemory), and likely adding other exceptions for HOST. We would like to include the amount of memory that we attempted to allocate (ideally from the RMM exception rapidsai/rmm#1134), but we may have enough info in cuDF to work around it.
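As a rough illustration of the proposal, here is a minimal sketch of what such an exception type could look like. Everything in it is hypothetical: `RapidsOutOfMemory` and `getRequestedBytes` are names made up for this example, `RapidsGpuOutOfMemory` is only the name floated above, and extending `RuntimeException` rather than `Error` is an assumption, not a decision made in this issue.

```java
// Hypothetical sketch only: none of these classes exist in cuDF today.

/** Assumed common base for allocation failures that are not Java heap exhaustion. */
class RapidsOutOfMemory extends RuntimeException {
    // Size of the failed request in bytes, or -1 when the native side did not report it.
    private final long requestedBytes;

    RapidsOutOfMemory(String message, long requestedBytes, Throwable cause) {
        super(message, cause);
        this.requestedBytes = requestedBytes;
    }

    long getRequestedBytes() {
        return requestedBytes;
    }

    // Formats the request size for the exception message.
    static String describe(long bytes) {
        return bytes >= 0 ? bytes + " bytes" : "an unknown number of bytes";
    }
}

/** GPU (device) allocation failure, e.g. what the JNI layer could throw for rmm::out_of_memory. */
class RapidsGpuOutOfMemory extends RapidsOutOfMemory {
    RapidsGpuOutOfMemory(long requestedBytes, Throwable cause) {
        super("GPU allocation of " + describe(requestedBytes) + " failed", requestedBytes, cause);
    }
}
```

With a type like this, callers could catch `RapidsGpuOutOfMemory` separately from `java.lang.OutOfMemoryError` and read the requested size when it is available.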

This issue would also cover catching the java.lang.OutOfMemoryError thrown by UNSAFE.allocateMemory here: https://github.com/rapidsai/cudf/blob/branch-22.12/java/src/main/java/ai/rapids/cudf/UnsafeMemoryAccessor.java#L79, and throwing something like RapidsHostOutOfMemory instead.
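A minimal sketch of the host-side idea follows. It is not the actual `UnsafeMemoryAccessor` code: the raw allocator is passed in as a `LongUnaryOperator` standing in for `UNSAFE.allocateMemory`, and `RapidsHostOutOfMemory`, `HostAllocSketch`, and `getRequestedBytes` are hypothetical names used only for illustration.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical exception type; the name follows the suggestion in this issue.
class RapidsHostOutOfMemory extends RuntimeException {
    private final long requestedBytes;

    RapidsHostOutOfMemory(long requestedBytes, OutOfMemoryError cause) {
        super("Could not allocate " + requestedBytes + " bytes of host memory", cause);
        this.requestedBytes = requestedBytes;
    }

    long getRequestedBytes() {
        return requestedBytes;
    }
}

// Hypothetical wrapper; rawAllocator stands in for a call such as UNSAFE.allocateMemory(bytes).
class HostAllocSketch {
    /** Returns the address produced by rawAllocator, translating OutOfMemoryError on failure. */
    static long allocate(LongUnaryOperator rawAllocator, long bytes) {
        try {
            return rawAllocator.applyAsLong(bytes);
        } catch (OutOfMemoryError e) {
            // Keep the original error as the cause so no diagnostic detail is lost.
            throw new RapidsHostOutOfMemory(bytes, e);
        }
    }
}
```

Keeping the try block limited to the single allocation call reduces the chance of relabeling an unrelated Java heap failure as a host memory failure, although a sketch like this cannot rule that out entirely.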

@abellina added the feature request, Needs Triage, and Java labels on Oct 21, 2022.
@GregoryKimball added the 0 - Backlog and Spark labels and removed the Needs Triage label on Oct 30, 2022.
@abellina
Contributor Author

We are going to be rethinking the exceptions and exception handling as part of NVIDIA/spark-rapids#7252, so we'll close this issue and open new ones if needed.
