[BUG] pinned_host_vector can cause abrupt program termination #14165
Comments
I believe this example hits the We could adjust deallocate to not throw, catch the exception in the destructor (which is currently empty), or perhaps choose another option that we haven't yet considered.
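To make the "catch the exception in the destructor" option concrete, here is a minimal sketch. It does not use cudf or CUDA: `throwing_allocator_sketch`, `guarded_buffer_sketch`, and `primary_error_survives_cleanup_failure` are hypothetical names invented for illustration, and `std::malloc`/`std::free` stand in for the pinned-memory calls. It only shows the general pattern, not what the library actually does.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical allocator whose deallocate may throw, mimicking the reported behavior.
struct throwing_allocator_sketch {
  bool fail_on_free = false;

  void* allocate(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr) { throw std::bad_alloc{}; }
    return p;
  }

  void deallocate(void* p) {
    std::free(p);
    if (fail_on_free) { throw std::runtime_error("simulated free failure"); }
  }
};

// Hypothetical owning buffer: the destructor swallows deallocation errors
// so that nothing can escape it.
class guarded_buffer_sketch {
 public:
  guarded_buffer_sketch(throwing_allocator_sketch& alloc, std::size_t bytes, bool* cleanup_failed)
    : alloc_(alloc), ptr_(alloc.allocate(bytes)), cleanup_failed_(cleanup_failed) {}

  guarded_buffer_sketch(guarded_buffer_sketch const&)            = delete;
  guarded_buffer_sketch& operator=(guarded_buffer_sketch const&) = delete;

  ~guarded_buffer_sketch() {
    try {
      alloc_.deallocate(ptr_);
    } catch (...) {
      // Record the failure with an operation that cannot throw.
      if (cleanup_failed_ != nullptr) { *cleanup_failed_ = true; }
    }
  }

 private:
  throwing_allocator_sketch& alloc_;
  void* ptr_;
  bool* cleanup_failed_;
};

// Returns true if the primary error reached the caller as an ordinary exception
// even though cleanup failed while the stack was unwinding.
inline bool primary_error_survives_cleanup_failure() {
  throwing_allocator_sketch alloc;
  alloc.fail_on_free  = true;
  bool cleanup_failed = false;
  try {
    guarded_buffer_sketch buffer(alloc, 64, &cleanup_failed);
    throw std::runtime_error("primary failure");  // unwinding runs the destructor
  } catch (std::runtime_error const&) {
    return cleanup_failed;
  }
  return false;
}
```

With this shape the application still sees the primary exception and can shut down gracefully; the trade-off is that the cleanup failure is only visible through whatever side channel the destructor records it in.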
This is what RMM's memory allocators do. The allocate functions always check for errors, but deallocate only uses a debug assert, which does nothing in a release build. See https://github.com/rapidsai/rmm/blob/branch-23.10/include/rmm/mr/device/cuda_memory_resource.hpp#L69-L88 as an example.
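The pattern described above might look roughly like the following sketch. It is not RMM's code: `status`, `fake_alloc_host`, `fake_free_host`, and `host_allocator_sketch` are hypothetical stand-ins so the example compiles without CUDA, and the standard `assert` macro (disabled when `NDEBUG` is defined) plays the role of the debug-only check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical status code standing in for a CUDA-style error value.
enum class status { success, error };

// Hypothetical stand-ins for the pinned host allocation and free calls.
inline status fake_alloc_host(void** ptr, std::size_t bytes) {
  *ptr = std::malloc(bytes);
  return (*ptr != nullptr) ? status::success : status::error;
}

inline status fake_free_host(void* ptr) {
  std::free(ptr);
  return status::success;
}

struct host_allocator_sketch {
  // Allocation failures are always checked and reported by exception.
  void* allocate(std::size_t bytes) {
    void* ptr = nullptr;
    if (fake_alloc_host(&ptr, bytes) != status::success) { throw std::bad_alloc{}; }
    return ptr;
  }

  // Deallocation never throws: the status is only checked by a debug
  // assertion, which compiles to nothing when NDEBUG is defined.
  void deallocate(void* ptr) noexcept {
    status const result = fake_free_host(ptr);
    assert(result == status::success);
    (void)result;  // avoid an unused-variable warning in release builds
  }
};
```

Because `deallocate` is `noexcept`, a container can call it from its destructor without risking an exception escaping during stack unwinding.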
Fixes #14165
The deallocate function is called by the `pinned_host_vector` destructor. Throwing from a destructor is bad because the exception can't be caught, and it generally gets converted into a runtime SIGABRT.
Authors:
- Robert Maynard (https://github.com/robertmaynard)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Divye Gala (https://github.com/divyegala)
- Mike Wilson (https://github.com/hyperbolic2346)
URL: #14251
Describe the bug
pinned_host_vector can throw from within its destructor, causing the application process to terminate abruptly. For example, this is causing abnormal termination of Spark executors when a GPU illegal access occurs during a Parquet read. Since the executor process is abruptly terminated via low-level `abort()`, there's no chance to convey a useful message from the driver to the executor or otherwise try to handle at the application level the `cudf::fatal_cuda_error` exception that is being thrown.
Steps/Code to reproduce bug
Compile and run the following program:
which will produce the following output:
Expected behavior
Exceptions should not be thrown from destructors and cause an application to be terminated abruptly with no chance for the application to shutdown gracefully.
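As background for why a throwing destructor ends in `abort()`: since C++11, destructors are implicitly `noexcept` unless declared otherwise, so an exception that escapes one calls `std::terminate`. The sketch below checks that property at compile time. `release_or_throw`, `throwing_dtor_sketch`, and `destructor_is_implicitly_noexcept` are hypothetical names used only for illustration, and the throw is never actually triggered here.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical cleanup helper that may throw, like a checked free call.
inline void release_or_throw(bool fail) {
  if (fail) { throw std::runtime_error("simulated cleanup failure"); }
}

// The destructor has no exception specification, so it is implicitly noexcept.
// If release_or_throw(true) ran here, the exception would escape a noexcept
// function and the runtime would call std::terminate.
struct throwing_dtor_sketch {
  bool armed = false;
  ~throwing_dtor_sketch() { release_or_throw(armed); }
};

// Compile-time confirmation that the destructor is treated as non-throwing.
constexpr bool destructor_is_implicitly_noexcept() {
  return std::is_nothrow_destructible<throwing_dtor_sketch>::value;
}

static_assert(destructor_is_implicitly_noexcept(),
              "destructors are noexcept unless declared otherwise");
```

A `try`/`catch` block around the object cannot intercept such an exception, which is why the fix has to live in the deallocation path or in the destructor itself.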