Intentionally leak static CUDA resources to avoid crash (part 2) (#462)
The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with a segmentation fault. To reproduce:

```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
 
The root cause is that (1) some `thread_local` objects on the main thread in `libcudf` and (2) some `static` objects in `kvikio` are destroyed at program termination, after NVbench has already called `cudaDeviceReset()`. These objects should simply be leaked, because destructors that make CUDA calls during program termination invoke undefined behavior in CUDA.

This PR is the KvikIO side of the fix. The libcudf side is in rapidsai/cudf#16787.
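The ordering problem described above can be reproduced without a GPU. The following is a minimal sketch, not KvikIO code: `MockDriver`, `OwningCache`, `LeakyCache`, and `late_calls_after_reset` are hypothetical names invented for illustration, and the mock only counts calls that arrive after a reset instead of crashing.

```cpp
#include <map>

// Stand-in for the CUDA driver (hypothetical). After reset(), any further
// call is "too late"; real CUDA would exhibit undefined behavior here.
struct MockDriver {
  bool alive      = true;
  int destroyed   = 0;  // handles released while the driver was still alive
  int late_calls  = 0;  // calls that arrived after reset()

  void reset() { alive = false; }

  void stream_destroy(int /*stream*/)
  {
    if (alive) {
      ++destroyed;
    } else {
      ++late_calls;
    }
  }
};

// Cache whose destructor releases every handle (the behavior before the fix).
class OwningCache {
 public:
  explicit OwningCache(MockDriver& driver) : _driver(driver) {}
  ~OwningCache()
  {
    for (auto& entry : _streams) {
      _driver.stream_destroy(entry.second);
    }
  }
  void add(int key, int stream) { _streams[key] = stream; }

 private:
  MockDriver& _driver;
  std::map<int, int> _streams;
};

// Cache that intentionally leaks its handles (the behavior after the fix).
class LeakyCache {
 public:
  explicit LeakyCache(MockDriver&) {}
  ~LeakyCache() = default;  // no driver calls during destruction
  void add(int key, int stream) { _streams[key] = stream; }

 private:
  std::map<int, int> _streams;
};

// Mimics the failing sequence: the driver is reset first, and the cache is
// destroyed afterwards. Returns how many driver calls arrived too late.
template <typename Cache>
int late_calls_after_reset()
{
  MockDriver driver;
  {
    Cache cache(driver);
    cache.add(0, 42);
    driver.reset();
  }  // cache is destroyed here, after the reset
  return driver.late_calls;
}
```

In this sketch `late_calls_after_reset<OwningCache>()` reports one late call while `late_calls_after_reset<LeakyCache>()` reports none, which is the difference the fix relies on. The operating system reclaims the leaked handles when the process exits.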

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #462
kingcrimsontianyu authored Sep 12, 2024
1 parent 59edda0 commit 9d352ef
Showing 1 changed file with 8 additions and 10 deletions.
18 changes: 8 additions & 10 deletions cpp/include/kvikio/posix_io.hpp
@@ -42,16 +42,14 @@ class StreamsByThread {

  public:
   StreamsByThread() = default;
-  ~StreamsByThread() noexcept
-  {
-    for (auto& [_, stream] : _streams) {
-      try {
-        CUDA_DRIVER_TRY(cudaAPI::instance().StreamDestroy(stream));
-      } catch (const CUfileException& e) {
-        std::cerr << e.what() << std::endl;
-      }
-    }
-  }
+
+  // Here we intentionally do not destroy in the destructor the CUDA resources
+  // (e.g. CUstream) with static storage duration, but instead let them leak
+  // on program termination. This is to prevent undefined behavior in CUDA. See
+  // <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#initialization>
+  // This also prevents crash (segmentation fault) if clients call
+  // cuDevicePrimaryCtxReset() or cudaDeviceReset() before program termination.
+  ~StreamsByThread() = default;

   static CUstream get(CUcontext ctx, std::thread::id thd_id)
   {
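The change above leaks by leaving the destructor empty. A related idiom, shown here only for comparison and not used by this commit, is to heap-allocate the singleton and never delete it, so that no destructor runs at exit at all. This is a sketch under stated assumptions: `StreamRegistry` and `FakeStream` are hypothetical stand-ins, the integer context replaces `CUcontext`, and the lookup is not thread-safe.

```cpp
#include <cstddef>
#include <map>
#include <thread>
#include <utility>

// Hypothetical handle type standing in for CUstream.
using FakeStream = int;

class StreamRegistry {
 public:
  // Returns the process-wide registry. The object is created with `new` and
  // never deleted, so its destructor is never invoked at program termination.
  static StreamRegistry& instance()
  {
    static StreamRegistry* registry = new StreamRegistry();
    return *registry;
  }

  // Returns the stream cached for (ctx, thread), creating one on first use.
  // Not thread-safe: a real implementation would guard _streams with a mutex.
  FakeStream get(int ctx, std::thread::id thd_id)
  {
    auto key = std::make_pair(ctx, thd_id);
    auto it  = _streams.find(key);
    if (it != _streams.end()) { return it->second; }
    FakeStream stream = _next++;
    _streams.emplace(key, stream);
    return stream;
  }

  std::size_t size() const { return _streams.size(); }

 private:
  StreamRegistry() = default;

  std::map<std::pair<int, std::thread::id>, FakeStream> _streams;
  FakeStream _next = 1;
};
```

The trade-off is that leak checkers may report the allocation, whereas the defaulted destructor in this commit still frees the map's own memory and leaks only the CUDA handles.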