[BUG] libnvcomp is not compiled with PTDS #9534
Comments
Some relevant discussion from an internal Slack channel:

@jrhemstad - that or we need to pass

@abellina - so the JNI side changes the value of "default" to 2 if we detect PTDS. But if we are not passing that to the C++ side, then yeah this could happen.

@jrhemstad - So we don't have a great fix for this right now as libcudf public APIs don't take streams. The easiest thing would be to use the detail:: API from cuIO and specify cudaStreamPerThread. Alternatively, we could make it so that when libcudf is compiled with PTDS that all internal stream arguments are defaulted to cudaStreamPerThread instead of 0.
As a workaround, I was able to build libnvcomp.a by adding:

@jrhemstad, @abellina should I put up a patch with this change? It should be sufficient as long as we are building nvcomp during the cudf build.
Closes #9534.

Change get_nvcomp.cmake to compile with CUDA_API_PER_THREAD_DEFAULT_STREAM if PER_THREAD_DEFAULT_STREAM is defined. I did not add unit tests for this, but I tested it manually by building CUDF and then running a query to verify that PTDS was being used.

Authors:
- Jim Brennan (https://github.com/jbrennan333)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #9540
CUDF does not define CUDA_API_PER_THREAD_DEFAULT_STREAM when building libnvcomp.a, even when it is configured with -DPER_THREAD_DEFAULT_STREAM=ON.
We found this when comparing nsys traces taken while running TPC-DS benchmarks on spark-rapids with and without LIBCUDF_USE_NVCOMP=1 defined. The cudf snappy kernels use per-thread default streams, but nvcomp was using only a single stream (stream 7). As a result, most queries were slower with LIBCUDF_USE_NVCOMP=1, which is not what we expect.
Expected behavior
When you compile CUDF with -DPER_THREAD_DEFAULT_STREAM=ON, libnvcomp.a should be compiled with CUDA_API_PER_THREAD_DEFAULT_STREAM. We expect that with this change, most queries will be at least as fast if not faster when run with LIBCUDF_USE_NVCOMP=1.
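For illustration, the expected behavior could be expressed in CMake roughly as follows. This is only a sketch under assumptions: it supposes that nvcomp is built as part of the cudf build and exposes a regular (non-imported) target named `nvcomp`. The target name and the placement of this snippet are hypothetical, not taken from the actual get_nvcomp.cmake.

```cmake
# Sketch only: the target name `nvcomp` is an assumption for illustration.
# PER_THREAD_DEFAULT_STREAM is the cudf build option named in this issue.
if(PER_THREAD_DEFAULT_STREAM)
  # Compile nvcomp's own sources with the same macro cudf uses, so that
  # stream 0 inside nvcomp means the per-thread default stream rather
  # than the legacy default stream.
  target_compile_definitions(nvcomp PRIVATE CUDA_API_PER_THREAD_DEFAULT_STREAM)
endif()
```

With something like this in place, an nsys trace of an nvcomp-enabled run should show the nvcomp kernels spread across per-thread streams instead of a single stream.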