Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] libnvcomp is not compiled with PTDS #9534

Closed
jbrennan333 opened this issue Oct 27, 2021 · 3 comments · Fixed by #9540
Closed

[BUG] libnvcomp is not compiled with PTDS #9534

jbrennan333 opened this issue Oct 27, 2021 · 3 comments · Fixed by #9540
Labels
bug Something isn't working cuIO cuIO issue Performance Performance related issue

Comments

@jbrennan333
Copy link
Contributor

CUDF does not define CUDA_API_PER_THREAD_DEFAULT_STREAM when building libnvcomp.a, even when -DPER_THREAD_DEFAULT_STREAM=ON is defined.

We found this when comparing nsys traces taken while running tpc-ds benchmarks on spark-rapids with and without LIBCUDF_USE_NVCOMP=1 defined. The cudf snappy kernels use per thread default streams, but nvcomp was using only 1 stream (7). As a result, most queries were slower with LIBCUDF_USE_NVCOMP=1, which is not what we expect.

Expected behavior
When you compile CUDF with -DPER_THREAD_DEFAULT_STREAM=ON, libnvcomp.a should be compiled with CUDA_API_PER_THREAD_DEFAULT_STREAM. We expect that with this change, most queries will be at least as fast if not faster when run with LIBCUDF_USE_NVCOMP=1.

@jbrennan333 jbrennan333 added bug Something isn't working Needs Triage Need team to review and classify labels Oct 27, 2021
@jbrennan333
Copy link
Contributor Author

Some relevant discussion from an internal slack channel:
@abellina - Given that nvcomp is a separate compilation unit in cuDF, do we need to define CUDA_API_PER_THREAD_DEFAULT_STREAM for it? @jbrennan333 shared some nsys traces he obtained using the "use nvcomp" switch for unsnappy, and those kernels ended up in the Default Stream (7), rather than in our other per task streams.

@jrhemstad - that or we need to pass cudaStreamPerThread for any stream argument to nvcomp functions. I'm almost certain we're not passing in cudaStreamPerThread to nvcomp APIs. In libcudf we'd just be passing in the stream argument from the detail API which defaults to 0. The meaning of 0 == PTDS gets lost when crossing precompiled library boundaries.

@abellina - so the jni side changes the value of "default" to 2 if we detect PTDS. But if we are not passing that to the C++ side, then yeah this could happen.

@jrhemstad - So we don't have a great fix for this right now as libcudf public APIs don't take streams. The easiest thing would be to use the detail:: API from cuIO and specify cudaStreamPerThread. Alternatively, we could make it so that when libcudf is compiled with PTDS that all internal stream arguments are defaulted to cudaStreamPerThread instead of 0.

@jbrennan333
Copy link
Contributor Author

Screenshot from 2021-10-27 08-37-42

Attaching a screeenshot from one of the nsys traces.

@jbrennan333
Copy link
Contributor Author

As a workaround, I was able to build libnvcomp.a by adding:
target_compile_definitions(nvcomp PUBLIC CUDA_API_PER_THREAD_DEFAULT_STREAM)
to find_and_configure_nvcomp in get_nvcomp.cmake.

@jrhemstad, @abellina should I put up a patch with this change? It should be sufficient as long as we are building nvcomp during the cudf build.

@jbrennan333 jbrennan333 added the Performance Performance related issue label Oct 27, 2021
@devavret devavret added cuIO cuIO issue and removed Needs Triage Need team to review and classify labels Oct 27, 2021
@rapids-bot rapids-bot bot closed this as completed in #9540 Nov 1, 2021
rapids-bot bot pushed a commit that referenced this issue Nov 1, 2021
closes #9534 

Change get_nvcomp.cmake to compile with CUDA_API_PER_THREAD_DEFAULT_STREAM is PER_THREAD_DEFAULT_STREAM is defined.

I did not add unit tests for this, but I tested it manually by building CUDF and then running a query to verify that PTDS was being used.

Authors:
  - Jim Brennan (https://github.com/jbrennan333)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9540
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue Performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants