Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Ensure that all of cudf's algorithms run on cudf's default stream by default #11942

Closed
vyasr opened this issue Oct 18, 2022 · 0 comments · Fixed by #11875
Closed

[FEA] Ensure that all of cudf's algorithms run on cudf's default stream by default #11942

vyasr opened this issue Oct 18, 2022 · 0 comments · Fixed by #11875
Assignees
Labels
2 - In Progress Currently a work in progress feature request New feature or request

Comments

@vyasr
Copy link
Contributor

vyasr commented Oct 18, 2022

Is your feature request related to a problem? Please describe.
libcudf currently runs entirely on CUDA's default stream. Internally, this is managing using an rmm::cuda_stream_view stored as a global variable used everywhere. We would like to move towards enabling stream-ordered libcudf APIs. In order to enable this safely, we need to ensure that libcudf is properly using its default stream everywhere in the codebase.

Describe the solution you'd like
We should write a separate library that overrides existing CUDA functions (via dlsym) to throw an error if they are invoked using CUDA's default stream. We should then preload this library when running libcudf's test suite so that our unit tests can verify that we are properly passing cudf::default_stream_value through to everywhere that it needs to be in the code. We will need to make cudf's default stream configurable so that it can be set to a non-default value when running the tests.

Describe alternatives you've considered
None

@vyasr vyasr added feature request New feature or request 2 - In Progress Currently a work in progress labels Oct 18, 2022
@vyasr vyasr added this to the Enable streams milestone Oct 18, 2022
@vyasr vyasr self-assigned this Oct 18, 2022
rapids-bot bot pushed a commit that referenced this issue Oct 21, 2022
This PR ensures that cudf's default stream is properly passed to all kernel launches so that nothing implicitly runs on the CUDA default stream. It adds a small library that is built during the tests and overloads CUDA functions to throw an exception when usage of the default stream is detected. It also fixes all remaining usage of anything other than cudf's default stream (I fixed most of the issues in previous PRs, but I found a few others when finalizing this one).

Resolves #11929
Resolves #11942 

### Important notes for reviewers:
- **The changeset is deceptively large.** The vast majority of the changes are just a global find-and-replace of `cudf::get_default_stream()` for `cudf::default_stream_value`, as well as a few smaller fixes such as missing `CUDF_TEST_PROGRAM_MAIN` in a couple of tests and usage of `rmm::cuda_stream_default`. The meaningful changes are:
    - The new default stream getter/setter in `default_stream.[hpp|cpp]`
    - The addition of `cpp/tests/utilities/identify_stream_usage`
    - The changes to the base testing fixture in `cpp/include/cudf_test/base_fixture.hpp` to inject the custom stream.
    - The changes to CI in `ci/gpu/build.sh` to build and use the new library.
- This PR is a breaking change because it moves the default stream into the detail namespace. Going forward the default stream may only be accessed using the public accessor `cudf::get_default_stream()`. I have added a corresponding setter, but it is also in the detail namespace since I do not want to publicly support changing the default stream yet, only for the purpose of testing. Reviewers, please leave comments if you disagree with those choices.
- I have made getting and setting the default stream thread-safe, but there is still only a single stream. In multi-threaded applications we may want to support a stream per thread so that users could manually achieve PTDS with more fine-tuned control. Is this worthwhile? Even if it is, I'm inclined to wait for a subsequent PR to implement this unless someone feels strongly otherwise.
- I'm currently only overloading `cudaLaunchKernel`. I can add overloads for other functions as well, but I didn't want to go through the effort of overloading every possible API. If reviewers have a minimal set that they'd like to see overloaded, let me know. [I've included links to all the relevant pages of the CUDA runtime API in the identify_stream_usage.cu file](https://github.com/rapidsai/cudf/pull/11875/files#diff-0b2762207c27c080acd2114475c7a1c06377a7c18c4e9c3de60ecbdc82a4dc61R99) if someone wants to look through them.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Bradley Dice (https://github.com/bdice)
  - Sevag H (https://github.com/sevagh)
  - https://github.com/brandon-b-miller
  - Jake Hemstad (https://github.com/jrhemstad)
  - David Wendt (https://github.com/davidwendt)

URL: #11875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant