2118 [CUDAX] Change the RAII device swapper to use driver API and add it in places where it was missing #2192
We need to use the versioned entry-point lookup to get the correct cuStreamGetCtx. There is a v2 version of it in 12.5; fortunately, the versioned get-entry-point function is available there too.
🟨 CI finished in 11m 05s: Pass: 96%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 11m 00s | Hits: 78%/2756
Modifications in project?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | pycuda |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
🏃 Runner counts (total jobs: 56)
| # | Runner |
|---|---|
| 41 | linux-amd64-cpu16 |
| 9 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
🟨 CI finished in 10m 22s: Pass: 96%/56 | Total: 2h 36m | Avg: 2m 47s | Max: 10m 22s | Hits: 73%/2756
🟩 CI finished in 10m 55s: Pass: 100%/56 | Total: 2h 35m | Avg: 2m 46s | Max: 10m 55s | Hits: 96%/2848
cudax/include/cuda/experimental/__utility/ensure_current_device.cuh
explicit __ensure_current_device(device_ref new_device)
{
  auto ctx = devices[new_device.get()].primary_context();
  detail::driver::ctxPush(ctx);
The other __ensure_current_device only does the push/pop when the device actually differs. Is this something that is possible with the driver API?
We discussed this in a meeting today. When only the C++ Runtime is used, the expectation is that the stack is empty, so we would push/pop in most cases anyway.
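The push/pop behavior discussed in this thread can be illustrated without a GPU. What follows is a minimal sketch, not the cudax implementation: `fake_context`, `fake_driver_stack`, `ensure_current_context`, and `depth_inside_guard` are hypothetical names, and a thread-local `std::vector` stands in for the per-thread context stack that the driver's `cuCtxPushCurrent` and `cuCtxPopCurrent` operate on. It shows a guard that pushes unconditionally on construction and pops on destruction, so the stack is left as it was found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUcontext handle.
struct fake_context
{
  int device_id;
};

// Hypothetical stand-in for the driver's per-thread context stack.
inline std::vector<fake_context*>& fake_driver_stack()
{
  thread_local std::vector<fake_context*> stack;
  return stack;
}

// RAII guard: pushes on construction, pops on destruction.
class ensure_current_context
{
public:
  explicit ensure_current_context(fake_context& ctx)
  {
    fake_driver_stack().push_back(&ctx); // models cuCtxPushCurrent
  }

  ~ensure_current_context()
  {
    fake_driver_stack().pop_back(); // models cuCtxPopCurrent
  }

  // Copying the guard would pop twice, so it is disabled.
  ensure_current_context(const ensure_current_context&)            = delete;
  ensure_current_context& operator=(const ensure_current_context&) = delete;
};

// Returns the stack depth observed while the guard is alive.
inline std::size_t depth_inside_guard(fake_context& ctx)
{
  ensure_current_context guard{ctx};
  assert(fake_driver_stack().back() == &ctx);
  return fake_driver_stack().size();
}
```

Starting from an empty stack, calling `depth_inside_guard` sees exactly one entry while the guard is alive, and the stack is empty again once the call returns. A conditional variant would first compare the requested context with the current top of the stack, which saves little when the stack is usually empty.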
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
#define LIBCUDACXX_ENABLE_EXCEPTIONS
This should already be defined in the CMakeLists.txt
There are no redefinition errors, so it is probably not defined, and the test just terminates on an exception.
I added it anyway, since all tests need to define it.
🟩 CI finished in 11m 07s: Pass: 100%/56 | Total: 2h 52m | Avg: 3m 04s | Max: 11m 07s | Hits: 79%/2848
To select the current device based on a stream, and to ensure there are no visible side effects, the device swapper needs to use the driver API. The type name was changed to __ensure_current_device to match what was added in #2073.

This change introduces storing the primary context in the device type. The context retain is lazy, so it won't initialize devices that are not explicitly used.

This primary context is then used in __ensure_current_device, which pushes it onto the driver stack on construction and pops it on destruction.

Ideally, every API in cudax would leave the driver stack exactly as it was before the call. I had to change some CUDART APIs to their driver equivalents because the CUDART versions introduce extra stack changes.

Added tests for most APIs to check that they keep the stack empty throughout their usage.
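The lazy retain described above can be sketched in plain C++. This is an illustrative sketch under stated assumptions, not the code from this PR: `fake_primary_context`, `fake_device`, and `retain_calls` are hypothetical names, and incrementing a counter stands in for the driver call `cuDevicePrimaryCtxRetain`. The sketch is not thread-safe; a real implementation would need to guard the first retain.

```cpp
#include <cassert>

// Hypothetical stand-in for a retained primary context.
struct fake_primary_context
{
  int device_id;
};

// Counts how often the (fake) retain ran; for illustration only.
inline int& retain_calls()
{
  static int count = 0;
  return count;
}

// Hypothetical device type that retains its primary context on first use.
class fake_device
{
public:
  explicit fake_device(int id)
      : id_(id)
  {}

  // Retains the context the first time it is requested, then reuses it.
  fake_primary_context* primary_context()
  {
    if (!retained_)
    {
      ++retain_calls(); // models cuDevicePrimaryCtxRetain
      ctx_.device_id = id_;
      retained_      = true;
    }
    return &ctx_;
  }

  bool retained() const
  {
    return retained_;
  }

private:
  int id_;
  bool retained_ = false;
  fake_primary_context ctx_{-1};
};
```

With two `fake_device` objects, requesting the primary context of one of them any number of times runs the retain once, and the other device stays untouched. That is the property that keeps unused devices from being initialized.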