2118 [CUDAX] Change the RAII device swapper to use driver API and add it in places where it was missing #2192

Merged


pciolkosz
Contributor

To select the current device based on a stream, and to avoid visible side effects, the device swapper needs to use the driver API.

The type name was changed to __ensure_current_device to match what was added in #2073.

This change stores the primary context in the device type. The context is retained lazily, so it will not initialize devices that are not explicitly used.
This primary context is then pushed onto the driver context stack when __ensure_current_device is constructed and popped when it is destroyed.
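A minimal, CUDA-free sketch of the mechanism described above, assuming the driver's per-thread context stack can be modelled as a vector. `mock_device`, `mock_ctx_stack`, `mock_retain_count`, and `ensure_current_device_sketch` are hypothetical names, not cudax or CUDA APIs; comments mark where real code would presumably call `cuDevicePrimaryCtxRetain`, `cuCtxPushCurrent`, and `cuCtxPopCurrent` (the PR's own snippet calls `detail::driver::ctxPush`). Releasing the primary context is omitted.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in: a "context" is just an id. Not a CUDA type.
using mock_context = int;

// Models the driver's per-thread context stack.
inline std::vector<mock_context>& mock_ctx_stack()
{
  thread_local std::vector<mock_context> stack;
  return stack;
}

// Counts how many primary contexts were "initialized" (retained).
inline int mock_retain_count = 0;

// Models a device that retains its primary context only on first use.
class mock_device
{
public:
  explicit mock_device(int id) : id_(id) {}

  mock_context primary_context()
  {
    if (!ctx_) // lazy retain: nothing happens until a context is actually needed
    {
      ++mock_retain_count; // stands in for cuDevicePrimaryCtxRetain
      ctx_ = 100 + id_;
    }
    return *ctx_;
  }

private:
  int id_;
  std::optional<mock_context> ctx_;
};

// RAII guard in the spirit of __ensure_current_device: push on construction,
// pop on destruction, so the stack is restored exactly when the scope ends.
class ensure_current_device_sketch
{
public:
  explicit ensure_current_device_sketch(mock_device& dev)
  {
    mock_ctx_stack().push_back(dev.primary_context()); // stands in for cuCtxPushCurrent
  }
  ~ensure_current_device_sketch()
  {
    mock_ctx_stack().pop_back(); // stands in for cuCtxPopCurrent
  }
  ensure_current_device_sketch(const ensure_current_device_sketch&)            = delete;
  ensure_current_device_sketch& operator=(const ensure_current_device_sketch&) = delete;
};

// Returns the top of the mock stack, or -1 when no context is current.
inline mock_context current_context()
{
  return mock_ctx_stack().empty() ? -1 : mock_ctx_stack().back();
}
```

Nested guards restore the previous context in LIFO order, and a device that is never used never has its context retained.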
Ideally, every API in cudax would leave the driver stack exactly as it was before the call. I had to replace some CUDART APIs with their driver equivalents because the CUDART versions introduced extra stack changes.
Added tests for most APIs to check that they keep the stack empty throughout their usage.
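A sketch of the kind of check such a test might perform, assuming the per-thread context stack can be modelled as a vector. `g_ctx_stack`, `leaves_stack_unchanged`, `balanced_api`, and `leaky_api` are hypothetical and not part of the cudax test suite.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of the driver's per-thread context stack (not a CUDA API).
using mock_context = int;
inline std::vector<mock_context> g_ctx_stack;

inline void mock_push(mock_context c) { g_ctx_stack.push_back(c); }
inline void mock_pop() { g_ctx_stack.pop_back(); }

// Returns true if running `api_call` leaves the context stack exactly as it found it.
inline bool leaves_stack_unchanged(const std::function<void()>& api_call)
{
  const std::vector<mock_context> before = g_ctx_stack;
  api_call();
  return g_ctx_stack == before;
}

// Hypothetical API implementations: one balances its push with a pop,
// the other changes the stack as a side effect.
inline void balanced_api()
{
  mock_push(7);
  // ... work that needs context 7 to be current ...
  mock_pop();
}
inline void leaky_api() { mock_push(7); }
```

Comparing full snapshots, rather than only checking emptiness, also catches an API that swaps the top context without changing the stack depth.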

pciolkosz requested review from a team as code owners on August 3, 2024, 21:35
We need to use the versioned entry point to get the correct cuStreamGetCtx.
There is a v2 version of it in 12.5; fortunately, the versioned
get-entry-point API is available there too.
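To illustrate what a versioned lookup buys: one symbol name can map to more than one ABI, and requesting a specific CUDA version selects the matching one. The registry below is hypothetical and only models the idea; the real lookup goes through the driver (for example `cuGetProcAddress`, or the runtime's `cudaGetDriverEntryPointByVersion`), whose signatures are not reproduced here. Versions use the `CUDA_VERSION` encoding (1000 * major + 10 * minor, so 12.5 is 12050); `stream_get_ctx_v1`, `stream_get_ctx_v2`, and `get_entry_point_by_version` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for two ABI versions of the same driver symbol.
inline int stream_get_ctx_v1() { return 1; }
inline int stream_get_ctx_v2() { return 2; }

using entry_fn = int (*)();

// Hypothetical registry keyed by (symbol name, CUDA version at which this ABI is available).
inline const std::map<std::pair<std::string, int>, entry_fn>& registry()
{
  static const std::map<std::pair<std::string, int>, entry_fn> table{
    {{"cuStreamGetCtx", 12000}, &stream_get_ctx_v1}, // original ABI, available in 12.0
    {{"cuStreamGetCtx", 12050}, &stream_get_ctx_v2}, // v2 ABI, added in 12.5
  };
  return table;
}

// Returns the newest implementation of `symbol` available at or before
// `requested_version`, or nullptr if there is none.
inline entry_fn get_entry_point_by_version(const std::string& symbol, int requested_version)
{
  entry_fn best     = nullptr;
  int best_version  = -1;
  for (const auto& [key, fn] : registry())
  {
    if (key.first == symbol && key.second <= requested_version && key.second > best_version)
    {
      best         = fn;
      best_version = key.second;
    }
  }
  return best;
}
```

An unversioned lookup has to pick one ABI for everybody; pinning the version keeps the caller's expected signature and the returned function in agreement.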
pciolkosz linked an issue on Aug 3, 2024 that may be closed by this pull request
Contributor

github-actions bot commented Aug 3, 2024

🟨 CI finished in 11m 05s: Pass: 96%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 11m 00s | Hits: 78%/2756
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 23m | Avg: 2m 36s | Max: 8m 16s | Hits: 78%/2756

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 13m | Avg:  2m 36s | Max:  8m 16s | Hits:  76%/2548  
      🟩 arm64              Pass: 100%/4   | Total: 10m 18s | Avg:  2m 34s | Max:  3m 01s | Hits:  98%/208   
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 15m | Avg:  2m 30s | Max:  4m 03s | Hits:  74%/1560  
      🟩 GCC                Pass: 100%/22  | Total: 50m 38s | Avg:  2m 18s | Max:  3m 50s | Hits:  82%/1144  
      🟩 Intel              Pass: 100%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s | Hits: 100%/52    
      🔥 MSVC               Pass:   0%/2   | Total: 15m 12s | Avg:  7m 36s | Max:  8m 16s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  1h 54m | Avg:  2m 25s | Max:  8m 16s | Hits:  74%/2340  
      🟩 Test               Pass: 100%/8   | Total: 29m 21s | Avg:  3m 40s | Max:  4m 03s | Hits:  98%/416   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 14m | Avg:  2m 24s | Max:  3m 50s | Hits:  76%/1612  
      🔍 20                 Pass:  91%/24  | Total:  1h 08m | Avg:  2m 52s | Max:  8m 16s | Hits:  81%/1144  
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 29s | Avg:  2m 14s | Max:  2m 21s | Hits:  57%/104   
      🟩 Clang10            Pass: 100%/2   | Total:  4m 39s | Avg:  2m 19s | Max:  2m 20s | Hits:  57%/104   
      🟩 Clang11            Pass: 100%/4   | Total:  9m 25s | Avg:  2m 21s | Max:  2m 33s | Hits:  57%/208   
      🟩 Clang12            Pass: 100%/4   | Total:  8m 54s | Avg:  2m 13s | Max:  2m 23s | Hits:  57%/208   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 26s | Avg:  2m 21s | Max:  2m 31s | Hits:  57%/208   
      🟩 Clang14            Pass: 100%/6   | Total: 16m 15s | Avg:  2m 42s | Max:  3m 47s | Hits:  85%/312   
      🟩 Clang15            Pass: 100%/2   | Total:  3m 57s | Avg:  1m 58s | Max:  2m 01s | Hits: 100%/104   
      🟩 Clang16            Pass: 100%/6   | Total: 18m 01s | Avg:  3m 00s | Max:  4m 03s | Hits: 100%/312   
      🟩 GCC9               Pass: 100%/2   | Total:  3m 44s | Avg:  1m 52s | Max:  2m 00s | Hits:  75%/104   
      🟩 GCC10              Pass: 100%/4   | Total:  8m 19s | Avg:  2m 04s | Max:  2m 16s | Hits:  75%/208   
      🟩 GCC11              Pass: 100%/4   | Total:  7m 59s | Avg:  1m 59s | Max:  2m 02s | Hits:  75%/208   
      🟩 GCC12              Pass: 100%/12  | Total: 30m 36s | Avg:  2m 33s | Max:  3m 50s | Hits:  89%/624   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s | Hits: 100%/52    
      🟥 MSVC14.36          Pass:   0%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  8m 16s | Avg:  8m 16s | Max:  8m 16s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  8m 16s | Hits:  78%/2756  
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  8m 16s | Hits:  78%/2756  
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total: 59m 50s | Avg:  2m 36s | Max:  6m 56s | Hits:  65%/1144  
      🟨 12.5               Pass:  96%/32  | Total:  1h 23m | Avg:  2m 36s | Max:  8m 16s | Hits:  87%/1612  
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total: 59m 50s | Avg:  2m 36s | Max:  6m 56s | Hits:  65%/1144  
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 23m | Avg:  2m 36s | Max:  8m 16s | Hits:  87%/1612  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s | Hits:  53%/52    
      🟩 90a                Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s | Hits:  96%/52    
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 00s | Avg: 11m 00s | Max: 11m 00s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 56)

# Runner
41 linux-amd64-cpu16
9 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

pciolkosz requested a review from ericniebler on August 4, 2024, 00:00
Contributor

github-actions bot commented Aug 4, 2024

🟨 CI finished in 10m 22s: Pass: 96%/56 | Total: 2h 36m | Avg: 2m 47s | Max: 10m 22s | Hits: 73%/2756
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 25m | Avg: 2m 39s | Max: 8m 33s | Hits: 73%/2756

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 14m | Avg:  2m 38s | Max:  8m 33s | Hits:  74%/2548  
      🟩 arm64              Pass: 100%/4   | Total: 11m 05s | Avg:  2m 46s | Max:  3m 21s | Hits:  69%/208   
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 14m | Avg:  2m 29s | Max:  4m 12s | Hits:  75%/1560  
      🟩 GCC                Pass: 100%/22  | Total: 52m 39s | Avg:  2m 23s | Max:  3m 56s | Hits:  72%/1144  
      🟩 Intel              Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s | Hits:  71%/52    
      🔥 MSVC               Pass:   0%/2   | Total: 15m 27s | Avg:  7m 43s | Max:  8m 33s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  1h 56m | Avg:  2m 28s | Max:  8m 33s | Hits:  69%/2340  
      🟩 Test               Pass: 100%/8   | Total: 29m 38s | Avg:  3m 42s | Max:  4m 12s | Hits:  98%/416   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 15m | Avg:  2m 25s | Max:  4m 12s | Hits:  73%/1612  
      🔍 20                 Pass:  91%/24  | Total:  1h 10m | Avg:  2m 56s | Max:  8m 33s | Hits:  74%/1144  
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 47s | Avg:  2m 23s | Max:  2m 24s | Hits:  71%/104   
      🟩 Clang10            Pass: 100%/2   | Total:  4m 16s | Avg:  2m 08s | Max:  2m 10s | Hits:  71%/104   
      🟩 Clang11            Pass: 100%/4   | Total:  8m 19s | Avg:  2m 04s | Max:  2m 16s | Hits:  71%/208   
      🟩 Clang12            Pass: 100%/4   | Total:  8m 33s | Avg:  2m 08s | Max:  2m 19s | Hits:  71%/208   
      🟩 Clang13            Pass: 100%/4   | Total:  8m 59s | Avg:  2m 14s | Max:  2m 19s | Hits:  71%/208   
      🟩 Clang14            Pass: 100%/6   | Total: 15m 57s | Avg:  2m 39s | Max:  3m 37s | Hits:  80%/312   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 42s | Avg:  2m 21s | Max:  2m 23s | Hits:  71%/104   
      🟩 Clang16            Pass: 100%/6   | Total: 19m 24s | Avg:  3m 14s | Max:  4m 12s | Hits:  80%/312   
      🟩 GCC9               Pass: 100%/2   | Total:  3m 54s | Avg:  1m 57s | Max:  1m 57s | Hits:  67%/104   
      🟩 GCC10              Pass: 100%/4   | Total:  8m 58s | Avg:  2m 14s | Max:  2m 24s | Hits:  67%/208   
      🟩 GCC11              Pass: 100%/4   | Total:  8m 07s | Avg:  2m 01s | Max:  2m 26s | Hits:  67%/208   
      🟩 GCC12              Pass: 100%/12  | Total: 31m 40s | Avg:  2m 38s | Max:  3m 56s | Hits:  76%/624   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s | Hits:  71%/52    
      🟥 MSVC14.36          Pass:   0%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  8m 33s | Avg:  8m 33s | Max:  8m 33s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 25m | Avg:  2m 39s | Max:  8m 33s | Hits:  73%/2756  
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 25m | Avg:  2m 39s | Max:  8m 33s | Hits:  73%/2756  
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total: 57m 50s | Avg:  2m 30s | Max:  6m 54s | Hits:  74%/1144  
      🟨 12.5               Pass:  96%/32  | Total:  1h 28m | Avg:  2m 45s | Max:  8m 33s | Hits:  73%/1612  
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total: 57m 50s | Avg:  2m 30s | Max:  6m 54s | Hits:  74%/1144  
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 28m | Avg:  2m 45s | Max:  8m 33s | Hits:  73%/1612  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 46s | Avg:  1m 46s | Max:  1m 46s | Hits:  67%/52    
      🟩 90a                Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s | Hits:  67%/52    
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 22s | Avg: 10m 22s | Max: 10m 22s
    

Contributor

github-actions bot commented Aug 4, 2024

🟩 CI finished in 10m 55s: Pass: 100%/56 | Total: 2h 35m | Avg: 2m 46s | Max: 10m 55s | Hits: 96%/2848
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 24m | Avg: 2m 37s | Max: 8m 12s | Hits: 96%/2848

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 14m | Avg:  2m 37s | Max:  8m 12s | Hits:  96%/2640  
      🟩 arm64              Pass: 100%/4   | Total: 10m 39s | Avg:  2m 39s | Max:  3m 06s | Hits:  96%/208   
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total: 57m 41s | Avg:  2m 30s | Max:  6m 54s | Hits:  96%/1190  
      🟩 12.5               Pass: 100%/32  | Total:  1h 27m | Avg:  2m 43s | Max:  8m 12s | Hits:  96%/1658  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total: 57m 41s | Avg:  2m 30s | Max:  6m 54s | Hits:  96%/1190  
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 27m | Avg:  2m 43s | Max:  8m 12s | Hits:  96%/1658  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 24m | Avg:  2m 37s | Max:  8m 12s | Hits:  96%/2848  
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 27s | Avg:  2m 13s | Max:  2m 15s | Hits:  98%/104   
      🟩 Clang10            Pass: 100%/2   | Total:  3m 57s | Avg:  1m 58s | Max:  2m 00s | Hits:  98%/104   
      🟩 Clang11            Pass: 100%/4   | Total:  8m 33s | Avg:  2m 08s | Max:  2m 15s | Hits:  98%/208   
      🟩 Clang12            Pass: 100%/4   | Total:  8m 50s | Avg:  2m 12s | Max:  2m 24s | Hits:  98%/208   
      🟩 Clang13            Pass: 100%/4   | Total:  8m 35s | Avg:  2m 08s | Max:  2m 15s | Hits:  98%/208   
      🟩 Clang14            Pass: 100%/6   | Total: 15m 32s | Avg:  2m 35s | Max:  3m 31s | Hits:  98%/312   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 32s | Avg:  2m 16s | Max:  2m 27s | Hits:  98%/104   
      🟩 Clang16            Pass: 100%/6   | Total: 17m 47s | Avg:  2m 57s | Max:  3m 58s | Hits:  98%/312   
      🟩 GCC9               Pass: 100%/2   | Total:  3m 55s | Avg:  1m 57s | Max:  2m 14s | Hits:  94%/104   
      🟩 GCC10              Pass: 100%/4   | Total:  8m 05s | Avg:  2m 01s | Max:  2m 10s | Hits:  94%/208   
      🟩 GCC11              Pass: 100%/4   | Total:  8m 23s | Avg:  2m 05s | Max:  2m 15s | Hits:  94%/208   
      🟩 GCC12              Pass: 100%/12  | Total: 34m 19s | Avg:  2m 51s | Max:  5m 35s | Hits:  94%/624   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s | Hits:  98%/52    
      🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s | Hits:  80%/46    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 12s | Avg:  8m 12s | Max:  8m 12s | Hits:  80%/46    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 12m | Avg:  2m 24s | Max:  3m 58s | Hits:  98%/1560  
      🟩 GCC                Pass: 100%/22  | Total: 54m 42s | Avg:  2m 29s | Max:  5m 35s | Hits:  94%/1144  
      🟩 Intel              Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s | Hits:  98%/52    
      🟩 MSVC               Pass: 100%/2   | Total: 15m 06s | Avg:  7m 33s | Max:  8m 12s | Hits:  80%/92    
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 24m | Avg:  2m 37s | Max:  8m 12s | Hits:  96%/2848  
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  1h 55m | Avg:  2m 27s | Max:  8m 12s | Hits:  95%/2432  
      🟩 Test               Pass: 100%/8   | Total: 28m 55s | Avg:  3m 36s | Max:  3m 58s | Hits:  98%/416   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 45s | Avg:  1m 45s | Max:  1m 45s | Hits:  94%/52    
      🟩 90a                Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s | Hits:  94%/52    
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 11m | Avg:  2m 18s | Max:  3m 58s | Hits:  96%/1612  
      🟩 20                 Pass: 100%/24  | Total:  1h 13m | Avg:  3m 02s | Max:  8m 12s | Hits:  95%/1236  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 55s | Avg: 10m 55s | Max: 10m 55s
    

cudax/include/cuda/experimental/__launch/launch.cuh (outdated, resolved)
explicit __ensure_current_device(device_ref new_device)
{
auto ctx = devices[new_device.get()].primary_context();
detail::driver::ctxPush(ctx);
Collaborator

The other __ensure_current_device only does the push / pop when the device actually differs. Is this something that is possible with the driver API?

Contributor Author

We discussed this in a meeting today. When only the C++ runtime is used, the stack is expected to be empty, so we would push/pop in most cases anyway.
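A sketch contrasting the two designs discussed in this thread, using a hypothetical mock stack; `g_stack`, `conditional_guard`, and `unconditional_guard` are illustrative names only, not cudax types. It shows the point made above: when the stack starts empty, the conditional variant pushes anyway, so it only saves a push/pop when the target context is already current.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread context stack; -1 means "no current context". Not a CUDA API.
using mock_context = int;
inline std::vector<mock_context> g_stack;
inline int g_push_count = 0;

inline mock_context mock_current() { return g_stack.empty() ? -1 : g_stack.back(); }

// Variant raised in review: push only when the target differs from the current context.
class conditional_guard
{
public:
  explicit conditional_guard(mock_context target) : pushed_(mock_current() != target)
  {
    if (pushed_)
    {
      g_stack.push_back(target);
      ++g_push_count;
    }
  }
  ~conditional_guard()
  {
    if (pushed_)
    {
      g_stack.pop_back();
    }
  }
  conditional_guard(const conditional_guard&)            = delete;
  conditional_guard& operator=(const conditional_guard&) = delete;

private:
  bool pushed_;
};

// Variant kept in this PR: always push on entry and pop on exit.
class unconditional_guard
{
public:
  explicit unconditional_guard(mock_context target)
  {
    g_stack.push_back(target);
    ++g_push_count;
  }
  ~unconditional_guard() { g_stack.pop_back(); }
  unconditional_guard(const unconditional_guard&)            = delete;
  unconditional_guard& operator=(const unconditional_guard&) = delete;
};
```

Both variants leave the stack as they found it; the unconditional one is simpler because it never needs to query the current context first.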

// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
#define LIBCUDACXX_ENABLE_EXCEPTIONS
Collaborator

This should already be defined in the CMakeLists.txt

Contributor Author

There are no redefinition errors, so it's probably not defined there, and the test would just terminate on an exception.
I added it anyway, since all tests need to define it.
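A guarded definition sidesteps the question of whether the build system already provides the macro. This is a sketch, assuming the test file defines the macro before any libcu++ header is included; `exceptions_macro_defined` is a hypothetical helper. Most compilers report a mismatched macro redefinition (for example `-DLIBCUDACXX_ENABLE_EXCEPTIONS`, which defines it as `1`, followed by an empty `#define`) as a warning rather than an error unless warnings are treated as errors, so the absence of an error may not settle the question on its own.

```cpp
#include <cassert>

// Guarded definition: harmless whether or not the build system already passes
// -DLIBCUDACXX_ENABLE_EXCEPTIONS on the command line.
#ifndef LIBCUDACXX_ENABLE_EXCEPTIONS
#  define LIBCUDACXX_ENABLE_EXCEPTIONS
#endif

// Reports at compile time whether the macro is visible in this translation unit.
constexpr bool exceptions_macro_defined()
{
#ifdef LIBCUDACXX_ENABLE_EXCEPTIONS
  return true;
#else
  return false;
#endif
}
```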

Contributor

github-actions bot commented Aug 5, 2024

🟩 CI finished in 11m 07s: Pass: 100%/56 | Total: 2h 52m | Avg: 3m 04s | Max: 11m 07s | Hits: 79%/2848
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 41m | Avg: 2m 56s | Max: 8m 47s | Hits: 79%/2848

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 29m | Avg:  2m 56s | Max:  8m 47s | Hits:  79%/2640  
      🟩 arm64              Pass: 100%/4   | Total: 11m 48s | Avg:  2m 57s | Max:  3m 27s | Hits:  76%/208   
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 08m | Avg:  2m 58s | Max:  8m 47s | Hits:  79%/1190  
      🟩 12.5               Pass: 100%/32  | Total:  1h 33m | Avg:  2m 54s | Max:  8m 03s | Hits:  78%/1658  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 08m | Avg:  2m 58s | Max:  8m 47s | Hits:  79%/1190  
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 33m | Avg:  2m 54s | Max:  8m 03s | Hits:  78%/1658  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 41m | Avg:  2m 56s | Max:  8m 47s | Hits:  79%/2848  
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 49s | Avg:  2m 24s | Max:  2m 28s | Hits:  78%/104   
      🟩 Clang10            Pass: 100%/2   | Total:  5m 01s | Avg:  2m 30s | Max:  2m 34s | Hits:  78%/104   
      🟩 Clang11            Pass: 100%/4   | Total:  9m 25s | Avg:  2m 21s | Max:  2m 29s | Hits:  78%/208   
      🟩 Clang12            Pass: 100%/4   | Total:  9m 33s | Avg:  2m 23s | Max:  2m 33s | Hits:  78%/208   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 41s | Avg:  2m 25s | Max:  2m 29s | Hits:  78%/208   
      🟩 Clang14            Pass: 100%/6   | Total: 18m 30s | Avg:  3m 05s | Max:  4m 17s | Hits:  85%/312   
      🟩 Clang15            Pass: 100%/2   | Total:  5m 05s | Avg:  2m 32s | Max:  2m 39s | Hits:  78%/104   
      🟩 Clang16            Pass: 100%/6   | Total: 20m 28s | Avg:  3m 24s | Max:  4m 14s | Hits:  85%/312   
      🟩 GCC9               Pass: 100%/2   | Total:  4m 40s | Avg:  2m 20s | Max:  2m 31s | Hits:  75%/104   
      🟩 GCC10              Pass: 100%/4   | Total:  9m 44s | Avg:  2m 26s | Max:  2m 37s | Hits:  75%/208   
      🟩 GCC11              Pass: 100%/4   | Total:  9m 45s | Avg:  2m 26s | Max:  2m 37s | Hits:  75%/208   
      🟩 GCC12              Pass: 100%/12  | Total: 34m 36s | Avg:  2m 53s | Max:  4m 10s | Hits:  82%/624   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s | Hits:  78%/52    
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 47s | Avg:  8m 47s | Max:  8m 47s | Hits:  39%/46    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 03s | Avg:  8m 03s | Max:  8m 03s | Hits:  39%/46    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 22m | Avg:  2m 45s | Max:  4m 17s | Hits:  81%/1560  
      🟩 GCC                Pass: 100%/22  | Total: 58m 45s | Avg:  2m 40s | Max:  4m 10s | Hits:  78%/1144  
      🟩 Intel              Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s | Hits:  78%/52    
      🟩 MSVC               Pass: 100%/2   | Total: 16m 50s | Avg:  8m 25s | Max:  8m 47s | Hits:  39%/92    
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 41m | Avg:  2m 56s | Max:  8m 47s | Hits:  79%/2848  
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  2h 09m | Avg:  2m 44s | Max:  8m 47s | Hits:  75%/2432  
      🟩 Test               Pass: 100%/8   | Total: 32m 16s | Avg:  4m 02s | Max:  4m 17s | Hits:  98%/416   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s | Hits:  75%/52    
      🟩 90a                Pass: 100%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s | Hits:  75%/52    
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 23m | Avg:  2m 40s | Max:  4m 16s | Hits:  79%/1612  
      🟩 20                 Pass: 100%/24  | Total:  1h 18m | Avg:  3m 15s | Max:  8m 47s | Hits:  77%/1236  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s
    

miscco merged commit 75929cb into NVIDIA:main on Aug 6, 2024
73 checks passed
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Driver context initialization and push/pop
2 participants