[CUDAX] make uninitialized_buffer usable with launch #2342

Merged

Conversation

ericniebler
Collaborator

Description

Give cudax::uninitialized_buffer a launch transform so that it can be passed as an argument to cudax::launch. The transform turns an uninitialized_buffer into a span. If the buffer does not have the device_accessible property, the transform fails a static_assert, so the mistake is caught at compile time.
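For readers unfamiliar with the idea, here is a minimal, host-only sketch of the behavior described above. It is not the code from this PR: `toy_buffer`, `simple_span`, `has_property`, and the free function `launch_transform` are hypothetical stand-ins, and the property tags are plain empty structs rather than the real memory-resource properties.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical property tags (assumption: stand-ins for the real properties).
struct device_accessible {};
struct host_accessible {};

// Minimal non-owning view, used here in place of a real span type.
template <class T>
struct simple_span {
  T* data;
  std::size_t size;
};

// Hypothetical stand-in for a buffer type tagged with a list of properties.
template <class T, class... Properties>
class toy_buffer {
public:
  toy_buffer(T* storage, std::size_t count) : data_(storage), size_(count) {}
  T* data() const { return data_; }
  std::size_t size() const { return size_; }

private:
  T* data_;
  std::size_t size_;
};

// True when Property appears in the property list (false for an empty list).
template <class Property, class... Properties>
inline constexpr bool has_property =
    (std::is_same_v<Property, Properties> || ...);

// Sketch of a launch transform: the buffer goes in, a span comes out.
// A buffer without device_accessible is rejected at compile time.
template <class T, class... Properties>
simple_span<T> launch_transform(const toy_buffer<T, Properties...>& buffer) {
  static_assert(has_property<device_accessible, Properties...>,
                "buffer must be device_accessible to be passed to a launch");
  return simple_span<T>{buffer.data(), buffer.size()};
}
```

With this sketch, calling `launch_transform` on a `toy_buffer<int, device_accessible>` yields a view over the same storage, while a `toy_buffer<int, host_accessible>` fails to compile with the message above.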

This PR is stacked on #2340.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@ericniebler ericniebler added the CUDA Next Feature intended for the Cuda Next experimental library label Aug 30, 2024
@ericniebler ericniebler requested review from a team as code owners August 30, 2024 22:46
@ericniebler ericniebler requested a review from fbusato August 30, 2024 22:46
Contributor

🟩 CI finished in 3h 44m: Pass: 100%/55 | Total: 2h 49m | Avg: 3m 04s | Max: 11m 58s | Hits: 78%/114
  • 🟩 cudax: Pass: 100%/54 | Total: 2h 37m | Avg: 2m 54s | Max: 8m 53s | Hits: 78%/114

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  2h 27m | Avg:  2m 57s | Max:  8m 53s | Hits:  78%/114   
      🟩 arm64              Pass: 100%/4   | Total:  9m 32s | Avg:  2m 23s | Max:  2m 51s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 07m | Avg:  2m 57s | Max:  8m 53s | Hits:  78%/57    
      🟩 12.5               Pass: 100%/31  | Total:  1h 29m | Avg:  2m 52s | Max:  8m 43s | Hits:  78%/57    
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 07m | Avg:  2m 57s | Max:  8m 53s | Hits:  78%/57    
      🟩 nvcc12.5           Pass: 100%/31  | Total:  1h 29m | Avg:  2m 52s | Max:  8m 43s | Hits:  78%/57    
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  2h 37m | Avg:  2m 54s | Max:  8m 53s | Hits:  78%/114   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 51s | Avg:  2m 25s | Max:  2m 28s
      🟩 Clang10            Pass: 100%/2   | Total:  4m 56s | Avg:  2m 28s | Max:  2m 28s
      🟩 Clang11            Pass: 100%/4   | Total:  9m 53s | Avg:  2m 28s | Max:  2m 40s
      🟩 Clang12            Pass: 100%/4   | Total: 10m 21s | Avg:  2m 35s | Max:  2m 53s
      🟩 Clang13            Pass: 100%/4   | Total:  9m 37s | Avg:  2m 24s | Max:  2m 32s
      🟩 Clang14            Pass: 100%/6   | Total: 18m 15s | Avg:  3m 02s | Max:  4m 38s
      🟩 Clang15            Pass: 100%/2   | Total:  5m 47s | Avg:  2m 53s | Max:  3m 28s
      🟩 Clang16            Pass: 100%/6   | Total: 18m 49s | Avg:  3m 08s | Max:  4m 22s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 49s | Avg:  2m 24s | Max:  2m 36s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 54s | Avg:  2m 28s | Max:  2m 38s
      🟩 GCC11              Pass: 100%/4   | Total:  9m 01s | Avg:  2m 15s | Max:  2m 23s
      🟩 GCC12              Pass: 100%/12  | Total: 33m 25s | Avg:  2m 47s | Max:  3m 59s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s | Hits:  78%/57    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 43s | Avg:  8m 43s | Max:  8m 43s | Hits:  78%/57    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 22m | Avg:  2m 44s | Max:  4m 38s
      🟩 GCC                Pass: 100%/22  | Total: 57m 09s | Avg:  2m 35s | Max:  3m 59s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max:  8m 53s | Hits:  78%/114   
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  2h 37m | Avg:  2m 54s | Max:  8m 53s | Hits:  78%/114   
    🟩 jobs
      🟩 Build              Pass: 100%/46  | Total:  2h 05m | Avg:  2m 43s | Max:  8m 53s | Hits:  78%/114   
      🟩 Test               Pass: 100%/8   | Total: 31m 49s | Avg:  3m 58s | Max:  4m 38s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
      🟩 90a                Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
    🟩 std
      🟩 17                 Pass: 100%/30  | Total:  1h 20m | Avg:  2m 41s | Max:  4m 38s
      🟩 20                 Pass: 100%/24  | Total:  1h 16m | Avg:  3m 11s | Max:  8m 53s | Hits:  78%/114   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 58s | Avg: 11m 58s | Max: 11m 58s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 55)

# Runner
40 linux-amd64-cpu16
9 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@ericniebler ericniebler force-pushed the cudax-uninitialized-buffer-launch-transform branch from 8df409d to e696cfd Compare September 2, 2024 00:43
Contributor

github-actions bot commented Sep 2, 2024

🟨 CI finished in 12m 05s: Pass: 92%/55 | Total: 2h 43m | Avg: 2m 58s | Max: 10m 28s | Hits: 77%/114
  • 🟨 cudax: Pass: 92%/54 | Total: 2h 33m | Avg: 2m 50s | Max: 10m 01s | Hits: 77%/114

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/50  | Total:  2h 24m | Avg:  2m 53s | Max: 10m 01s | Hits:  77%/114   
      🟩 arm64              Pass: 100%/4   | Total:  8m 56s | Avg:  2m 14s | Max:  2m 18s
    🔍 jobs: Test 🔍
      🟩 Build              Pass: 100%/46  | Total:  2h 04m | Avg:  2m 42s | Max: 10m 01s | Hits:  77%/114   
      🔍 Test               Pass:  50%/8   | Total: 28m 55s | Avg:  3m 36s | Max:  3m 57s
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 26s | Avg:  2m 43s | Max:  2m 59s
      🟩 Clang10            Pass: 100%/2   | Total:  5m 03s | Avg:  2m 31s | Max:  2m 43s
      🟩 Clang11            Pass: 100%/4   | Total: 10m 10s | Avg:  2m 32s | Max:  2m 55s
      🟩 Clang12            Pass: 100%/4   | Total:  9m 37s | Avg:  2m 24s | Max:  2m 30s
      🟩 Clang13            Pass: 100%/4   | Total: 10m 56s | Avg:  2m 44s | Max:  2m 54s
      🟩 Clang14            Pass: 100%/6   | Total: 17m 32s | Avg:  2m 55s | Max:  3m 44s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 53s | Avg:  2m 26s | Max:  2m 29s
      🟨 Clang16            Pass:  83%/6   | Total: 17m 05s | Avg:  2m 50s | Max:  3m 57s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 39s | Avg:  2m 19s | Max:  2m 33s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 05s | Avg:  2m 16s | Max:  2m 31s
      🟩 GCC11              Pass: 100%/4   | Total:  8m 47s | Avg:  2m 11s | Max:  2m 15s
      🟨 GCC12              Pass:  75%/12  | Total: 33m 06s | Avg:  2m 45s | Max:  3m 41s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 11s | Avg:  7m 11s | Max:  7m 11s | Hits:  77%/57    
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 01s | Avg: 10m 01s | Max: 10m 01s | Hits:  77%/57    
    🟨 cxx_family
      🟨 Clang              Pass:  96%/30  | Total:  1h 20m | Avg:  2m 41s | Max:  3m 57s
      🟨 GCC                Pass:  86%/22  | Total: 55m 37s | Avg:  2m 31s | Max:  3m 41s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 12s | Avg:  8m 36s | Max: 10m 01s | Hits:  77%/114   
    🟨 cudacxx_family
      🟨 nvcc               Pass:  92%/54  | Total:  2h 33m | Avg:  2m 50s | Max: 10m 01s | Hits:  77%/114   
    🟨 gpu
      🟨 v100               Pass:  92%/54  | Total:  2h 33m | Avg:  2m 50s | Max: 10m 01s | Hits:  77%/114   
    🟨 ctk
      🟨 12.0               Pass:  91%/23  | Total:  1h 04m | Avg:  2m 49s | Max:  7m 11s | Hits:  77%/57    
      🟨 12.5               Pass:  93%/31  | Total:  1h 28m | Avg:  2m 51s | Max: 10m 01s | Hits:  77%/57    
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  91%/23  | Total:  1h 04m | Avg:  2m 49s | Max:  7m 11s | Hits:  77%/57    
      🟨 nvcc12.5           Pass:  93%/31  | Total:  1h 28m | Avg:  2m 51s | Max: 10m 01s | Hits:  77%/57    
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
      🟩 90a                Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
    🟨 std
      🟨 17                 Pass:  93%/30  | Total:  1h 17m | Avg:  2m 35s | Max:  3m 57s
      🟨 20                 Pass:  91%/24  | Total:  1h 15m | Avg:  3m 09s | Max: 10m 01s | Hits:  77%/114   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    


Collaborator

@miscco miscco left a comment


I am a bit torn on the approach.

Rather than putting this into individual classes, we should have a CPO that verifies that a class is a range and provides device_accessible memory.

Contributor

github-actions bot commented Sep 2, 2024

🟩 CI finished in 13h 17m: Pass: 100%/55 | Total: 2h 46m | Avg: 3m 01s | Max: 10m 28s | Hits: 77%/114
  • 🟩 cudax: Pass: 100%/54 | Total: 2h 35m | Avg: 2m 53s | Max: 10m 01s | Hits: 77%/114

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  2h 26m | Avg:  2m 56s | Max: 10m 01s | Hits:  77%/114   
      🟩 arm64              Pass: 100%/4   | Total:  8m 56s | Avg:  2m 14s | Max:  2m 18s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 05m | Avg:  2m 50s | Max:  7m 11s | Hits:  77%/57    
      🟩 12.5               Pass: 100%/31  | Total:  1h 30m | Avg:  2m 54s | Max: 10m 01s | Hits:  77%/57    
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 05m | Avg:  2m 50s | Max:  7m 11s | Hits:  77%/57    
      🟩 nvcc12.5           Pass: 100%/31  | Total:  1h 30m | Avg:  2m 54s | Max: 10m 01s | Hits:  77%/57    
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  2h 35m | Avg:  2m 53s | Max: 10m 01s | Hits:  77%/114   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 26s | Avg:  2m 43s | Max:  2m 59s
      🟩 Clang10            Pass: 100%/2   | Total:  5m 03s | Avg:  2m 31s | Max:  2m 43s
      🟩 Clang11            Pass: 100%/4   | Total: 10m 10s | Avg:  2m 32s | Max:  2m 55s
      🟩 Clang12            Pass: 100%/4   | Total:  9m 37s | Avg:  2m 24s | Max:  2m 30s
      🟩 Clang13            Pass: 100%/4   | Total: 10m 56s | Avg:  2m 44s | Max:  2m 54s
      🟩 Clang14            Pass: 100%/6   | Total: 17m 32s | Avg:  2m 55s | Max:  3m 44s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 53s | Avg:  2m 26s | Max:  2m 29s
      🟩 Clang16            Pass: 100%/6   | Total: 17m 55s | Avg:  2m 59s | Max:  4m 32s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 39s | Avg:  2m 19s | Max:  2m 33s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 05s | Avg:  2m 16s | Max:  2m 31s
      🟩 GCC11              Pass: 100%/4   | Total:  8m 47s | Avg:  2m 11s | Max:  2m 15s
      🟩 GCC12              Pass: 100%/12  | Total: 34m 30s | Avg:  2m 52s | Max:  4m 24s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 11s | Avg:  7m 11s | Max:  7m 11s | Hits:  77%/57    
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 01s | Avg: 10m 01s | Max: 10m 01s | Hits:  77%/57    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 21m | Avg:  2m 43s | Max:  4m 32s
      🟩 GCC                Pass: 100%/22  | Total: 57m 01s | Avg:  2m 35s | Max:  4m 24s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 12s | Avg:  8m 36s | Max: 10m 01s | Hits:  77%/114   
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  2h 35m | Avg:  2m 53s | Max: 10m 01s | Hits:  77%/114   
    🟩 jobs
      🟩 Build              Pass: 100%/46  | Total:  2h 04m | Avg:  2m 42s | Max: 10m 01s | Hits:  77%/114   
      🟩 Test               Pass: 100%/8   | Total: 31m 09s | Avg:  3m 53s | Max:  4m 32s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
      🟩 90a                Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
    🟩 std
      🟩 17                 Pass: 100%/30  | Total:  1h 19m | Avg:  2m 38s | Max:  4m 24s
      🟩 20                 Pass: 100%/24  | Total:  1h 16m | Avg:  3m 11s | Max: 10m 01s | Hits:  77%/114   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 28s | Avg: 10m 28s | Max: 10m 28s
    


@miscco miscco merged commit c6b777b into NVIDIA:main Sep 3, 2024
70 checks passed
@ericniebler
Collaborator Author

ericniebler commented Sep 3, 2024

> rather than putting this into individual classes, we should have a CPO that verifies that a class is a range and provides device_accessible memory

This would be a nice default behavior for the __launch_transform CPO.
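A rough, host-only sketch of what such a default could look like is below. Everything in it is an assumption made for illustration: the function object is named `launch_transform` rather than `__launch_transform`, "is a range" is approximated by detecting `data()` and `size()`, and `is_device_accessible_v` is a hypothetical opt-in trait rather than the real property query.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Minimal non-owning view, used here in place of a real span type.
template <class T>
struct simple_span {
  T* data;
  std::size_t size;
};

// Hypothetical opt-in trait: does this type promise device-accessible memory?
template <class T>
inline constexpr bool is_device_accessible_v = false;

// Rough C++17 stand-in for "is a contiguous range": has data() and size().
template <class R, class = void>
inline constexpr bool is_contiguous_like_v = false;

template <class R>
inline constexpr bool is_contiguous_like_v<
    R, std::void_t<decltype(std::declval<const R&>().data()),
                   decltype(std::declval<const R&>().size())>> = true;

// Sketch of a customization point object with a default behavior:
// range-like arguments become spans (and must be device accessible),
// everything else is passed through unchanged.
struct launch_transform_fn {
  template <class Arg>
  auto operator()(const Arg& arg) const {
    if constexpr (is_contiguous_like_v<Arg>) {
      static_assert(is_device_accessible_v<Arg>,
                    "range arguments must provide device_accessible memory");
      using element = std::remove_pointer_t<decltype(arg.data())>;
      return simple_span<element>{arg.data(), arg.size()};
    } else {
      return arg;  // non-range arguments are copied through as-is
    }
  }
};
inline constexpr launch_transform_fn launch_transform{};

// Hypothetical user type that opts in to the default behavior.
struct toy_device_array {
  float* ptr;
  std::size_t count;
  float* data() const { return ptr; }
  std::size_t size() const { return count; }
};

template <>
inline constexpr bool is_device_accessible_v<toy_device_array> = true;
```

The point of the design is that a type like `toy_device_array` needs no transform of its own: satisfying the range check and the accessibility trait is enough, and individual classes only customize when they need different behavior.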

Labels
CUDA Next Feature intended for the Cuda Next experimental library
3 participants