
add cudax::distribute<threadsPrBlock>(numElements) #2210

Merged: ericniebler merged 1 commit into main from add-cudax-distribute-shortcut on Aug 9, 2024

ericniebler (Collaborator)

Adds cudax::distribute<threadsPrBlock>(numElements) as a way to evenly distribute elements over thread blocks.

Description

cudax::distribute<threadsPrBlock>(numElements) is a shortcut notation for:

int blocksPerGrid = (numElements + threadsPrBlock - 1) / threadsPrBlock;
return cudax::make_hierarchy(cudax::grid_dims(blocksPerGrid), cudax::block_dims<threadsPrBlock>());
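The shortcut is ordinary ceiling division. The sketch below is a host-only C++ approximation, not the cudax implementation: `blocks_per_grid` and `simulate_launch` are hypothetical helpers that reproduce the formula above and check that the resulting 1-D grid touches every element exactly once.

```cpp
#include <vector>

// Hypothetical host-side mirror of the grid-size arithmetic quoted above for
// cudax::distribute<threadsPrBlock>(numElements). This is NOT the cudax source;
// it only reproduces the ceiling-division formula so the grid shape can be checked.
template <int ThreadsPerBlock>
constexpr int blocks_per_grid(int numElements)
{
  static_assert(ThreadsPerBlock > 0, "block size must be positive");
  // Smallest number of blocks whose threads cover every element.
  return (numElements + ThreadsPerBlock - 1) / ThreadsPerBlock;
}

// Hypothetical host-side simulation of a 1-D launch over that grid: every
// (block, thread) pair gets a global index, and indices past the end are
// skipped, the way a kernel would guard with `if (i < numElements)`.
// Returns how many times each element was visited.
template <int ThreadsPerBlock>
std::vector<int> simulate_launch(int numElements)
{
  std::vector<int> visits(numElements, 0);
  const int blocks = blocks_per_grid<ThreadsPerBlock>(numElements);
  for (int block = 0; block < blocks; ++block)
  {
    for (int thread = 0; thread < ThreadsPerBlock; ++thread)
    {
      const int i = block * ThreadsPerBlock + thread;
      if (i < numElements)
      {
        ++visits[i];
      }
    }
  }
  return visits;
}
```

With 256 threads per block and 1000 elements, the formula gives 4 blocks (1024 threads), so 24 threads in the last block have no element to process. Every block except possibly the last is full, which is why a kernel launched over such a grid still needs an `i < numElements` bounds check. Note also that `numElements + threadsPrBlock - 1` can overflow `int` when `numElements` is close to `INT_MAX`.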

This will be used in the Compute all-hands presentation about cudax.

ericniebler added the CUDA Next label (Feature intended for the Cuda Next experimental library) Aug 8, 2024
ericniebler requested a review from pciolkosz August 8, 2024 22:11
ericniebler requested a review from a team as a code owner August 8, 2024 22:11
github-actions bot (Contributor) commented Aug 9, 2024

🟩 CI finished in 2h 09m: Pass: 100%/56 | Total: 2h 46m | Avg: 2m 58s | Max: 11m 03s | Hits: 51%/94
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 35m | Avg: 2m 49s | Max: 8m 34s | Hits: 51%/94

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 24m | Avg:  2m 49s | Max:  8m 34s | Hits:  51%/94    
      🟩 arm64              Pass: 100%/4   | Total: 11m 40s | Avg:  2m 55s | Max:  3m 26s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 04m | Avg:  2m 48s | Max:  8m 34s | Hits:  51%/47    
      🟩 12.5               Pass: 100%/32  | Total:  1h 31m | Avg:  2m 50s | Max:  8m 29s | Hits:  51%/47    
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 04m | Avg:  2m 48s | Max:  8m 34s | Hits:  51%/47    
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 31m | Avg:  2m 50s | Max:  8m 29s | Hits:  51%/47    
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 35m | Avg:  2m 49s | Max:  8m 34s | Hits:  51%/94    
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 41s | Avg:  2m 20s | Max:  2m 31s
      🟩 Clang10            Pass: 100%/2   | Total:  4m 46s | Avg:  2m 23s | Max:  2m 30s
      🟩 Clang11            Pass: 100%/4   | Total: 10m 08s | Avg:  2m 32s | Max:  2m 59s
      🟩 Clang12            Pass: 100%/4   | Total:  9m 15s | Avg:  2m 18s | Max:  2m 32s
      🟩 Clang13            Pass: 100%/4   | Total:  9m 29s | Avg:  2m 22s | Max:  2m 27s
      🟩 Clang14            Pass: 100%/6   | Total: 17m 03s | Avg:  2m 50s | Max:  3m 51s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 38s | Avg:  2m 19s | Max:  2m 19s
      🟩 Clang16            Pass: 100%/6   | Total: 18m 46s | Avg:  3m 07s | Max:  4m 16s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 54s | Avg:  2m 27s | Max:  2m 50s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 21s | Avg:  2m 20s | Max:  2m 26s
      🟩 GCC11              Pass: 100%/4   | Total:  9m 14s | Avg:  2m 18s | Max:  2m 37s
      🟩 GCC12              Pass: 100%/12  | Total: 33m 28s | Avg:  2m 47s | Max:  3m 40s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 34s | Avg:  8m 34s | Max:  8m 34s | Hits:  51%/47    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 29s | Avg:  8m 29s | Max:  8m 29s | Hits:  51%/47    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 18m | Avg:  2m 37s | Max:  4m 16s
      🟩 GCC                Pass: 100%/22  | Total: 56m 57s | Avg:  2m 35s | Max:  3m 40s
      🟩 Intel              Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 03s | Avg:  8m 31s | Max:  8m 34s | Hits:  51%/94    
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 35m | Avg:  2m 49s | Max:  8m 34s | Hits:  51%/94    
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  2h 06m | Avg:  2m 40s | Max:  8m 34s | Hits:  51%/94    
      🟩 Test               Pass: 100%/8   | Total: 29m 37s | Avg:  3m 42s | Max:  4m 16s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 53s | Avg:  1m 53s | Max:  1m 53s
      🟩 90a                Pass: 100%/1   | Total:  2m 11s | Avg:  2m 11s | Max:  2m 11s
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 20m | Avg:  2m 36s | Max:  4m 16s
      🟩 20                 Pass: 100%/24  | Total:  1h 15m | Avg:  3m 07s | Max:  8m 34s | Hits:  51%/94    
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
    

👃 Inspect Changes

Modifications in project?

  Changed  Project
  -------  -------------------
           CCCL Infrastructure
           libcu++
           CUB
           Thrust
  +/-      CUDA Experimental
           pycuda

Modifications in project or dependencies?

  Changed  Project
  -------  -------------------
           CCCL Infrastructure
           libcu++
           CUB
           Thrust
  +/-      CUDA Experimental
  +/-      pycuda

🏃‍ Runner counts (total jobs: 56)

  Count  Runner
     41  linux-amd64-cpu16
      9  linux-amd64-gpu-v100-latest-1
      4  linux-arm64-cpu16
      2  windows-amd64-cpu16

ericniebler merged commit f95f211 into main Aug 9, 2024
72 checks passed
miscco deleted the add-cudax-distribute-shortcut branch August 9, 2024 06:52
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Aug 20, 2024
Labels: CUDA Next (Feature intended for the Cuda Next experimental library)
Projects: Archived in project
Participants: 2