Adds support for large number of items to DeviceScan #2171

Merged 10 commits on Aug 21, 2024


Collaborator

elstehle commented Aug 2, 2024

Description

Closes #2062

Opened as a draft PR while waiting for resources to run performance benchmarks.

Remaining tasks:

  • Benchmark changes

Note: this PR does not yet enable support for a large number of items in DeviceScan::*ByKey.
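
The benchmarks later in this thread compare four offset types (i32, u32, i64, u64). As a rough illustration of why the offset type bounds the number of items, and not a description of CUB's actual dispatch logic, the following Python sketch lists which of those types can represent a given item count. The names `OFFSET_RANGES`, `fits`, and `types_that_fit` are hypothetical helpers introduced only for this example.

```python
# Value ranges of the four offset types compared in this PR's benchmarks.
# These are the standard ranges of 32- and 64-bit integers; the mapping
# itself is an illustrative assumption, not part of CUB.
OFFSET_RANGES = {
    "i32": (-(2**31), 2**31 - 1),
    "u32": (0, 2**32 - 1),
    "i64": (-(2**63), 2**63 - 1),
    "u64": (0, 2**64 - 1),
}


def fits(num_items: int, offset_type: str) -> bool:
    """Return True if num_items is representable in the given offset type."""
    lo, hi = OFFSET_RANGES[offset_type]
    return lo <= num_items <= hi


def types_that_fit(num_items: int) -> list:
    """Names of all offset types that can represent num_items."""
    return [name for name in OFFSET_RANGES if fits(num_items, name)]


if __name__ == "__main__":
    for exponent in (28, 31, 32):
        n = 2**exponent
        print(f"2^{exponent} items fit in: {types_that_fit(n)}")
```

The largest benchmarked size, 2^28 elements, fits in all four types, while 2^31 items already exceed the signed 32-bit range and 2^32 items require a 64-bit offset.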

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

github-actions bot commented Aug 2, 2024

🟨 CI finished in 9h 34m: Pass: 99%/250 | Total: 6d 00h | Avg: 34m 37s | Max: 1h 11m | Hits: 70%/248308
  • 🟨 cub: Pass: 98%/131 | Total: 3d 22h | Avg: 43m 03s | Max: 1h 11m | Hits: 69%/109396

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  98%/123 | Total:  3d 14h | Avg: 42m 11s | Max:  1h 11m | Hits:  70%/102460
      🟩 arm64              Pass: 100%/8   | Total:  7h 31m | Avg: 56m 26s | Max:  1h 04m | Hits:  60%/6936  
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total: 11h 17m | Avg: 45m 09s | Max: 50m 14s | Hits:  60%/11792 
      🟩 11.8               Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 11m | Hits:  59%/2601  
      🔍 12.5               Pass:  98%/113 | Total:  3d 07h | Avg: 42m 05s | Max:  1h 05m | Hits:  71%/95003 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 11s | Avg: 22m 05s | Max: 22m 56s | Hits:  66%/1436  
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 17m | Avg: 45m 09s | Max: 50m 14s | Hits:  60%/11792 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 11m | Hits:  59%/2601  
      🔍 nvcc12.5           Pass:  98%/111 | Total:  3d 06h | Avg: 42m 27s | Max:  1h 05m | Hits:  71%/93567 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 11s | Avg: 22m 05s | Max: 22m 56s | Hits:  66%/1436  
      🔍 nvcc               Pass:  98%/129 | Total:  3d 21h | Avg: 43m 22s | Max:  1h 11m | Hits:  69%/107960
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 55m | Avg: 49m 13s | Max: 55m 42s | Hits:  60%/4980  
      🟩 Clang10            Pass: 100%/3   | Total:  2h 30m | Avg: 50m 13s | Max: 50m 49s | Hits:  60%/2607  
      🟩 Clang11            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 40s | Max: 55m 19s | Hits:  60%/3476  
      🟩 Clang12            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 35s | Max: 56m 42s | Hits:  60%/3476  
      🟩 Clang13            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 17s | Max: 55m 19s | Hits:  60%/3476  
      🟩 Clang14            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 30s | Max: 54m 47s | Hits:  60%/3476  
      🟩 Clang15            Pass: 100%/4   | Total:  3h 28m | Avg: 52m 14s | Max: 58m 13s | Hits:  60%/3468  
      🟩 Clang16            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 41s | Max: 54m 58s | Hits:  60%/3468  
      🟨 Clang17            Pass:  96%/26  | Total: 13h 08m | Avg: 30m 19s | Max:  1h 01m | Hits:  84%/21377 
      🟩 GCC6               Pass: 100%/2   | Total:  1h 27m | Avg: 43m 33s | Max: 44m 55s | Hits:  60%/1582  
      🟩 GCC7               Pass: 100%/6   | Total:  4h 49m | Avg: 48m 17s | Max: 54m 02s | Hits:  60%/4983  
      🟩 GCC8               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 39s | Max: 52m 33s | Hits:  60%/4983  
      🟩 GCC9               Pass: 100%/6   | Total:  4h 50m | Avg: 48m 20s | Max: 53m 54s | Hits:  60%/4983  
      🟩 GCC10              Pass: 100%/4   | Total:  3h 29m | Avg: 52m 26s | Max: 53m 21s | Hits:  60%/3476  
      🟩 GCC11              Pass: 100%/7   | Total:  7h 02m | Avg:  1h 00m | Max:  1h 11m | Hits:  59%/6069  
      🟩 GCC12              Pass: 100%/4   | Total:  3h 30m | Avg: 52m 33s | Max: 54m 51s | Hits:  59%/3468  
      🟨 GCC13              Pass:  96%/28  | Total: 13h 59m | Avg: 29m 59s | Max:  1h 04m | Hits:  81%/23409 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 41m | Avg: 53m 40s | Max: 55m 07s | Hits:  60%/2385  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 50m 14s | Avg: 50m 14s | Max: 50m 14s | Hits:  64%/709   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 05m | Hits:  64%/1418  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 01m | Avg:  1h 00m | Max:  1h 01m | Hits:  64%/2127  
    🟨 cxx_family
      🟨 Clang              Pass:  98%/59  | Total:  1d 17h | Avg: 42m 04s | Max:  1h 01m | Hits:  70%/49804 
      🟨 GCC                Pass:  98%/63  | Total:  1d 20h | Avg: 41m 55s | Max:  1h 11m | Hits:  69%/52953 
      🟩 Intel              Pass: 100%/3   | Total:  2h 41m | Avg: 53m 40s | Max: 55m 07s | Hits:  60%/2385  
      🟩 MSVC               Pass: 100%/6   | Total:  5h 55m | Avg: 59m 16s | Max:  1h 05m | Hits:  64%/4254  
    🟨 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 11h | Avg: 50m 41s | Max:  1h 11m | Hits:  60%/83386 
      🟨 DeviceLaunch       Pass:  87%/8   | Total:  2h 16m | Avg: 17m 01s | Max: 21m 26s | Hits:  99%/6069  
      🟨 GraphCapture       Pass:  87%/8   | Total:  1h 55m | Avg: 14m 24s | Max: 16m 49s | Hits:  99%/6069  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 38m | Avg: 19m 50s | Max: 21m 16s | Hits:  99%/6936  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 31m | Avg: 26m 24s | Max: 29m 04s | Hits:  99%/6936  
    🟨 std
      🟩 11                 Pass: 100%/34  | Total:  1d 00h | Avg: 43m 30s | Max:  1h 11m | Hits:  69%/29049 
      🟨 14                 Pass:  97%/37  | Total:  1d 03h | Avg: 44m 43s | Max:  1h 06m | Hits:  68%/30309 
      🟩 17                 Pass: 100%/36  | Total:  1d 01h | Avg: 43m 06s | Max:  1h 08m | Hits:  69%/30394 
      🟨 20                 Pass:  95%/24  | Total: 15h 54m | Avg: 39m 47s | Max:  1h 04m | Hits:  72%/19644 
    🟨 gpu
      🟨 v100               Pass:  98%/131 | Total:  3d 22h | Avg: 43m 03s | Max:  1h 11m | Hits:  69%/109396
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 11m | Hits:  59%/2601  
      🟩 90a                Pass: 100%/4   | Total:  1h 28m | Avg: 22m 10s | Max: 22m 21s | Hits:  59%/3468  
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 02h | Avg: 25m 27s | Max: 51m 31s | Hits: 71%/138912

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 22h | Avg: 25m 23s | Max: 51m 31s | Hits:  71%/129492
      🟩 arm64              Pass: 100%/8   | Total:  3h 30m | Avg: 26m 20s | Max: 30m 52s | Hits:  66%/9420  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 12m | Avg: 24m 48s | Max: 45m 39s | Hits:  66%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 48m | Avg: 36m 00s | Max: 36m 37s | Hits:  66%/3534  
      🟩 12.5               Pass: 100%/100 | Total:  1d 18h | Avg: 25m 13s | Max: 51m 31s | Hits:  72%/117718
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 48m 13s | Avg: 24m 06s | Max: 25m 26s | Hits:  66%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 12m | Avg: 24m 48s | Max: 45m 39s | Hits:  66%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 48m | Avg: 36m 00s | Max: 36m 37s | Hits:  66%/3534  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 17h | Avg: 25m 15s | Max: 51m 31s | Hits:  72%/115364
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 48m 13s | Avg: 24m 06s | Max: 25m 26s | Hits:  66%/2354  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 01h | Avg: 25m 28s | Max: 51m 31s | Hits:  71%/136558
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 25m | Avg: 24m 18s | Max: 28m 47s | Hits:  67%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 17m | Avg: 25m 44s | Max: 27m 05s | Hits:  67%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 13s | Max: 29m 37s | Hits:  66%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 33s | Max: 29m 42s | Hits:  66%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 11s | Max: 27m 51s | Hits:  66%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 16s | Max: 28m 24s | Hits:  66%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 24s | Max: 27m 53s | Hits:  66%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 06s | Max: 30m 55s | Hits:  66%/4708  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 39m | Avg: 18m 51s | Max: 28m 35s | Hits:  81%/21186 
      🟩 GCC6               Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 24m 09s | Hits:  67%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 26m | Avg: 24m 21s | Max: 28m 08s | Hits:  66%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 26m | Avg: 24m 24s | Max: 27m 40s | Hits:  66%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 37m | Avg: 26m 12s | Max: 29m 30s | Hits:  66%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 51m | Avg: 27m 51s | Max: 30m 25s | Hits:  66%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 39m | Avg: 31m 23s | Max: 36m 37s | Hits:  66%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 50m | Avg: 27m 40s | Max: 29m 27s | Hits:  66%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 47m | Avg: 20m 23s | Max: 40m 03s | Hits:  76%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 36m | Avg: 32m 15s | Max: 34m 34s | Hits:  67%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 39s | Avg: 45m 39s | Max: 45m 39s | Hits:  64%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 26s | Max: 51m 31s | Hits:  64%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 27m | Avg: 34m 35s | Max: 51m 12s | Hits:  81%/7038  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 20h 09m | Avg: 23m 42s | Max: 30m 55s | Hits:  72%/60027 
      🟩 GCC                Pass: 100%/55  | Total: 22h 23m | Avg: 24m 25s | Max: 40m 03s | Hits:  70%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 36m | Avg: 32m 15s | Max: 34m 34s | Hits:  67%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 54m | Avg: 39m 20s | Max: 51m 31s | Hits:  76%/10557 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 02h | Avg: 25m 27s | Max: 51m 31s | Hits:  71%/138912
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 21h | Avg: 27m 39s | Max: 51m 31s | Hits:  66%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 45m | Avg:  9m 36s | Max: 22m 20s | Hits:  99%/12939 
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 38m | Avg: 19m 50s | Max: 40m 03s | Hits:  95%/9420  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 48m | Avg: 36m 00s | Max: 36m 37s | Hits:  66%/3534  
      🟩 90a                Pass: 100%/4   | Total:  1h 01m | Avg: 15m 15s | Max: 16m 14s | Hits:  66%/4712  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 31m | Avg: 21m 02s | Max: 35m 11s | Hits:  72%/35328 
      🟩 14                 Pass: 100%/34  | Total: 15h 46m | Avg: 27m 50s | Max: 51m 31s | Hits:  68%/40020 
      🟩 17                 Pass: 100%/33  | Total: 15h 00m | Avg: 27m 16s | Max: 49m 25s | Hits:  71%/38847 
      🟩 20                 Pass: 100%/21  | Total:  8h 45m | Avg: 25m 01s | Max: 50m 55s | Hits:  74%/24717 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 59s | Avg: 10m 59s | Max: 10m 59s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

elstehle force-pushed the enh/offset-device-scan branch from 12e1a9f to 746b108 on August 5, 2024 14:01
Collaborator Author

elstehle commented Aug 5, 2024

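Performance numbers look good: times with u32, i64, and u64 offsets are mostly within a few percent of the i32 baseline, with the largest slowdowns for I8.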

h100 exclusive.max, CTK 12.5

results
T{ct} | Elements{io} | i32 | u32 | u32/i32 time | i64 | i64/i32 time | u64 | u64/i32 time
I8 | 2^16 = 65536 | 8.91 | 9.367 | 105.13% | 9.232 | 103.61% | 9.359 | 105.04%
I8 | 2^20 = 1048576 | 13.494 | 13.368 | 99.07% | 13.124 | 97.26% | 13.187 | 97.72%
I8 | 2^24 = 16777216 | 53.134 | 52.893 | 99.55% | 53.066 | 99.87% | 53.063 | 99.87%
I8 | 2^28 = 268435456 | 729.843 | 729.989 | 100.02% | 731.428 | 100.22% | 731.704 | 100.25%
I16 | 2^16 = 65536 | 9.752 | 9.715 | 99.62% | 9.792 | 100.41% | 9.631 | 98.76%
I16 | 2^20 = 1048576 | 13.951 | 14.187 | 101.69% | 14.069 | 100.85% | 14.206 | 101.83%
I16 | 2^24 = 16777216 | 67.863 | 67.839 | 99.96% | 67.01 | 98.74% | 66.97 | 98.68%
I16 | 2^28 = 268435456 | 952.537 | 951.78 | 99.92% | 978.596 | 102.74% | 977.729 | 102.64%
I32 | 2^16 = 65536 | 9.769 | 9.261 | 94.80% | 9.289 | 95.09% | 9.176 | 93.93%
I32 | 2^20 = 1048576 | 15.022 | 14.511 | 96.60% | 14.512 | 96.60% | 14.564 | 96.95%
I32 | 2^24 = 16777216 | 99.151 | 98.738 | 99.58% | 98.688 | 99.53% | 98.659 | 99.50%
I32 | 2^28 = 268435456 | 1456 | 1456 | 100.00% | 1455 | 99.93% | 1455 | 99.93%
I64 | 2^16 = 65536 | 9.573 | 9.634 | 100.64% | 9.478 | 99.01% | 9.584 | 100.11%
I64 | 2^20 = 1048576 | 19.037 | 18.873 | 99.14% | 19.032 | 99.97% | 18.964 | 99.62%
I64 | 2^24 = 16777216 | 171.205 | 171.247 | 100.02% | 171.164 | 99.98% | 171.331 | 100.07%
I64 | 2^28 = 268435456 | 2631 | 2632 | 100.04% | 2631 | 100.00% | 2630 | 99.96%
I128 | 2^16 = 65536 | 16.374 | 16.465 | 100.56% | 16.523 | 100.91% | 16.542 | 101.03%
I128 | 2^20 = 1048576 | 65.597 | 65.4 | 99.70% | 65.036 | 99.14% | 65.177 | 99.36%
I128 | 2^24 = 16777216 | 868.064 | 868.039 | 100.00% | 866.639 | 99.84% | 867.766 | 99.97%
I128 | 2^28 = 268435456 | 13727 | 13724 | 99.98% | 13715 | 99.91% | 13716 | 99.92%
F32 | 2^16 = 65536 | 9.441 | 9.475 | 100.36% | 9.38 | 99.35% | 9.433 | 99.92%
F32 | 2^20 = 1048576 | 14.47 | 14.477 | 100.05% | 14.584 | 100.79% | 14.568 | 100.68%
F32 | 2^24 = 16777216 | 98.968 | 98.861 | 99.89% | 98.991 | 100.02% | 99.018 | 100.05%
F32 | 2^28 = 268435456 | 1459 | 1459 | 100.00% | 1459 | 100.00% | 1459 | 100.00%
F64 | 2^16 = 65536 | 9.767 | 9.579 | 98.08% | 9.485 | 97.11% | 9.711 | 99.43%
F64 | 2^20 = 1048576 | 18.928 | 18.857 | 99.62% | 19.034 | 100.56% | 18.935 | 100.04%
F64 | 2^24 = 16777216 | 173.845 | 173.919 | 100.04% | 173.85 | 100.00% | 173.954 | 100.06%
F64 | 2^28 = 268435456 | 2695 | 2695 | 100.00% | 2695 | 100.00% | 2694 | 99.96%
C64 | 2^16 = 65536 | 67.742 | 67.006 | 98.91% | 66.853 | 98.69% | 67.309 | 99.36%
C64 | 2^20 = 1048576 | 114.314 | 114.531 | 100.19% | 115.019 | 100.62% | 115.035 | 100.63%
C64 | 2^24 = 16777216 | 1165 | 1162 | 99.74% | 1155 | 99.14% | 1150 | 98.71%
C64 | 2^28 = 268435456 | 17419 | 17458 | 100.22% | 17380 | 99.78% | 17475 | 100.32%

h100 exclusive.sum, CTK 12.5

results
T{ct} | Elements{io} | i32 | u32 | u32/i32 time | i64 | i64/i32 time | u64 | u64/i32 time
I8 | 2^16 = 65536 | 8.761 | 9.183 | 104.82% | 9.104 | 103.92% | 9.29 | 106.04%
I8 | 2^20 = 1048576 | 12.446 | 12.639 | 101.55% | 12.35 | 99.23% | 12.103 | 97.24%
I8 | 2^24 = 16777216 | 49.837 | 50.075 | 100.48% | 52.084 | 104.51% | 52.143 | 104.63%
I8 | 2^28 = 268435456 | 649.015 | 650.869 | 100.29% | 685.588 | 105.64% | 686.283 | 105.74%
I16 | 2^16 = 65536 | 9.739 | 9.557 | 98.13% | 9.748 | 100.09% | 9.58 | 98.37%
I16 | 2^20 = 1048576 | 13.246 | 13.363 | 100.88% | 13.202 | 99.67% | 13.321 | 100.57%
I16 | 2^24 = 16777216 | 60.795 | 60.863 | 100.11% | 60.789 | 99.99% | 60.878 | 100.14%
I16 | 2^28 = 268435456 | 799.135 | 798.496 | 99.92% | 799.297 | 100.02% | 799.511 | 100.05%
I32 | 2^16 = 65536 | 9.598 | 9.212 | 95.98% | 9.342 | 97.33% | 9.176 | 95.60%
I32 | 2^20 = 1048576 | 14.984 | 14.715 | 98.20% | 14.78 | 98.64% | 14.795 | 98.74%
I32 | 2^24 = 16777216 | 90.713 | 90.552 | 99.82% | 90.508 | 99.77% | 90.405 | 99.66%
I32 | 2^28 = 268435456 | 1366 | 1366 | 100.00% | 1367 | 100.07% | 1366 | 100.00%
I64 | 2^16 = 65536 | 9.714 | 9.792 | 100.80% | 9.626 | 99.09% | 9.938 | 102.31%
I64 | 2^20 = 1048576 | 17.97 | 17.932 | 99.79% | 18.348 | 102.10% | 18.371 | 102.23%
I64 | 2^24 = 16777216 | 160.284 | 160.186 | 99.94% | 160.507 | 100.14% | 160.582 | 100.19%
I64 | 2^28 = 268435456 | 2476 | 2475 | 99.96% | 2476 | 100.00% | 2476 | 100.00%
I128 | 2^16 = 65536 | 14.439 | 14.311 | 99.11% | 14.591 | 101.05% | 14.559 | 100.83%
I128 | 2^20 = 1048576 | 38.383 | 38.23 | 99.60% | 38.186 | 99.49% | 38.178 | 99.47%
I128 | 2^24 = 16777216 | 388.32 | 388.111 | 99.95% | 388.071 | 99.94% | 387.488 | 99.79%
I128 | 2^28 = 268435456 | 6036 | 6034 | 99.97% | 6027 | 99.85% | 6026 | 99.83%
F32 | 2^16 = 65536 | 9.63 | 9.58 | 99.48% | 9.682 | 100.54% | 9.694 | 100.66%
F32 | 2^20 = 1048576 | 14.878 | 14.867 | 99.93% | 14.885 | 100.05% | 14.976 | 100.66%
F32 | 2^24 = 16777216 | 90.628 | 90.576 | 99.94% | 90.608 | 99.98% | 90.615 | 99.99%
F32 | 2^28 = 268435456 | 1364 | 1364 | 100.00% | 1363 | 99.93% | 1363 | 99.93%
F64 | 2^16 = 65536 | 10.154 | 9.662 | 95.15% | 9.764 | 96.16% | 9.629 | 94.83%
F64 | 2^20 = 1048576 | 18.582 | 18.327 | 98.63% | 18.171 | 97.79% | 18.377 | 98.90%
F64 | 2^24 = 16777216 | 161.011 | 160.632 | 99.76% | 160.657 | 99.78% | 160.797 | 99.87%
F64 | 2^28 = 268435456 | 2479 | 2480 | 100.04% | 2479 | 100.00% | 2479 | 100.00%
C64 | 2^16 = 65536 | 12.738 | 12.733 | 99.96% | 12.641 | 99.24% | 12.891 | 101.20%
C64 | 2^20 = 1048576 | 29.051 | 28.972 | 99.73% | 29.436 | 101.33% | 29.412 | 101.24%
C64 | 2^24 = 16777216 | 291.342 | 291.891 | 100.19% | 291.113 | 99.92% | 291.2 | 99.95%
C64 | 2^28 = 268435456 | 4543 | 4550 | 100.15% | 4530 | 99.71% | 4530 | 99.71%

h100 exclusive.max, CTK 12.3

results
T{ct} | Elements{io} | i32 | u32 | u32/i32 time | i64 | i64/i32 time | u64 | u64/i32 time
I8 | 2^16 = 65536 | 9.022 | 9.699 | 107.50% | 9.827 | 108.92% | 10.089 | 111.83%
I8 | 2^20 = 1048576 | 13.85 | 13.685 | 98.81% | 13.956 | 100.77% | 13.693 | 98.87%
I8 | 2^24 = 16777216 | 53.515 | 53.572 | 100.11% | 53.723 | 100.39% | 53.492 | 99.96%
I8 | 2^28 = 268435456 | 728.579 | 728.888 | 100.04% | 730.289 | 100.23% | 728.182 | 99.95%
I16 | 2^16 = 65536 | 9.733 | 9.719 | 99.86% | 9.638 | 99.02% | 9.624 | 98.88%
I16 | 2^20 = 1048576 | 13.82 | 13.96 | 101.01% | 13.931 | 100.80% | 13.971 | 101.09%
I16 | 2^24 = 16777216 | 67.724 | 67.597 | 99.81% | 66.443 | 98.11% | 66.562 | 98.28%
I16 | 2^28 = 268435456 | 947.131 | 946.892 | 99.97% | 973.968 | 102.83% | 974.157 | 102.85%
I32 | 2^16 = 65536 | 9.606 | 9.73 | 101.29% | 9.934 | 103.41% | 9.71 | 101.08%
I32 | 2^20 = 1048576 | 14.759 | 15.054 | 102.00% | 15.136 | 102.55% | 15.145 | 102.62%
I32 | 2^24 = 16777216 | 98.828 | 98.883 | 100.06% | 98.84 | 100.01% | 99.003 | 100.18%
I32 | 2^28 = 268435456 | 1452 | 1453 | 100.07% | 1452 | 100.00% | 1452 | 100.00%
I64 | 2^16 = 65536 | 9.923 | 9.994 | 100.72% | 9.89 | 99.67% | 9.785 | 98.61%
I64 | 2^20 = 1048576 | 19.653 | 19.206 | 97.73% | 19.524 | 99.34% | 19.225 | 97.82%
I64 | 2^24 = 16777216 | 171.416 | 171.11 | 99.82% | 171.323 | 99.95% | 171.053 | 99.79%
I64 | 2^28 = 268435456 | 2627 | 2626 | 99.96% | 2627 | 100.00% | 2625 | 99.92%
I128 | 2^16 = 65536 | 16.486 | 16.672 | 101.13% | 16.533 | 100.29% | 16.747 | 101.58%
I128 | 2^20 = 1048576 | 64.837 | 65.09 | 100.39% | 64.58 | 99.60% | 64.94 | 100.16%
I128 | 2^24 = 16777216 | 853.03 | 854.318 | 100.15% | 852.769 | 99.97% | 854.776 | 100.20%
I128 | 2^28 = 268435456 | 13522 | 13531 | 100.07% | 13518 | 99.97% | 13546 | 100.18%
F32 | 2^16 = 65536 | 10.024 | 9.982 | 99.58% | 9.992 | 99.68% | 9.999 | 99.75%
F32 | 2^20 = 1048576 | 14.938 | 15.004 | 100.44% | 15.209 | 101.81% | 15.094 | 101.04%
F32 | 2^24 = 16777216 | 99.208 | 99.203 | 99.99% | 99.244 | 100.04% | 99.158 | 99.95%
F32 | 2^28 = 268435456 | 1456 | 1457 | 100.07% | 1457 | 100.07% | 1457 | 100.07%
F64 | 2^16 = 65536 | 10.357 | 9.786 | 94.49% | 9.769 | 94.32% | 9.775 | 94.38%
F64 | 2^20 = 1048576 | 19.51 | 19.211 | 98.47% | 19.207 | 98.45% | 19.145 | 98.13%
F64 | 2^24 = 16777216 | 174.651 | 174.098 | 99.68% | 174.074 | 99.67% | 174.071 | 99.67%
F64 | 2^28 = 268435456 | 2693 | 2690 | 99.89% | 2692 | 99.96% | 2690 | 99.89%
C64 | 2^16 = 65536 | 66.521 | 66.61 | 100.13% | 68.157 | 102.46% | 68.178 | 102.49%
C64 | 2^20 = 1048576 | 115.013 | 114.007 | 99.13% | 114.196 | 99.29% | 114.597 | 99.64%
C64 | 2^24 = 16777216 | 1165 | 1156 | 99.23% | 1160 | 99.57% | 1171 | 100.52%
C64 | 2^28 = 268435456 | 17473 | 17344 | 99.26% | 17419 | 99.69% | 17442 | 99.82%

h100 exclusive.sum, CTK 12.3

results
baseline:
T{ct} | Elements{io} | i32 | u32 | u32/i32 time | i64 | i64/i32 time | u64 | u64/i32 time
I8 | 2^16 = 65536 | 8.364 | 9.262 | 110.74% | 9.257 | 110.68% | 9.438 | 112.84%
I8 | 2^20 = 1048576 | 12.698 | 12.516 | 98.57% | 12.788 | 100.71% | 12.408 | 97.72%
I8 | 2^24 = 16777216 | 49.91 | 50.287 | 100.76% | 52.495 | 105.18% | 52.487 | 105.16%
I8 | 2^28 = 268435456 | 648.025 | 650.262 | 100.35% | 685.339 | 105.76% | 685.475 | 105.78%
I16 | 2^16 = 65536 | 9.861 | 9.987 | 101.28% | 9.899 | 100.39% | 9.824 | 99.62%
I16 | 2^20 = 1048576 | 13.44 | 13.3 | 98.96% | 13.383 | 99.58% | 13.349 | 99.32%
I16 | 2^24 = 16777216 | 61.299 | 61.025 | 99.55% | 61.039 | 99.58% | 60.886 | 99.33%
I16 | 2^28 = 268435456 | 797.713 | 797.044 | 99.92% | 797.688 | 100.00% | 797.082 | 99.92%
I32 | 2^16 = 65536 | 9.935 | 9.792 | 98.56% | 9.726 | 97.90% | 9.681 | 97.44%
I32 | 2^20 = 1048576 | 15.07 | 15.342 | 101.80% | 15.433 | 102.41% | 15.447 | 102.50%
I32 | 2^24 = 16777216 | 91.326 | 91.125 | 99.78% | 91.258 | 99.93% | 91.344 | 100.02%
I32 | 2^28 = 268435456 | 1363 | 1363 | 100.00% | 1364 | 100.07% | 1364 | 100.07%
I64 | 2^16 = 65536 | 9.992 | 9.992 | 100.00% | 10.094 | 101.02% | 9.803 | 98.11%
I64 | 2^20 = 1048576 | 18.543 | 18.221 | 98.26% | 18.535 | 99.96% | 18.233 | 98.33%
I64 | 2^24 = 16777216 | 161.106 | 160.77 | 99.79% | 160.966 | 99.91% | 160.782 | 99.80%
I64 | 2^28 = 268435456 | 2474 | 2474 | 100.00% | 2474 | 100.00% | 2473 | 99.96%
I128 | 2^16 = 65536 | 14.258 | 14.494 | 101.66% | 14.371 | 100.79% | 14.562 | 102.13%
I128 | 2^20 = 1048576 | 38.324 | 38.083 | 99.37% | 38.113 | 99.45% | 38.067 | 99.33%
I128 | 2^24 = 16777216 | 386.014 | 385.711 | 99.92% | 385.943 | 99.98% | 385.618 | 99.90%
I128 | 2^28 = 268435456 | 5995 | 5992 | 99.95% | 5989 | 99.90% | 5990 | 99.92%
F32 | 2^16 = 65536 | 9.545 | 9.52 | 99.74% | 9.612 | 100.70% | 9.649 | 101.09%
F32 | 2^20 = 1048576 | 14.746 | 14.745 | 99.99% | 14.812 | 100.45% | 14.8 | 100.37%
F32 | 2^24 = 16777216 | 90.711 | 90.775 | 100.07% | 90.706 | 99.99% | 90.851 | 100.15%
F32 | 2^28 = 268435456 | 1361 | 1362 | 100.07% | 1360 | 99.93% | 1361 | 100.00%
F64 | 2^16 = 65536 | 10.153 | 10.159 | 100.06% | 10.13 | 99.77% | 10.139 | 99.86%
F64 | 2^20 = 1048576 | 18.41 | 18.731 | 101.74% | 18.644 | 101.27% | 18.712 | 101.64%
F64 | 2^24 = 16777216 | 161.142 | 161.001 | 99.91% | 161.433 | 100.18% | 161.188 | 100.03%
F64 | 2^28 = 268435456 | 2477 | 2478 | 100.04% | 2477 | 100.00% | 2477 | 100.00%
C64 | 2^16 = 65536 | 12.925 | 13.161 | 101.83% | 13.05 | 100.97% | 12.897 | 99.78%
C64 | 2^20 = 1048576 | 29.355 | 29.085 | 99.08% | 29.337 | 99.94% | 29.06 | 99.00%
C64 | 2^24 = 16777216 | 291.247 | 291.089 | 99.95% | 290.084 | 99.60% | 289.553 | 99.42%
C64 | 2^28 = 268435456 | 4510 | 4509 | 99.98% | 4489 | 99.53% | 4487 | 99.49%
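
The percentage columns in the tables above appear to be each offset type's time divided by the i32 time for the same row. A minimal Python sketch of that arithmetic, using the first row of the "h100 exclusive.max, CTK 12.5" table (T = I8, 2^16 elements), is given here. The helper names `relative_time` and `relative_columns` are hypothetical, and the time unit is not stated in the thread, so it is left out.

```python
def relative_time(candidate: float, baseline: float) -> float:
    """Candidate time as a percentage of the baseline time, rounded to 2 decimals."""
    return round(100.0 * candidate / baseline, 2)


def relative_columns(row: dict) -> dict:
    """Compute the '<type>/i32' percentages for one table row of timings."""
    baseline = row["i32"]
    return {
        f"{name}/i32": relative_time(time, baseline)
        for name, time in row.items()
        if name != "i32"
    }


# Timings copied from the first data row of the first table
# (h100 exclusive.max, CTK 12.5, T = I8, 2^16 elements).
ROW = {"i32": 8.91, "u32": 9.367, "i64": 9.232, "u64": 9.359}

if __name__ == "__main__":
    for name, pct in relative_columns(ROW).items():
        print(f"{name}: {pct:.2f}%")
```

For this row the sketch reproduces the reported 105.13%, 103.61%, and 105.04%. Because the tables list rounded times, recomputed ratios for other rows may differ from the reported ones in the last digit.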

elstehle marked this pull request as ready for review on August 5, 2024 15:18
elstehle requested review from a team as code owners on August 5, 2024 15:18
Contributor

github-actions bot commented Aug 5, 2024

🟨 CI finished in 2h 56m: Pass: 98%/250 | Total: 5d 10h | Avg: 31m 12s | Max: 1h 04m | Hits: 73%/247441
  • 🟨 cub: Pass: 97%/131 | Total: 3d 13h | Avg: 38m 58s | Max: 1h 04m | Hits: 73%/108529

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/123 | Total:  3d 06h | Avg: 38m 23s | Max:  1h 04m | Hits:  73%/101593
      🟩 arm64              Pass: 100%/8   | Total:  6h 22m | Avg: 47m 49s | Max: 53m 08s | Hits:  68%/6936  
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  9h 09m | Avg: 36m 39s | Max: 51m 17s | Hits:  69%/11792 
      🟩 11.8               Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🔍 12.5               Pass:  97%/113 | Total:  3d 00h | Avg: 38m 40s | Max:  1h 02m | Hits:  73%/94136 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 30m 49s | Avg: 15m 24s | Max: 16m 57s | Hits:  79%/1436  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 09m | Avg: 36m 39s | Max: 51m 17s | Hits:  69%/11792 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🔍 nvcc12.5           Pass:  97%/111 | Total:  3d 00h | Avg: 39m 05s | Max:  1h 02m | Hits:  73%/92700 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 30m 49s | Avg: 15m 24s | Max: 16m 57s | Hits:  79%/1436  
      🔍 nvcc               Pass:  97%/129 | Total:  3d 12h | Avg: 39m 20s | Max:  1h 04m | Hits:  73%/107093
    🟨 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 11m | Avg: 41m 50s | Max: 51m 28s | Hits:  65%/4980  
      🟩 Clang10            Pass: 100%/3   | Total:  2h 15m | Avg: 45m 05s | Max: 46m 47s | Hits:  68%/2607  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 54m | Avg: 43m 32s | Max: 44m 39s | Hits:  68%/3476  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 55m | Avg: 43m 58s | Max: 47m 02s | Hits:  68%/3476  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 55m | Avg: 43m 52s | Max: 45m 10s | Hits:  68%/3476  
      🟩 Clang14            Pass: 100%/4   | Total:  3h 06m | Avg: 46m 36s | Max: 47m 37s | Hits:  68%/3476  
      🟩 Clang15            Pass: 100%/4   | Total:  3h 00m | Avg: 45m 13s | Max: 47m 15s | Hits:  68%/3468  
      🟩 Clang16            Pass: 100%/4   | Total:  3h 33m | Avg: 53m 23s | Max: 56m 47s | Hits:  59%/3468  
      🟨 Clang17            Pass:  96%/26  | Total: 12h 05m | Avg: 27m 53s | Max: 53m 14s | Hits:  86%/21377 
      🟩 GCC6               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 15s | Max: 35m 27s | Hits:  70%/1582  
      🟩 GCC7               Pass: 100%/6   | Total:  4h 30m | Avg: 45m 00s | Max: 58m 05s | Hits:  62%/4983  
      🟩 GCC8               Pass: 100%/6   | Total:  4h 23m | Avg: 43m 58s | Max: 54m 39s | Hits:  65%/4983  
      🟩 GCC9               Pass: 100%/6   | Total:  4h 29m | Avg: 44m 57s | Max: 55m 58s | Hits:  64%/4983  
      🟩 GCC10              Pass: 100%/4   | Total:  3h 23m | Avg: 50m 51s | Max: 52m 06s | Hits:  58%/3476  
      🟨 GCC11              Pass:  85%/7   | Total:  5h 53m | Avg: 50m 31s | Max:  1h 04m | Hits:  63%/5202  
      🟩 GCC12              Pass: 100%/4   | Total:  3h 24m | Avg: 51m 02s | Max: 52m 26s | Hits:  58%/3468  
      🟨 GCC13              Pass:  96%/28  | Total: 12h 47m | Avg: 27m 24s | Max: 52m 29s | Hits:  82%/23409 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 19m | Avg: 46m 23s | Max: 49m 12s | Hits:  65%/2385  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 17s | Avg: 51m 17s | Max: 51m 17s | Hits:  61%/709   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 02s | Max: 57m 11s | Hits:  66%/1418  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  65%/2127  
    🟨 cxx_family
      🟨 Clang              Pass:  98%/59  | Total:  1d 12h | Avg: 37m 35s | Max: 56m 47s | Hits:  75%/49804 
      🟨 GCC                Pass:  96%/63  | Total:  1d 16h | Avg: 38m 06s | Max:  1h 04m | Hits:  71%/52086 
      🟩 Intel              Pass: 100%/3   | Total:  2h 19m | Avg: 46m 23s | Max: 49m 12s | Hits:  65%/2385  
      🟩 MSVC               Pass: 100%/6   | Total:  5h 47m | Avg: 57m 51s | Max:  1h 02m | Hits:  65%/4254  
    🟨 jobs
      🟨 Build              Pass:  98%/99  | Total:  3d 03h | Avg: 45m 42s | Max:  1h 04m | Hits:  64%/82519 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 35m | Avg: 19m 27s | Max: 20m 44s | Hits:  99%/6936  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 02m | Avg: 15m 22s | Max: 18m 16s | Hits:  99%/6936  
      🟨 HostLaunch         Pass:  87%/8   | Total:  2h 13m | Avg: 16m 43s | Max: 20m 28s | Hits:  99%/6069  
      🟨 TestGPU            Pass:  87%/8   | Total:  2h 48m | Avg: 21m 02s | Max: 24m 47s | Hits:  99%/6069  
    🟨 std
      🟨 11                 Pass:  97%/34  | Total: 21h 46m | Avg: 38m 25s | Max:  1h 04m | Hits:  73%/28182 
      🟨 14                 Pass:  97%/37  | Total:  1d 00h | Avg: 39m 47s | Max:  1h 01m | Hits:  70%/30309 
      🟨 17                 Pass:  97%/36  | Total: 23h 55m | Avg: 39m 52s | Max: 59m 04s | Hits:  72%/29527 
      🟩 20                 Pass: 100%/24  | Total: 14h 51m | Avg: 37m 09s | Max:  1h 02m | Hits:  77%/20511 
    🟨 gpu
      🟨 v100               Pass:  97%/131 | Total:  3d 13h | Avg: 38m 58s | Max:  1h 04m | Hits:  73%/108529
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🟩 90a                Pass: 100%/4   | Total:  1h 30m | Avg: 22m 30s | Max: 23m 38s | Hits:  58%/3468  
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 20h | Avg: 22m 45s | Max: 51m 32s | Hits: 74%/138912

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 18h | Avg: 23m 01s | Max: 51m 32s | Hits:  73%/129492
      🟩 arm64              Pass: 100%/8   | Total:  2h 33m | Avg: 19m 10s | Max: 24m 17s | Hits:  78%/9420  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 05m | Avg: 16m 23s | Max: 44m 29s | Hits:  78%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 12.5               Pass: 100%/100 | Total:  1d 15h | Avg: 23m 36s | Max: 51m 32s | Hits:  73%/117718
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 36m 03s | Avg: 18m 01s | Max: 19m 11s | Hits:  79%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 05m | Avg: 16m 23s | Max: 44m 29s | Hits:  78%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 14h | Avg: 23m 43s | Max: 51m 32s | Hits:  73%/115364
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 36m 03s | Avg: 18m 01s | Max: 19m 11s | Hits:  79%/2354  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 20h | Avg: 22m 50s | Max: 51m 32s | Hits:  74%/136558
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 00m | Avg: 20m 02s | Max: 28m 25s | Hits:  72%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 20m | Avg: 26m 51s | Max: 28m 54s | Hits:  65%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 29s | Max: 31m 35s | Hits:  65%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 40s | Max: 30m 27s | Hits:  65%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 54s | Max: 27m 55s | Hits:  65%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 09s | Max: 28m 09s | Hits:  65%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 00s | Max: 28m 13s | Hits:  65%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 49s | Max: 27m 48s | Hits:  65%/4708  
      🟩 Clang17            Pass: 100%/18  | Total:  4h 39m | Avg: 15m 31s | Max: 28m 22s | Hits:  85%/21186 
      🟩 GCC6               Pass: 100%/2   | Total: 26m 06s | Avg: 13m 03s | Max: 15m 35s | Hits:  79%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  1h 58m | Avg: 19m 41s | Max: 29m 14s | Hits:  72%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 02m | Avg: 20m 25s | Max: 27m 16s | Hits:  72%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 08m | Avg: 21m 22s | Max: 31m 28s | Hits:  72%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 49m | Avg: 27m 24s | Max: 30m 52s | Hits:  65%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 14m | Avg: 27m 44s | Max: 33m 06s | Hits:  70%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 48s | Max: 29m 03s | Hits:  65%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 28m | Avg: 16m 25s | Max: 30m 25s | Hits:  81%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 37s | Max: 40m 23s | Hits:  63%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 44m 29s | Avg: 44m 29s | Max: 44m 29s | Hits:  62%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 45s | Max: 48m 27s | Hits:  70%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 23m | Avg: 33m 57s | Max: 51m 32s | Hits:  83%/7038  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 18h 28m | Avg: 21m 43s | Max: 31m 35s | Hits:  73%/60027 
      🟩 GCC                Pass: 100%/55  | Total: 18h 54m | Avg: 20m 37s | Max: 33m 06s | Hits:  74%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 37s | Max: 40m 23s | Hits:  63%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 41m | Avg: 37m 58s | Max: 51m 32s | Hits:  78%/10557 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 20h | Avg: 22m 45s | Max: 51m 32s | Hits:  74%/138912
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 17h | Avg: 25m 05s | Max: 51m 32s | Hits:  69%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 45m | Avg:  9m 32s | Max: 20m 01s | Hits:  99%/12939 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 08s | Max: 13m 18s | Hits:  99%/9420  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 90a                Pass: 100%/4   | Total: 59m 54s | Avg: 14m 58s | Max: 16m 54s | Hits:  65%/4712  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  8h 56m | Avg: 17m 52s | Max: 25m 34s | Hits:  76%/35328 
      🟩 14                 Pass: 100%/34  | Total: 14h 04m | Avg: 24m 49s | Max: 48m 27s | Hits:  71%/40020 
      🟩 17                 Pass: 100%/33  | Total: 13h 30m | Avg: 24m 32s | Max: 51m 32s | Hits:  74%/38847 
      🟩 20                 Pass: 100%/21  | Total:  8h 15m | Avg: 23m 35s | Max: 51m 20s | Hits:  75%/24717 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

github-actions bot commented Aug 6, 2024

🟩 CI finished in 15h 08m: Pass: 100%/250 | Total: 5d 11h | Avg: 31m 27s | Max: 1h 04m | Hits: 73%/250042
  • 🟩 cub: Pass: 100%/131 | Total: 3d 14h | Avg: 39m 26s | Max: 1h 04m | Hits: 73%/111130

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  3d 07h | Avg: 38m 53s | Max:  1h 04m | Hits:  73%/104194
      🟩 arm64              Pass: 100%/8   | Total:  6h 22m | Avg: 47m 49s | Max: 53m 08s | Hits:  68%/6936  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  9h 09m | Avg: 36m 39s | Max: 51m 17s | Hits:  69%/11792 
      🟩 11.8               Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🟩 12.5               Pass: 100%/113 | Total:  3d 01h | Avg: 39m 13s | Max:  1h 02m | Hits:  74%/96737 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 30m 49s | Avg: 15m 24s | Max: 16m 57s | Hits:  79%/1436  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 09m | Avg: 36m 39s | Max: 51m 17s | Hits:  69%/11792 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  3d 01h | Avg: 39m 38s | Max:  1h 02m | Hits:  74%/95301 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 30m 49s | Avg: 15m 24s | Max: 16m 57s | Hits:  79%/1436  
      🟩 nvcc               Pass: 100%/129 | Total:  3d 13h | Avg: 39m 48s | Max:  1h 04m | Hits:  73%/109694
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 11m | Avg: 41m 50s | Max: 51m 28s | Hits:  65%/4980  
      🟩 Clang10            Pass: 100%/3   | Total:  2h 15m | Avg: 45m 05s | Max: 46m 47s | Hits:  68%/2607  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 54m | Avg: 43m 32s | Max: 44m 39s | Hits:  68%/3476  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 55m | Avg: 43m 58s | Max: 47m 02s | Hits:  68%/3476  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 55m | Avg: 43m 52s | Max: 45m 10s | Hits:  68%/3476  
      🟩 Clang14            Pass: 100%/4   | Total:  3h 06m | Avg: 46m 36s | Max: 47m 37s | Hits:  68%/3476  
      🟩 Clang15            Pass: 100%/4   | Total:  3h 00m | Avg: 45m 13s | Max: 47m 15s | Hits:  68%/3468  
      🟩 Clang16            Pass: 100%/4   | Total:  3h 33m | Avg: 53m 23s | Max: 56m 47s | Hits:  59%/3468  
      🟩 Clang17            Pass: 100%/26  | Total: 12h 17m | Avg: 28m 22s | Max: 53m 14s | Hits:  87%/22244 
      🟩 GCC6               Pass: 100%/2   | Total:  1h 08m | Avg: 34m 15s | Max: 35m 27s | Hits:  70%/1582  
      🟩 GCC7               Pass: 100%/6   | Total:  4h 30m | Avg: 45m 00s | Max: 58m 05s | Hits:  62%/4983  
      🟩 GCC8               Pass: 100%/6   | Total:  4h 23m | Avg: 43m 58s | Max: 54m 39s | Hits:  65%/4983  
      🟩 GCC9               Pass: 100%/6   | Total:  4h 29m | Avg: 44m 57s | Max: 55m 58s | Hits:  64%/4983  
      🟩 GCC10              Pass: 100%/4   | Total:  3h 23m | Avg: 50m 51s | Max: 52m 06s | Hits:  58%/3476  
      🟩 GCC11              Pass: 100%/7   | Total:  6h 26m | Avg: 55m 09s | Max:  1h 04m | Hits:  63%/6069  
      🟩 GCC12              Pass: 100%/4   | Total:  3h 24m | Avg: 51m 02s | Max: 52m 26s | Hits:  58%/3468  
      🟩 GCC13              Pass: 100%/28  | Total: 13h 03m | Avg: 27m 59s | Max: 52m 29s | Hits:  83%/24276 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 19m | Avg: 46m 23s | Max: 49m 12s | Hits:  65%/2385  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 17s | Avg: 51m 17s | Max: 51m 17s | Hits:  61%/709   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 02s | Max: 57m 11s | Hits:  66%/1418  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  65%/2127  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 13h | Avg: 37m 48s | Max: 56m 47s | Hits:  75%/50671 
      🟩 GCC                Pass: 100%/63  | Total:  1d 16h | Avg: 38m 52s | Max:  1h 04m | Hits:  72%/53820 
      🟩 Intel              Pass: 100%/3   | Total:  2h 19m | Avg: 46m 23s | Max: 49m 12s | Hits:  65%/2385  
      🟩 MSVC               Pass: 100%/6   | Total:  5h 47m | Avg: 57m 51s | Max:  1h 02m | Hits:  65%/4254  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  3d 14h | Avg: 39m 26s | Max:  1h 04m | Hits:  73%/111130
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 03h | Avg: 46m 01s | Max:  1h 04m | Hits:  64%/83386 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 35m | Avg: 19m 27s | Max: 20m 44s | Hits:  99%/6936  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 02m | Avg: 15m 22s | Max: 18m 16s | Hits:  99%/6936  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 26m | Avg: 18m 16s | Max: 20m 28s | Hits:  99%/6936  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 04m | Avg: 23m 03s | Max: 24m 47s | Hits:  99%/6936  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 04m | Avg:  1h 01m | Max:  1h 04m | Hits:  67%/2601  
      🟩 90a                Pass: 100%/4   | Total:  1h 30m | Avg: 22m 30s | Max: 23m 38s | Hits:  58%/3468  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 22h 02m | Avg: 38m 53s | Max:  1h 04m | Hits:  74%/29049 
      🟩 14                 Pass: 100%/37  | Total:  1d 01h | Avg: 40m 39s | Max:  1h 01m | Hits:  70%/31176 
      🟩 17                 Pass: 100%/36  | Total:  1d 00h | Avg: 40m 12s | Max: 59m 04s | Hits:  73%/30394 
      🟩 20                 Pass: 100%/24  | Total: 14h 51m | Avg: 37m 09s | Max:  1h 02m | Hits:  77%/20511 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 20h | Avg: 22m 45s | Max: 51m 32s | Hits: 74%/138912

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 18h | Avg: 23m 01s | Max: 51m 32s | Hits:  73%/129492
      🟩 arm64              Pass: 100%/8   | Total:  2h 33m | Avg: 19m 10s | Max: 24m 17s | Hits:  78%/9420  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 05m | Avg: 16m 23s | Max: 44m 29s | Hits:  78%/17660 
      🟩 11.8               Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 12.5               Pass: 100%/100 | Total:  1d 15h | Avg: 23m 36s | Max: 51m 32s | Hits:  73%/117718
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 36m 03s | Avg: 18m 01s | Max: 19m 11s | Hits:  79%/2354  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 05m | Avg: 16m 23s | Max: 44m 29s | Hits:  78%/17660 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 14h | Avg: 23m 43s | Max: 51m 32s | Hits:  73%/115364
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 36m 03s | Avg: 18m 01s | Max: 19m 11s | Hits:  79%/2354  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 20h | Avg: 22m 50s | Max: 51m 32s | Hits:  74%/136558
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 00m | Avg: 20m 02s | Max: 28m 25s | Hits:  72%/7062  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 20m | Avg: 26m 51s | Max: 28m 54s | Hits:  65%/3531  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 29s | Max: 31m 35s | Hits:  65%/4708  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 40s | Max: 30m 27s | Hits:  65%/4708  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 54s | Max: 27m 55s | Hits:  65%/4708  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 09s | Max: 28m 09s | Hits:  65%/4708  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 00s | Max: 28m 13s | Hits:  65%/4708  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 49s | Max: 27m 48s | Hits:  65%/4708  
      🟩 Clang17            Pass: 100%/18  | Total:  4h 39m | Avg: 15m 31s | Max: 28m 22s | Hits:  85%/21186 
      🟩 GCC6               Pass: 100%/2   | Total: 26m 06s | Avg: 13m 03s | Max: 15m 35s | Hits:  79%/2354  
      🟩 GCC7               Pass: 100%/6   | Total:  1h 58m | Avg: 19m 41s | Max: 29m 14s | Hits:  72%/7068  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 02m | Avg: 20m 25s | Max: 27m 16s | Hits:  72%/7068  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 08m | Avg: 21m 22s | Max: 31m 28s | Hits:  72%/7068  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 49m | Avg: 27m 24s | Max: 30m 52s | Hits:  65%/4712  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 14m | Avg: 27m 44s | Max: 33m 06s | Hits:  70%/8246  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 47m | Avg: 26m 48s | Max: 29m 03s | Hits:  65%/4712  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 28m | Avg: 16m 25s | Max: 30m 25s | Hits:  81%/23560 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 37s | Max: 40m 23s | Hits:  63%/3540  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 44m 29s | Avg: 44m 29s | Max: 44m 29s | Hits:  62%/1173  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 45s | Max: 48m 27s | Hits:  70%/2346  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 23m | Avg: 33m 57s | Max: 51m 32s | Hits:  83%/7038  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 18h 28m | Avg: 21m 43s | Max: 31m 35s | Hits:  73%/60027 
      🟩 GCC                Pass: 100%/55  | Total: 18h 54m | Avg: 20m 37s | Max: 33m 06s | Hits:  74%/64788 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 37s | Max: 40m 23s | Hits:  63%/3540  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 41m | Avg: 37m 58s | Max: 51m 32s | Hits:  78%/10557 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 20h | Avg: 22m 45s | Max: 51m 32s | Hits:  74%/138912
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 17h | Avg: 25m 05s | Max: 51m 32s | Hits:  69%/116553
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 45m | Avg:  9m 32s | Max: 20m 01s | Hits:  99%/12939 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 08s | Max: 13m 18s | Hits:  99%/9420  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 18m | Avg: 26m 12s | Max: 31m 22s | Hits:  76%/3534  
      🟩 90a                Pass: 100%/4   | Total: 59m 54s | Avg: 14m 58s | Max: 16m 54s | Hits:  65%/4712  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  8h 56m | Avg: 17m 52s | Max: 25m 34s | Hits:  76%/35328 
      🟩 14                 Pass: 100%/34  | Total: 14h 04m | Avg: 24m 49s | Max: 48m 27s | Hits:  71%/40020 
      🟩 17                 Pass: 100%/33  | Total: 13h 30m | Avg: 24m 32s | Max: 51m 32s | Hits:  74%/38847 
      🟩 20                 Pass: 100%/21  | Total:  8h 15m | Avg: 23m 35s | Max: 51m 20s | Hits:  75%/24717 
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

🟩 CI finished in 7h 17m: Pass: 100%/250 | Total: 6d 00h | Avg: 34m 42s | Max: 1h 05m | Hits: 67%/17283
  • 🟩 cub: Pass: 100%/131 | Total: 3d 22h | Avg: 43m 08s | Max: 1h 05m | Hits: 55%/4278

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  3d 15h | Avg: 42m 27s | Max:  1h 05m | Hits:  55%/4278  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 34s | Max: 54m 46s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 06m | Avg: 44m 25s | Max: 51m 06s | Hits:  55%/713   
      🟩 11.8               Pass: 100%/3   | Total:  3h 16m | Avg:  1h 05m | Max:  1h 05m
      🟩 12.5               Pass: 100%/113 | Total:  3d 07h | Avg: 42m 22s | Max:  1h 05m | Hits:  55%/3565  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 45m 41s | Avg: 22m 50s | Max: 23m 56s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 06m | Avg: 44m 25s | Max: 51m 06s | Hits:  55%/713   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 16m | Avg:  1h 05m | Max:  1h 05m
      🟩 nvcc12.5           Pass: 100%/111 | Total:  3d 07h | Avg: 42m 43s | Max:  1h 05m | Hits:  55%/3565  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 45m 41s | Avg: 22m 50s | Max: 23m 56s
      🟩 nvcc               Pass: 100%/129 | Total:  3d 21h | Avg: 43m 27s | Max:  1h 05m | Hits:  55%/4278  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 40m | Avg: 46m 47s | Max: 51m 44s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 46m | Avg: 55m 21s | Max: 57m 23s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 39s | Max: 56m 17s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 43s | Max: 55m 51s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 29m | Avg: 52m 23s | Max: 55m 31s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 27m | Avg: 51m 53s | Max: 53m 58s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 27m | Avg: 51m 58s | Max: 54m 25s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 34s | Max: 53m 47s
      🟩 Clang17            Pass: 100%/26  | Total: 13h 08m | Avg: 30m 19s | Max: 57m 19s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 32m | Avg: 46m 25s | Max: 47m 37s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 45m | Avg: 47m 30s | Max: 53m 50s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 47m | Avg: 47m 56s | Max: 53m 53s
      🟩 GCC9               Pass: 100%/6   | Total:  5h 01m | Avg: 50m 11s | Max: 56m 49s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 34s | Max: 53m 20s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 41m | Avg: 57m 20s | Max:  1h 05m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 31m | Avg: 52m 51s | Max: 55m 54s
      🟩 GCC13              Pass: 100%/28  | Total: 14h 16m | Avg: 30m 35s | Max: 55m 17s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 42m | Avg: 54m 16s | Max: 56m 20s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 06s | Avg: 51m 06s | Max: 51m 06s | Hits:  55%/713   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 12s | Max: 59m 17s | Hits:  55%/1426  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 13m | Avg:  1h 04m | Max:  1h 05m | Hits:  55%/2139  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 17h | Avg: 42m 05s | Max: 57m 23s
      🟩 GCC                Pass: 100%/63  | Total:  1d 20h | Avg: 41m 56s | Max:  1h 05m
      🟩 Intel              Pass: 100%/3   | Total:  2h 42m | Avg: 54m 16s | Max: 56m 20s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 02m | Avg:  1h 00m | Max:  1h 05m | Hits:  55%/4278  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  3d 22h | Avg: 43m 08s | Max:  1h 05m | Hits:  55%/4278  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 11h | Avg: 50m 24s | Max:  1h 05m | Hits:  55%/4278  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 37m | Avg: 19m 42s | Max: 21m 12s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 11m | Avg: 16m 24s | Max: 18m 36s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 35m | Avg: 19m 25s | Max: 21m 09s
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 37m | Avg: 27m 11s | Max: 31m 15s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 16m | Avg:  1h 05m | Max:  1h 05m
      🟩 90a                Pass: 100%/4   | Total:  1h 28m | Avg: 22m 10s | Max: 22m 41s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  1d 00h | Avg: 42m 42s | Max:  1h 05m
      🟩 14                 Pass: 100%/37  | Total:  1d 03h | Avg: 44m 46s | Max:  1h 05m | Hits:  55%/2139  
      🟩 17                 Pass: 100%/36  | Total:  1d 02h | Avg: 43m 39s | Max:  1h 05m | Hits:  55%/1426  
      🟩 20                 Pass: 100%/24  | Total: 16h 11m | Avg: 40m 27s | Max:  1h 02m | Hits:  55%/713   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 02h | Avg: 25m 33s | Max: 1h 00m | Hits: 71%/13005

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 22h | Avg: 25m 31s | Max:  1h 00m | Hits:  71%/13005 
      🟩 arm64              Pass: 100%/8   | Total:  3h 27m | Avg: 25m 53s | Max: 28m 42s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 23m | Avg: 25m 32s | Max: 48m 02s | Hits:  57%/1445  
      🟩 11.8               Pass: 100%/3   | Total:  1h 40m | Avg: 33m 24s | Max: 37m 06s
      🟩 12.5               Pass: 100%/100 | Total:  1d 18h | Avg: 25m 18s | Max:  1h 00m | Hits:  73%/11560 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 28s | Avg: 25m 44s | Max: 27m 35s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 23m | Avg: 25m 32s | Max: 48m 02s | Hits:  57%/1445  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 40m | Avg: 33m 24s | Max: 37m 06s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 17h | Avg: 25m 18s | Max:  1h 00m | Hits:  73%/11560 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 28s | Avg: 25m 44s | Max: 27m 35s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 01h | Avg: 25m 32s | Max:  1h 00m | Hits:  71%/13005 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 27m | Avg: 24m 32s | Max: 30m 03s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 21m | Avg: 27m 19s | Max: 29m 25s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 49m | Avg: 27m 23s | Max: 30m 09s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 47m | Avg: 26m 52s | Max: 28m 43s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 07s | Max: 30m 37s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 03s | Max: 28m 03s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 23s | Max: 28m 47s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 50m | Avg: 27m 37s | Max: 30m 23s
      🟩 Clang17            Pass: 100%/18  | Total:  5h 37m | Avg: 18m 44s | Max: 31m 33s
      🟩 GCC6               Pass: 100%/2   | Total: 45m 06s | Avg: 22m 33s | Max: 24m 04s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 31m | Avg: 25m 18s | Max: 31m 06s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 38m | Avg: 26m 27s | Max: 30m 39s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 39m | Avg: 26m 31s | Max: 30m 57s
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 33s | Max: 32m 26s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 39m | Avg: 31m 20s | Max: 37m 06s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 55m | Avg: 28m 47s | Max: 32m 13s
      🟩 GCC13              Pass: 100%/20  | Total:  5h 55m | Avg: 17m 46s | Max: 31m 03s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 42m | Avg: 34m 13s | Max: 40m 37s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 48m 02s | Avg: 48m 02s | Max: 48m 02s | Hits:  57%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 23s | Max: 54m 56s | Hits:  57%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 48m | Avg: 38m 02s | Max:  1h 00m | Hits:  78%/8670  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 20h 12m | Avg: 23m 46s | Max: 31m 33s
      🟩 GCC                Pass: 100%/55  | Total: 21h 59m | Avg: 23m 58s | Max: 37m 06s
      🟩 Intel              Pass: 100%/3   | Total:  1h 42m | Avg: 34m 13s | Max: 40m 37s
      🟩 MSVC               Pass: 100%/9   | Total:  6h 21m | Avg: 42m 20s | Max:  1h 00m | Hits:  71%/13005 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 02h | Avg: 25m 33s | Max:  1h 00m | Hits:  71%/13005 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 22h | Avg: 28m 22s | Max:  1h 00m | Hits:  57%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 48m | Avg:  9m 49s | Max: 21m 14s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 08s | Max: 13m 27s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 40m | Avg: 33m 24s | Max: 37m 06s
      🟩 90a                Pass: 100%/4   | Total:  1h 04m | Avg: 16m 06s | Max: 18m 27s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 18m | Avg: 20m 37s | Max: 27m 47s
      🟩 14                 Pass: 100%/34  | Total: 15h 17m | Avg: 26m 59s | Max: 51m 19s | Hits:  68%/5780  
      🟩 17                 Pass: 100%/33  | Total: 15h 31m | Avg: 28m 13s | Max:  1h 00m | Hits:  71%/4335  
      🟩 20                 Pass: 100%/21  | Total:  9h 07m | Avg: 26m 02s | Max: 59m 41s | Hits:  78%/2890  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 50s | Avg: 11m 50s | Max: 11m 50s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

thrust/thrust/system/cuda/detail/scan.h
  CUB_RUNTIME_FUNCTION static cudaError_t InclusiveSum(
-   void* d_temp_storage, size_t& temp_storage_bytes, IteratorT d_data, int num_items, cudaStream_t stream = 0)
+   void* d_temp_storage, size_t& temp_storage_bytes, IteratorT d_data, NumItemsT num_items, cudaStream_t stream = 0)
Collaborator

suggestion: when this PR is merged, it'd be good to consider contributing a replacement for:

https://github.com/pytorch/pytorch/blob/ae000635700e78161e0ed1a18f62b5db4030e343/aten/src/ATen/cuda/cub.cuh#L251
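
For context on why the `int num_items` to `NumItemsT num_items` change matters to callers, here is a minimal host-only sketch. It is not CUB code and not the linked PyTorch code. It assumes the usual workaround for a 32-bit item count: split the input into chunks and carry the running total across them. All function names below (`inclusive_sum_int_count`, `chunked_inclusive_sum`, `inclusive_sum`) are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical stand-in for a scan whose item count is a 32-bit int,
// mirroring the shape of the pre-change `int num_items` parameter.
// Runs on the host; it is not the CUB implementation.
void inclusive_sum_int_count(const std::int64_t* in, std::int64_t* out, int num_items)
{
  std::partial_sum(in, in + num_items, out);
}

// Assumed caller-side workaround: scan chunk by chunk and add the running
// total of all previous chunks to every element of the current chunk.
void chunked_inclusive_sum(
  const std::int64_t* in, std::int64_t* out, std::int64_t num_items, int max_chunk)
{
  assert(max_chunk > 0);
  std::int64_t carry = 0;
  for (std::int64_t offset = 0; offset < num_items; offset += max_chunk)
  {
    const int chunk = static_cast<int>(std::min<std::int64_t>(max_chunk, num_items - offset));
    inclusive_sum_int_count(in + offset, out + offset, chunk);
    for (int i = 0; i < chunk; ++i)
    {
      out[offset + i] += carry;
    }
    carry = out[offset + chunk - 1];
  }
}

// With the count type as a template parameter, a 64-bit count can be passed
// directly and the caller no longer needs the chunking loop.
template <typename NumItemsT>
void inclusive_sum(const std::int64_t* in, std::int64_t* out, NumItemsT num_items)
{
  std::partial_sum(in, in + num_items, out);
}
```

Both paths produce the same prefix sums. The difference is that the chunked version needs extra caller code and an extra pass per chunk to apply the carry, which is the kind of code a 64-bit-capable `DeviceScan` could replace.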

cub/cub/device/device_scan.cuh
Contributor

🟨 CI finished in 5h 21m: Pass: 90%/250 | Total: 6d 04h | Avg: 35m 35s | Max: 1h 28m | Hits: 53%/17283
  • 🟨 cub: Pass: 85%/131 | Total: 3d 17h | Avg: 40m 56s | Max: 1h 28m | Hits: 64%/4278

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 23m 04s
      🔍 nvcc               Pass:  85%/129 | Total:  3d 16h | Avg: 41m 14s | Max:  1h 28m | Hits:  64%/4278  
    🔍 sm: 90a 🔍
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🔍 90a                Pass:  50%/4   | Total:  1h 49m | Avg: 27m 27s | Max: 32m 29s
    🟨 ctk
      🟨 11.1               Pass:  53%/15  | Total:  9h 50m | Avg: 39m 22s | Max: 56m 26s | Hits:  64%/713   
      🟩 11.8               Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🟨 12.5               Pass:  89%/113 | Total:  3d 04h | Avg: 40m 25s | Max:  1h 28m | Hits:  64%/3565  
    🟨 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 23m 04s
      🟨 nvcc11.1           Pass:  53%/15  | Total:  9h 50m | Avg: 39m 22s | Max: 56m 26s | Hits:  64%/713   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🟨 nvcc12.5           Pass:  89%/111 | Total:  3d 03h | Avg: 40m 44s | Max:  1h 28m | Hits:  64%/3565  
    🟨 cxx
      🟨 Clang9             Pass:  83%/6   | Total:  4h 43m | Avg: 47m 18s | Max:  1h 01m
      🟩 Clang10            Pass: 100%/3   | Total:  2h 38m | Avg: 52m 57s | Max: 56m 07s
      🟨 Clang11            Pass:  75%/4   | Total:  3h 02m | Avg: 45m 40s | Max: 50m 50s
      🟨 Clang12            Pass:  75%/4   | Total:  3h 04m | Avg: 46m 05s | Max: 51m 58s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 17m | Avg: 49m 28s | Max: 51m 02s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 22s | Max: 53m 43s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 23m | Avg: 50m 45s | Max: 54m 04s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 22m | Avg: 50m 43s | Max: 52m 35s
      🟩 Clang17            Pass: 100%/26  | Total: 12h 36m | Avg: 29m 05s | Max: 54m 59s
      🟨 GCC6               Pass:  50%/2   | Total:  1h 15m | Avg: 37m 38s | Max: 42m 40s
      🟨 GCC7               Pass:  66%/6   | Total:  4h 24m | Avg: 44m 01s | Max: 53m 56s
      🟨 GCC8               Pass:  83%/6   | Total:  4h 28m | Avg: 44m 48s | Max: 49m 58s
      🟨 GCC9               Pass:  50%/6   | Total:  4h 02m | Avg: 40m 21s | Max: 51m 59s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 22m | Avg: 50m 36s | Max: 51m 14s
      🟨 GCC11              Pass:  71%/7   | Total:  6h 16m | Avg: 53m 45s | Max:  1h 10m
      🟨 GCC12              Pass:  75%/4   | Total:  3h 08m | Avg: 47m 02s | Max: 52m 50s
      🟨 GCC13              Pass:  82%/28  | Total: 14h 24m | Avg: 30m 52s | Max:  1h 28m
      🟨 Intel2023.2.0      Pass:  66%/3   | Total:  2h 23m | Avg: 47m 58s | Max: 57m 47s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 26s | Avg: 56m 26s | Max: 56m 26s | Hits:  64%/713   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 06m | Hits:  64%/1426  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 06m | Avg:  1h 02m | Max:  1h 05m | Hits:  64%/2139  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/59  | Total:  1d 15h | Avg: 40m 11s | Max:  1h 01m
      🟨 GCC                Pass:  76%/63  | Total:  1d 17h | Avg: 39m 23s | Max:  1h 28m
      🟨 Intel              Pass:  66%/3   | Total:  2h 23m | Avg: 47m 58s | Max: 57m 47s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 07m | Avg:  1h 01m | Max:  1h 06m | Hits:  64%/4278  
    🟨 jobs
      🟨 Build              Pass:  81%/99  | Total:  3d 06h | Avg: 47m 27s | Max:  1h 10m | Hits:  64%/4278  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 26m | Avg: 18m 19s | Max: 24m 30s
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 57m | Avg: 14m 39s | Max: 16m 43s
      🟨 HostLaunch         Pass:  87%/8   | Total:  3h 14m | Avg: 24m 16s | Max:  1h 28m
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 27m | Avg: 25m 56s | Max: 31m 15s
    🟨 gpu
      🟨 v100               Pass:  85%/131 | Total:  3d 17h | Avg: 40m 56s | Max:  1h 28m | Hits:  64%/4278  
    🟨 cpu
      🟨 amd64              Pass:  86%/123 | Total:  3d 10h | Avg: 40m 25s | Max:  1h 28m | Hits:  64%/4278  
      🟨 arm64              Pass:  75%/8   | Total:  6h 31m | Avg: 48m 55s | Max: 56m 35s
    🟨 std
      🟨 11                 Pass:  91%/34  | Total:  1d 00h | Avg: 42m 51s | Max:  1h 28m
      🟨 14                 Pass:  81%/37  | Total:  1d 01h | Avg: 40m 34s | Max:  1h 06m | Hits:  64%/2139  
      🟨 17                 Pass:  88%/36  | Total:  1d 01h | Avg: 42m 31s | Max:  1h 10m | Hits:  64%/1426  
      🟨 20                 Pass:  79%/24  | Total: 14h 35m | Avg: 36m 28s | Max:  1h 01m | Hits:  64%/713   
    
  • 🟨 thrust: Pass: 95%/118 | Total: 2d 10h | Avg: 29m 50s | Max: 1h 11m | Hits: 49%/13005

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 05s | Max: 32m 19s
      🔍 nvcc               Pass:  95%/116 | Total:  2d 09h | Avg: 29m 48s | Max:  1h 11m | Hits:  49%/13005 
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  94%/99  | Total:  2d 06h | Avg: 33m 15s | Max:  1h 11m | Hits:  24%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 50m | Avg: 10m 04s | Max: 19m 53s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 58m | Avg: 14m 46s | Max: 16m 02s
    🟨 ctk
      🟨 11.1               Pass:  86%/15  | Total:  7h 47m | Avg: 31m 10s | Max: 59m 08s | Hits:  24%/1445  
      🟩 11.8               Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟨 12.5               Pass:  97%/100 | Total:  2d 00h | Avg: 29m 19s | Max:  1h 11m | Hits:  52%/11560 
    🟨 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  1h 04m | Avg: 32m 05s | Max: 32m 19s
      🟨 nvcc11.1           Pass:  86%/15  | Total:  7h 47m | Avg: 31m 10s | Max: 59m 08s | Hits:  24%/1445  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟨 nvcc12.5           Pass:  96%/98  | Total:  1d 23h | Avg: 29m 16s | Max:  1h 11m | Hits:  52%/11560 
    🟨 cxx
      🟨 Clang9             Pass:  66%/6   | Total:  3h 05m | Avg: 30m 55s | Max: 35m 14s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 31m | Avg: 30m 34s | Max: 33m 37s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 48s | Max: 33m 31s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 17s | Max: 34m 55s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 02s | Max: 33m 22s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 21s | Max: 35m 56s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 29s | Max: 37m 21s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 42s | Max: 34m 35s
      🟨 Clang17            Pass:  94%/18  | Total:  6h 33m | Avg: 21m 50s | Max: 34m 11s
      🟩 GCC6               Pass: 100%/2   | Total: 56m 41s | Avg: 28m 20s | Max: 31m 20s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 56m | Avg: 29m 23s | Max: 33m 34s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 01m | Avg: 30m 17s | Max: 33m 25s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 29s | Max: 34m 04s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 13m | Avg: 33m 15s | Max: 36m 59s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 59m | Avg: 34m 15s | Max: 43m 35s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 16m | Avg: 34m 07s | Max: 37m 12s
      🟨 GCC13              Pass:  90%/20  | Total:  6h 58m | Avg: 20m 54s | Max: 36m 10s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 04m | Avg: 41m 29s | Max: 45m 20s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 59m 08s | Avg: 59m 08s | Max: 59m 08s | Hits:  24%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 07m | Hits:  24%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 18m | Avg: 43m 06s | Max:  1h 11m | Hits:  61%/8670  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/51  | Total: 23h 49m | Avg: 28m 01s | Max: 37m 21s
      🟨 GCC                Pass:  96%/55  | Total:  1d 01h | Avg: 27m 43s | Max: 43m 35s
      🟩 Intel              Pass: 100%/3   | Total:  2h 04m | Avg: 41m 29s | Max: 45m 20s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 22m | Avg: 49m 13s | Max:  1h 11m | Hits:  49%/13005 
    🟨 gpu
      🟨 v100               Pass:  95%/118 | Total:  2d 10h | Avg: 29m 50s | Max:  1h 11m | Hits:  49%/13005 
    🟨 cpu
      🟨 amd64              Pass:  98%/110 | Total:  2d 06h | Avg: 29m 48s | Max:  1h 11m | Hits:  49%/13005 
      🟨 arm64              Pass:  62%/8   | Total:  4h 03m | Avg: 30m 22s | Max: 32m 28s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟩 90a                Pass: 100%/4   | Total:  1h 22m | Avg: 20m 44s | Max: 23m 34s
    🟨 std
      🟨 11                 Pass:  96%/30  | Total: 12h 29m | Avg: 24m 59s | Max: 37m 29s
      🟨 14                 Pass:  94%/34  | Total: 18h 02m | Avg: 31m 50s | Max:  1h 06m | Hits:  43%/5780  
      🟨 17                 Pass:  96%/33  | Total: 17h 38m | Avg: 32m 05s | Max:  1h 07m | Hits:  49%/4335  
      🟨 20                 Pass:  95%/21  | Total: 10h 30m | Avg: 30m 01s | Max:  1h 11m | Hits:  61%/2890  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16


🟩 CI finished in 11h 08m: Pass: 100%/250 | Total: 6d 03h | Avg: 35m 19s | Max: 1h 28m | Hits: 53%/17283
  • 🟩 cub: Pass: 100%/131 | Total: 3d 18h | Avg: 41m 15s | Max: 1h 28m | Hits: 64%/4278

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  3d 11h | Avg: 40m 32s | Max:  1h 28m | Hits:  64%/4278  
      🟩 arm64              Pass: 100%/8   | Total:  6h 59m | Avg: 52m 25s | Max: 56m 35s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  9h 46m | Avg: 39m 07s | Max: 56m 26s | Hits:  64%/713   
      🟩 11.8               Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🟩 12.5               Pass: 100%/113 | Total:  3d 04h | Avg: 40m 48s | Max:  1h 28m | Hits:  64%/3565  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 23m 04s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  9h 46m | Avg: 39m 07s | Max: 56m 26s | Hits:  64%/713   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🟩 nvcc12.5           Pass: 100%/111 | Total:  3d 04h | Avg: 41m 08s | Max:  1h 28m | Hits:  64%/3565  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 24s | Avg: 22m 12s | Max: 23m 04s
      🟩 nvcc               Pass: 100%/129 | Total:  3d 17h | Avg: 41m 33s | Max:  1h 28m | Hits:  64%/4278  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 42m | Avg: 47m 01s | Max:  1h 01m
      🟩 Clang10            Pass: 100%/3   | Total:  2h 38m | Avg: 52m 57s | Max: 56m 07s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 10m | Avg: 47m 32s | Max: 50m 50s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 09m | Avg: 47m 26s | Max: 51m 58s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 17m | Avg: 49m 28s | Max: 51m 02s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 22s | Max: 53m 43s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 23m | Avg: 50m 45s | Max: 54m 04s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 22m | Avg: 50m 43s | Max: 52m 35s
      🟩 Clang17            Pass: 100%/26  | Total: 12h 36m | Avg: 29m 05s | Max: 54m 59s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 40s | Max: 42m 40s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 22m | Avg: 43m 47s | Max: 53m 56s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 28m | Avg: 44m 48s | Max: 49m 58s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 08m | Avg: 41m 22s | Max: 51m 59s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 22m | Avg: 50m 36s | Max: 51m 14s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 26m | Avg: 55m 10s | Max:  1h 10m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 13m | Avg: 48m 25s | Max: 52m 50s
      🟩 GCC13              Pass: 100%/28  | Total: 14h 26m | Avg: 30m 55s | Max:  1h 28m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 34m | Avg: 51m 20s | Max: 57m 47s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 26s | Avg: 56m 26s | Max: 56m 26s | Hits:  64%/713   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 06m | Hits:  64%/1426  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 06m | Avg:  1h 02m | Max:  1h 05m | Hits:  64%/2139  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 15h | Avg: 40m 22s | Max:  1h 01m
      🟩 GCC                Pass: 100%/63  | Total:  1d 17h | Avg: 39m 42s | Max:  1h 28m
      🟩 Intel              Pass: 100%/3   | Total:  2h 34m | Avg: 51m 20s | Max: 57m 47s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 07m | Avg:  1h 01m | Max:  1h 06m | Hits:  64%/4278  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  3d 18h | Avg: 41m 15s | Max:  1h 28m | Hits:  64%/4278  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  3d 06h | Avg: 47m 45s | Max:  1h 10m | Hits:  64%/4278  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 26m | Avg: 18m 19s | Max: 24m 30s
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 57m | Avg: 14m 39s | Max: 16m 43s
      🟩 HostLaunch         Pass: 100%/8   | Total:  3h 25m | Avg: 25m 43s | Max:  1h 28m
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 27m | Avg: 25m 56s | Max: 31m 15s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 26m | Avg:  1h 08m | Max:  1h 10m
      🟩 90a                Pass: 100%/4   | Total:  1h 11m | Avg: 17m 58s | Max: 23m 19s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  1d 00h | Avg: 43m 12s | Max:  1h 28m
      🟩 14                 Pass: 100%/37  | Total:  1d 01h | Avg: 41m 36s | Max:  1h 06m | Hits:  64%/2139  
      🟩 17                 Pass: 100%/36  | Total:  1d 01h | Avg: 41m 57s | Max:  1h 10m | Hits:  64%/1426  
      🟩 20                 Pass: 100%/24  | Total: 14h 46m | Avg: 36m 55s | Max:  1h 01m | Hits:  64%/713   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 08h | Avg: 28m 56s | Max: 1h 11m | Hits: 49%/13005

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 05h | Avg: 29m 20s | Max:  1h 11m | Hits:  49%/13005 
      🟩 arm64              Pass: 100%/8   | Total:  3h 06m | Avg: 23m 16s | Max: 32m 28s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 57m | Avg: 27m 49s | Max: 59m 08s | Hits:  24%/1445  
      🟩 11.8               Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟩 12.5               Pass: 100%/100 | Total:  1d 23h | Avg: 28m 45s | Max:  1h 11m | Hits:  52%/11560 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  1h 04m | Avg: 32m 05s | Max: 32m 19s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 57m | Avg: 27m 49s | Max: 59m 08s | Hits:  24%/1445  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 22h | Avg: 28m 41s | Max:  1h 11m | Hits:  52%/11560 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 05s | Max: 32m 19s
      🟩 nvcc               Pass: 100%/116 | Total:  2d 07h | Avg: 28m 53s | Max:  1h 11m | Hits:  49%/13005 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 15m | Avg: 22m 31s | Max: 35m 14s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 31m | Avg: 30m 34s | Max: 33m 37s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 48s | Max: 33m 31s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 17s | Max: 34m 55s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 02s | Max: 33m 22s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 21s | Max: 35m 56s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 29s | Max: 37m 21s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 42s | Max: 34m 35s
      🟩 Clang17            Pass: 100%/18  | Total:  6h 12m | Avg: 20m 40s | Max: 34m 11s
      🟩 GCC6               Pass: 100%/2   | Total: 56m 41s | Avg: 28m 20s | Max: 31m 20s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 56m | Avg: 29m 23s | Max: 33m 34s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 01m | Avg: 30m 17s | Max: 33m 25s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 29s | Max: 34m 04s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 13m | Avg: 33m 15s | Max: 36m 59s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 59m | Avg: 34m 15s | Max: 43m 35s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 16m | Avg: 34m 07s | Max: 37m 12s
      🟩 GCC13              Pass: 100%/20  | Total:  6h 22m | Avg: 19m 07s | Max: 36m 10s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 04m | Avg: 41m 29s | Max: 45m 20s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 59m 08s | Avg: 59m 08s | Max: 59m 08s | Hits:  24%/1445  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 07m | Hits:  24%/2890  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 18m | Avg: 43m 06s | Max:  1h 11m | Hits:  61%/8670  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 22h 37m | Avg: 26m 37s | Max: 37m 21s
      🟩 GCC                Pass: 100%/55  | Total:  1d 00h | Avg: 27m 04s | Max: 43m 35s
      🟩 Intel              Pass: 100%/3   | Total:  2h 04m | Avg: 41m 29s | Max: 45m 20s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 22m | Avg: 49m 13s | Max:  1h 11m | Hits:  49%/13005 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 08h | Avg: 28m 56s | Max:  1h 11m | Hits:  49%/13005 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 05h | Avg: 32m 10s | Max:  1h 11m | Hits:  24%/8670  
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 50m | Avg: 10m 04s | Max: 19m 53s | Hits:  99%/4335  
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 58m | Avg: 14m 46s | Max: 16m 02s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 01m | Avg: 40m 37s | Max: 43m 35s
      🟩 90a                Pass: 100%/4   | Total:  1h 22m | Avg: 20m 44s | Max: 23m 34s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 12h 01m | Avg: 24m 02s | Max: 37m 29s
      🟩 14                 Pass: 100%/34  | Total: 17h 22m | Avg: 30m 40s | Max:  1h 06m | Hits:  43%/5780  
      🟩 17                 Pass: 100%/33  | Total: 17h 16m | Avg: 31m 25s | Max:  1h 07m | Hits:  49%/4335  
      🟩 20                 Pass: 100%/21  | Total: 10h 13m | Avg: 29m 12s | Max:  1h 11m | Hits:  61%/2890  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

elstehle merged commit 06e334f into NVIDIA:main on Aug 21, 2024
258 of 263 checks passed
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Aug 21, 2024
* make DeviceScan offset type a template parameter

* updates tests to use device interface

* moves thrust scan to unsigned offset types

* adjusts benchmarks to account for used offset types

* uses dynamic dispatch to unsigned type

* adds tparam docs for NumItemsT

* fixes warning about different signedness comparison

* adds check for negative num_items in thrust::scan

* fixes unused param in is_negative
pciolkosz pushed a commit to pciolkosz/cccl that referenced this pull request Aug 23, 2024
Development

Successfully merging this pull request may close these issues.

Add support for large num_items to device_scan.cuh
3 participants