Make the thrust dispatch mechanisms configurable #2310

Merged
merged 13 commits into from
Aug 30, 2024

Conversation

miscco
Collaborator

@miscco miscco commented Aug 28, 2024

The current dispatch mechanism trades compile time and binary size for performance and flexibility.

Allow users to tune that trade-off depending on their needs.

Fixes #1958

@miscco miscco requested review from a team as code owners August 28, 2024 14:10
@miscco miscco self-assigned this Aug 28, 2024
@miscco miscco added the feature request and thrust labels Aug 28, 2024
@miscco miscco requested review from elstehle and gevtushenko August 28, 2024 14:13
Co-authored-by: Jake Hemstad <[email protected]>
Collaborator

@jrhemstad jrhemstad left a comment

We should also add a test that verifies that an algorithm invocation will throw as expected. Something simple like this:

#define THRUST_FORCE_32BIT_OFFSET_TYPE
try {
   // Compute the count in size_t: adding 1 to the int32_t maximum directly would overflow.
   thrust::reduce(thrust::counting_iterator<size_t>(0), thrust::counting_iterator<size_t>(static_cast<size_t>(std::numeric_limits<int32_t>::max()) + 1));
} catch (std::exception const& e) {
   // expect this to throw
}

#define THRUST_FORCE_64BIT_OFFSET_TYPE
try {
   // Compute the count in size_t: adding 1 to the int32_t maximum directly would overflow.
   thrust::reduce(thrust::counting_iterator<size_t>(0), thrust::counting_iterator<size_t>(static_cast<size_t>(std::numeric_limits<int32_t>::max()) + 1));
} catch (std::exception const& e) {
   // expect this to *NOT* throw
}
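The try/catch blocks in this suggestion only note the expectation in a comment. A minimal sketch of how such a test could assert it is given here in plain C++ so that it compiles without CUDA. Every name in it (`throws`, `first_count_past_int32`, `algorithm_with_32bit_offsets`, `algorithm_with_64bit_offsets`, `run_offset_type_checks`) is hypothetical, and the two `algorithm_*` functions are stand-ins for a Thrust call built with the respective macro, not Thrust's API. The exception type is likewise an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <limits>
#include <stdexcept>

// Hypothetical helper: reports whether calling `f` throws a std::exception.
template <typename F>
bool throws(F&& f)
{
  try
  {
    f();
  }
  catch (const std::exception&)
  {
    return true;
  }
  return false;
}

// Smallest element count that no longer fits into a signed 32-bit offset.
// The cast happens before the addition, so the sum is computed in 64 bits.
inline std::uint64_t first_count_past_int32()
{
  return static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max()) + 1;
}

// Hypothetical stand-in for an algorithm built with a forced 32-bit offset type.
inline void algorithm_with_32bit_offsets(std::uint64_t count)
{
  if (count >= first_count_past_int32())
  {
    throw std::length_error("input size exceeds the 32-bit offset type");
  }
}

// Hypothetical stand-in for an algorithm built with a forced 64-bit offset type.
inline void algorithm_with_64bit_offsets(std::uint64_t) {}

// The two expectations from the suggestion, written as assertions.
inline void run_offset_type_checks()
{
  const std::uint64_t count = first_count_past_int32();
  assert(throws([&] { algorithm_with_32bit_offsets(count); }));
  assert(!throws([&] { algorithm_with_64bit_offsets(count); }));
}
```

In a real test the two configurations would probably need separate translation units, since each macro has to be defined before the Thrust headers are included.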

@miscco miscco force-pushed the thrust_dispatch_switch branch from 06930d7 to 6c2efe1 Compare August 28, 2024 15:13
@miscco miscco force-pushed the thrust_dispatch_switch branch from 3e6b83b to b28c913 Compare August 28, 2024 16:34
docs/thrust/cmake_options.rst Outdated
docs/thrust/cmake_options.rst Outdated
docs/thrust/cmake_options.rst Outdated
docs/thrust/cmake_options.rst Outdated
thrust/CMakeLists.txt Outdated
thrust/cmake/ThrustBuildCompilerTargets.cmake Outdated
thrust/cmake/ThrustBuildCompilerTargets.cmake Outdated
thrust/thrust/system/cuda/detail/dispatch.h Outdated
+ std::to_string(thrust::detail::integer_traits<index_type>::const_max) \
+ "). " #index_type " was used because the macro THRUST_FORCE_32BIT_OFFSET_TYPE was defined. " \
"To handle larger input sizes, either remove this macro to dynamically dispatch " \
"between 32-bit and 64-bit index types, or define THRUST_FORCE_64BIT_OFFSET_TYPE.") \
Contributor

There’s not much the user can do if they encounter this at runtime unless they rebuild the application. I wonder whether this error should be written to target developers or end users.

Collaborator Author

I don't think that we should target this at a specific audience. In the end it does not matter whether it is an end user or a developer.
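For readers following this thread, the three behaviours the error message refers to can be sketched in plain C++. This is an illustration under stated assumptions, not the code in dispatch.h: all function names are hypothetical, `run_with_offset` stands in for a kernel instantiated with a given offset type, and the exception type is an assumption. The point it shows is that dynamic dispatch instantiates both the 32-bit and the 64-bit variant, while a forced mode instantiates only one and, for 32 bits, has to reject inputs that do not fit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a kernel instantiated with offset type OffsetT.
// It returns the offset width in bits so the chosen path can be observed.
template <typename OffsetT>
int run_with_offset(std::uint64_t /*count*/)
{
  return static_cast<int>(sizeof(OffsetT) * 8);
}

// True if `count` is representable in OffsetT.
template <typename OffsetT>
bool fits(std::uint64_t count)
{
  return count <= static_cast<std::uint64_t>(std::numeric_limits<OffsetT>::max());
}

// Dynamic dispatch: both instantiations exist, the choice is made at runtime.
inline int dispatch_dynamic(std::uint64_t count)
{
  return fits<std::int32_t>(count) ? run_with_offset<std::int32_t>(count)
                                   : run_with_offset<std::int64_t>(count);
}

// Forced 32-bit: one instantiation, larger inputs are rejected.
inline int dispatch_forced_32(std::uint64_t count)
{
  if (!fits<std::int32_t>(count))
  {
    throw std::length_error("input size " + std::to_string(count)
                            + " exceeds the maximum of the 32-bit offset type");
  }
  return run_with_offset<std::int32_t>(count);
}

// Forced 64-bit: one instantiation that accepts every size.
inline int dispatch_forced_64(std::uint64_t count)
{
  return run_with_offset<std::int64_t>(count);
}

// Small self-check of the behaviour described above.
inline void run_dispatch_checks()
{
  assert(dispatch_dynamic(1000) == 32);
  assert(dispatch_forced_64(1000) == 64);
}
```

Under these assumptions, an application built in the forced 32-bit mode can only recover from the exception by being rebuilt, which is the situation the comment above raises.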

Contributor

@bdice bdice left a comment

@miscco Would you like me to test this in libcudf?

docs/thrust/cmake_options.rst Outdated
docs/thrust/cmake_options.rst Outdated
thrust/thrust/system/cuda/detail/dispatch.h Outdated
docs/thrust/cmake_options.rst Outdated
docs/thrust/cmake_options.rst Outdated
@miscco
Collaborator Author

miscco commented Aug 29, 2024

That would be awesome. I can build everything, but I don't know how to measure the binary size and runtime impact.

Contributor

🟨 CI finished in 6h 46m: Pass: 99%/250 | Total: 1d 20h | Avg: 10m 44s | Max: 1h 33m | Hits: 79%/24375
  • 🟨 cub: Pass: 99%/131 | Total: 1d 01h | Avg: 11m 42s | Max: 1h 33m | Hits: 66%/4296

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/123 | Total:  1d 01h | Avg: 12m 12s | Max:  1h 33m | Hits:  66%/4296  
      🟩 arm64              Pass: 100%/8   | Total: 33m 07s | Avg:  4m 08s | Max:  4m 22s
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  1h 41m | Avg:  6m 44s | Max: 49m 51s | Hits:  66%/716   
      🟩 11.8               Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🔍 12.5               Pass:  99%/113 | Total: 23h 40m | Avg: 12m 34s | Max:  1h 33m | Hits:  66%/3580  
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 42s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 41m | Avg:  6m 44s | Max: 49m 51s | Hits:  66%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🔍 nvcc12.5           Pass:  99%/111 | Total: 23h 32m | Avg: 12m 43s | Max:  1h 33m | Hits:  66%/3580  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 42s
      🔍 nvcc               Pass:  99%/129 | Total:  1d 01h | Avg: 11m 50s | Max:  1h 33m | Hits:  66%/4296  
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total: 39m 10s | Avg:  6m 31s | Max:  9m 19s
      🟩 Clang10            Pass: 100%/3   | Total: 28m 53s | Avg:  9m 37s | Max: 10m 15s
      🟩 Clang11            Pass: 100%/4   | Total: 35m 21s | Avg:  8m 50s | Max:  9m 20s
      🟩 Clang12            Pass: 100%/4   | Total: 37m 31s | Avg:  9m 22s | Max:  9m 39s
      🟩 Clang13            Pass: 100%/4   | Total: 36m 13s | Avg:  9m 03s | Max:  9m 39s
      🟩 Clang14            Pass: 100%/4   | Total: 18m 04s | Avg:  4m 31s | Max:  4m 59s
      🟩 Clang15            Pass: 100%/4   | Total: 18m 19s | Avg:  4m 34s | Max:  4m 49s
      🟩 Clang16            Pass: 100%/4   | Total: 18m 07s | Avg:  4m 31s | Max:  4m 44s
      🔍 Clang17            Pass:  96%/26  | Total:  5h 57m | Avg: 13m 44s | Max: 33m 18s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 46s | Avg:  3m 53s | Max:  4m 05s
      🟩 GCC7               Pass: 100%/6   | Total: 23m 21s | Avg:  3m 53s | Max:  4m 27s
      🟩 GCC8               Pass: 100%/6   | Total: 23m 35s | Avg:  3m 55s | Max:  4m 18s
      🟩 GCC9               Pass: 100%/6   | Total: 24m 20s | Avg:  4m 03s | Max:  5m 05s
      🟩 GCC10              Pass: 100%/4   | Total: 17m 47s | Avg:  4m 26s | Max:  4m 50s
      🟩 GCC11              Pass: 100%/7   | Total: 31m 37s | Avg:  4m 31s | Max:  4m 53s
      🟩 GCC12              Pass: 100%/4   | Total: 18m 18s | Avg:  4m 34s | Max:  5m 18s
      🟩 GCC13              Pass: 100%/28  | Total:  7h 01m | Avg: 15m 03s | Max:  1h 33m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 16m 11s | Avg:  5m 23s | Max:  5m 30s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 51s | Avg: 49m 51s | Max: 49m 51s | Hits:  66%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 51s | Max:  1h 02m | Hits:  66%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 11m | Avg:  1h 03m | Max:  1h 07m | Hits:  66%/2148  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/59  | Total:  9h 48m | Avg:  9m 58s | Max: 33m 18s
      🟩 GCC                Pass: 100%/63  | Total:  9h 28m | Avg:  9m 01s | Max:  1h 33m
      🟩 Intel              Pass: 100%/3   | Total: 16m 11s | Avg:  5m 23s | Max:  5m 30s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 01m | Avg:  1h 00m | Max:  1h 07m | Hits:  66%/4296  
    🔍 jobs: DeviceLaunch 🔍
      🟩 Build              Pass: 100%/99  | Total: 14h 09m | Avg:  8m 34s | Max:  1h 07m | Hits:  66%/4296  
      🔍 DeviceLaunch       Pass:  87%/8   | Total:  2h 25m | Avg: 18m 13s | Max: 33m 18s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 12m | Avg: 16m 32s | Max: 28m 08s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 30m | Avg: 18m 45s | Max: 29m 24s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 16m | Avg: 32m 05s | Max:  1h 33m
    🔍 std: 17 🔍
      🟩 11                 Pass: 100%/34  | Total:  4h 41m | Avg:  8m 16s | Max: 25m 28s
      🟩 14                 Pass: 100%/37  | Total:  7h 28m | Avg: 12m 06s | Max: 59m 55s | Hits:  66%/2148  
      🔍 17                 Pass:  97%/36  | Total:  6h 38m | Avg: 11m 04s | Max:  1h 07m | Hits:  66%/1432  
      🟩 20                 Pass: 100%/24  | Total:  6h 46m | Avg: 16m 56s | Max:  1h 33m | Hits:  66%/716   
    🟨 gpu
      🟨 v100               Pass:  99%/131 | Total:  1d 01h | Avg: 11m 42s | Max:  1h 33m | Hits:  66%/4296  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🟩 90a                Pass: 100%/4   | Total: 16m 28s | Avg:  4m 07s | Max:  5m 00s
    
  • 🟩 thrust: Pass: 100%/118 | Total: 19h 00m | Avg: 9m 40s | Max: 1h 11m | Hits: 82%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 18h 25m | Avg: 10m 03s | Max:  1h 11m | Hits:  82%/20079 
      🟩 arm64              Pass: 100%/8   | Total: 35m 09s | Avg:  4m 23s | Max:  5m 04s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 19m | Avg:  9m 18s | Max: 51m 46s | Hits:  73%/2231  
      🟩 11.8               Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 12.5               Pass: 100%/100 | Total: 16h 27m | Avg:  9m 52s | Max:  1h 11m | Hits:  83%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  5m 00s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 19m | Avg:  9m 18s | Max: 51m 46s | Hits:  73%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 nvcc12.5           Pass: 100%/98  | Total: 16h 18m | Avg:  9m 58s | Max:  1h 11m | Hits:  83%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  5m 00s
      🟩 nvcc               Pass: 100%/116 | Total: 18h 51m | Avg:  9m 45s | Max:  1h 11m | Hits:  82%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 42m 59s | Avg:  7m 09s | Max:  8m 24s
      🟩 Clang10            Pass: 100%/3   | Total: 25m 37s | Avg:  8m 32s | Max:  9m 01s
      🟩 Clang11            Pass: 100%/4   | Total: 28m 52s | Avg:  7m 13s | Max:  8m 08s
      🟩 Clang12            Pass: 100%/4   | Total: 28m 02s | Avg:  7m 00s | Max:  7m 28s
      🟩 Clang13            Pass: 100%/4   | Total: 28m 32s | Avg:  7m 08s | Max:  7m 39s
      🟩 Clang14            Pass: 100%/4   | Total: 18m 39s | Avg:  4m 39s | Max:  4m 53s
      🟩 Clang15            Pass: 100%/4   | Total: 19m 02s | Avg:  4m 45s | Max:  5m 13s
      🟩 Clang16            Pass: 100%/4   | Total: 18m 37s | Avg:  4m 39s | Max:  4m 50s
      🟩 Clang17            Pass: 100%/18  | Total:  2h 17m | Avg:  7m 37s | Max: 22m 05s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 56s
      🟩 GCC7               Pass: 100%/6   | Total: 25m 30s | Avg:  4m 15s | Max:  5m 00s
      🟩 GCC8               Pass: 100%/6   | Total: 24m 06s | Avg:  4m 01s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 53m 33s | Avg:  8m 55s | Max: 30m 45s
      🟩 GCC10              Pass: 100%/4   | Total: 18m 33s | Avg:  4m 38s | Max:  5m 13s
      🟩 GCC11              Pass: 100%/7   | Total: 32m 09s | Avg:  4m 35s | Max:  5m 12s
      🟩 GCC12              Pass: 100%/4   | Total: 44m 29s | Avg: 11m 07s | Max: 29m 59s
      🟩 GCC13              Pass: 100%/20  | Total:  2h 25m | Avg:  7m 16s | Max: 17m 43s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 17m 43s | Avg:  5m 54s | Max:  6m 37s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 46s | Avg: 51m 46s | Max: 51m 46s | Hits:  73%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 11m | Hits:  73%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 09m | Avg: 41m 31s | Max:  1h 07m | Hits:  86%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total:  5h 47m | Avg:  6m 48s | Max: 22m 05s
      🟩 GCC                Pass: 100%/55  | Total:  5h 50m | Avg:  6m 22s | Max: 30m 45s
      🟩 Intel              Pass: 100%/3   | Total: 17m 43s | Avg:  5m 54s | Max:  6m 37s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 04m | Avg: 47m 10s | Max:  1h 11m | Hits:  82%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total: 19h 00m | Avg:  9m 40s | Max:  1h 11m | Hits:  82%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 14h 48m | Avg:  8m 58s | Max:  1h 11m | Hits:  73%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 06m | Avg: 11m 31s | Max: 23m 55s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 05m | Avg: 15m 42s | Max: 22m 05s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 90a                Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 09s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 39m | Avg:  5m 19s | Max: 12m 11s
      🟩 14                 Pass: 100%/34  | Total:  6h 53m | Avg: 12m 10s | Max: 54m 35s | Hits:  80%/8924  
      🟩 17                 Pass: 100%/33  | Total:  5h 50m | Avg: 10m 37s | Max:  1h 11m | Hits:  82%/6693  
      🟩 20                 Pass: 100%/21  | Total:  3h 36m | Avg: 10m 18s | Max:  1h 07m | Hits:  86%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

🟩 CI finished in 15h 32m: Pass: 100%/250 | Total: 1d 20h | Avg: 10m 47s | Max: 1h 33m | Hits: 79%/24375
  • 🟩 cub: Pass: 100%/131 | Total: 1d 01h | Avg: 11m 48s | Max: 1h 33m | Hits: 66%/4296

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  1d 01h | Avg: 12m 18s | Max:  1h 33m | Hits:  66%/4296  
      🟩 arm64              Pass: 100%/8   | Total: 33m 07s | Avg:  4m 08s | Max:  4m 22s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 41m | Avg:  6m 44s | Max: 49m 51s | Hits:  66%/716   
      🟩 11.8               Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🟩 12.5               Pass: 100%/113 | Total: 23h 53m | Avg: 12m 40s | Max:  1h 33m | Hits:  66%/3580  
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 42s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 41m | Avg:  6m 44s | Max: 49m 51s | Hits:  66%/716   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🟩 nvcc12.5           Pass: 100%/111 | Total: 23h 45m | Avg: 12m 50s | Max:  1h 33m | Hits:  66%/3580  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  3m 42s
      🟩 nvcc               Pass: 100%/129 | Total:  1d 01h | Avg: 11m 56s | Max:  1h 33m | Hits:  66%/4296  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 39m 10s | Avg:  6m 31s | Max:  9m 19s
      🟩 Clang10            Pass: 100%/3   | Total: 28m 53s | Avg:  9m 37s | Max: 10m 15s
      🟩 Clang11            Pass: 100%/4   | Total: 35m 21s | Avg:  8m 50s | Max:  9m 20s
      🟩 Clang12            Pass: 100%/4   | Total: 37m 31s | Avg:  9m 22s | Max:  9m 39s
      🟩 Clang13            Pass: 100%/4   | Total: 36m 13s | Avg:  9m 03s | Max:  9m 39s
      🟩 Clang14            Pass: 100%/4   | Total: 18m 04s | Avg:  4m 31s | Max:  4m 59s
      🟩 Clang15            Pass: 100%/4   | Total: 18m 19s | Avg:  4m 34s | Max:  4m 49s
      🟩 Clang16            Pass: 100%/4   | Total: 18m 07s | Avg:  4m 31s | Max:  4m 44s
      🟩 Clang17            Pass: 100%/26  | Total:  6h 10m | Avg: 14m 14s | Max: 33m 18s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 46s | Avg:  3m 53s | Max:  4m 05s
      🟩 GCC7               Pass: 100%/6   | Total: 23m 21s | Avg:  3m 53s | Max:  4m 27s
      🟩 GCC8               Pass: 100%/6   | Total: 23m 35s | Avg:  3m 55s | Max:  4m 18s
      🟩 GCC9               Pass: 100%/6   | Total: 24m 20s | Avg:  4m 03s | Max:  5m 05s
      🟩 GCC10              Pass: 100%/4   | Total: 17m 47s | Avg:  4m 26s | Max:  4m 50s
      🟩 GCC11              Pass: 100%/7   | Total: 31m 37s | Avg:  4m 31s | Max:  4m 53s
      🟩 GCC12              Pass: 100%/4   | Total: 18m 18s | Avg:  4m 34s | Max:  5m 18s
      🟩 GCC13              Pass: 100%/28  | Total:  7h 01m | Avg: 15m 03s | Max:  1h 33m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 16m 11s | Avg:  5m 23s | Max:  5m 30s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 51s | Avg: 49m 51s | Max: 49m 51s | Hits:  66%/716   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 51s | Max:  1h 02m | Hits:  66%/1432  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 11m | Avg:  1h 03m | Max:  1h 07m | Hits:  66%/2148  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 10h 01m | Avg: 10m 12s | Max: 33m 18s
      🟩 GCC                Pass: 100%/63  | Total:  9h 28m | Avg:  9m 01s | Max:  1h 33m
      🟩 Intel              Pass: 100%/3   | Total: 16m 11s | Avg:  5m 23s | Max:  5m 30s
      🟩 MSVC               Pass: 100%/6   | Total:  6h 01m | Avg:  1h 00m | Max:  1h 07m | Hits:  66%/4296  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  1d 01h | Avg: 11m 48s | Max:  1h 33m | Hits:  66%/4296  
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 14h 09m | Avg:  8m 34s | Max:  1h 07m | Hits:  66%/4296  
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 38m | Avg: 19m 49s | Max: 33m 18s
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 12m | Avg: 16m 32s | Max: 28m 08s
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 30m | Avg: 18m 45s | Max: 29m 24s
      🟩 TestGPU            Pass: 100%/8   | Total:  4h 16m | Avg: 32m 05s | Max:  1h 33m
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 13m 06s | Avg:  4m 22s | Max:  4m 29s
      🟩 90a                Pass: 100%/4   | Total: 16m 28s | Avg:  4m 07s | Max:  5m 00s
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  4h 41m | Avg:  8m 16s | Max: 25m 28s
      🟩 14                 Pass: 100%/37  | Total:  7h 28m | Avg: 12m 06s | Max: 59m 55s | Hits:  66%/2148  
      🟩 17                 Pass: 100%/36  | Total:  6h 51m | Avg: 11m 25s | Max:  1h 07m | Hits:  66%/1432  
      🟩 20                 Pass: 100%/24  | Total:  6h 46m | Avg: 16m 56s | Max:  1h 33m | Hits:  66%/716   
    
  • 🟩 thrust: Pass: 100%/118 | Total: 19h 00m | Avg: 9m 40s | Max: 1h 11m | Hits: 82%/20079

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 18h 25m | Avg: 10m 03s | Max:  1h 11m | Hits:  82%/20079 
      🟩 arm64              Pass: 100%/8   | Total: 35m 09s | Avg:  4m 23s | Max:  5m 04s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 19m | Avg:  9m 18s | Max: 51m 46s | Hits:  73%/2231  
      🟩 11.8               Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 12.5               Pass: 100%/100 | Total: 16h 27m | Avg:  9m 52s | Max:  1h 11m | Hits:  83%/17848 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  5m 00s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 19m | Avg:  9m 18s | Max: 51m 46s | Hits:  73%/2231  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 nvcc12.5           Pass: 100%/98  | Total: 16h 18m | Avg:  9m 58s | Max:  1h 11m | Hits:  83%/17848 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  5m 00s
      🟩 nvcc               Pass: 100%/116 | Total: 18h 51m | Avg:  9m 45s | Max:  1h 11m | Hits:  82%/20079 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 42m 59s | Avg:  7m 09s | Max:  8m 24s
      🟩 Clang10            Pass: 100%/3   | Total: 25m 37s | Avg:  8m 32s | Max:  9m 01s
      🟩 Clang11            Pass: 100%/4   | Total: 28m 52s | Avg:  7m 13s | Max:  8m 08s
      🟩 Clang12            Pass: 100%/4   | Total: 28m 02s | Avg:  7m 00s | Max:  7m 28s
      🟩 Clang13            Pass: 100%/4   | Total: 28m 32s | Avg:  7m 08s | Max:  7m 39s
      🟩 Clang14            Pass: 100%/4   | Total: 18m 39s | Avg:  4m 39s | Max:  4m 53s
      🟩 Clang15            Pass: 100%/4   | Total: 19m 02s | Avg:  4m 45s | Max:  5m 13s
      🟩 Clang16            Pass: 100%/4   | Total: 18m 37s | Avg:  4m 39s | Max:  4m 50s
      🟩 Clang17            Pass: 100%/18  | Total:  2h 17m | Avg:  7m 37s | Max: 22m 05s
      🟩 GCC6               Pass: 100%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 56s
      🟩 GCC7               Pass: 100%/6   | Total: 25m 30s | Avg:  4m 15s | Max:  5m 00s
      🟩 GCC8               Pass: 100%/6   | Total: 24m 06s | Avg:  4m 01s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/6   | Total: 53m 33s | Avg:  8m 55s | Max: 30m 45s
      🟩 GCC10              Pass: 100%/4   | Total: 18m 33s | Avg:  4m 38s | Max:  5m 13s
      🟩 GCC11              Pass: 100%/7   | Total: 32m 09s | Avg:  4m 35s | Max:  5m 12s
      🟩 GCC12              Pass: 100%/4   | Total: 44m 29s | Avg: 11m 07s | Max: 29m 59s
      🟩 GCC13              Pass: 100%/20  | Total:  2h 25m | Avg:  7m 16s | Max: 17m 43s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 17m 43s | Avg:  5m 54s | Max:  6m 37s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 46s | Avg: 51m 46s | Max: 51m 46s | Hits:  73%/2231  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 11m | Hits:  73%/4462  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  4h 09m | Avg: 41m 31s | Max:  1h 07m | Hits:  86%/13386 
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total:  5h 47m | Avg:  6m 48s | Max: 22m 05s
      🟩 GCC                Pass: 100%/55  | Total:  5h 50m | Avg:  6m 22s | Max: 30m 45s
      🟩 Intel              Pass: 100%/3   | Total: 17m 43s | Avg:  5m 54s | Max:  6m 37s
      🟩 MSVC               Pass: 100%/9   | Total:  7h 04m | Avg: 47m 10s | Max:  1h 11m | Hits:  82%/20079 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total: 19h 00m | Avg:  9m 40s | Max:  1h 11m | Hits:  82%/20079 
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 14h 48m | Avg:  8m 58s | Max:  1h 11m | Hits:  73%/13386 
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 06m | Avg: 11m 31s | Max: 23m 55s | Hits:  99%/6693  
      🟩 TestGPU            Pass: 100%/8   | Total:  2h 05m | Avg: 15m 42s | Max: 22m 05s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 13m 27s | Avg:  4m 29s | Max:  5m 07s
      🟩 90a                Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 09s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 39m | Avg:  5m 19s | Max: 12m 11s
      🟩 14                 Pass: 100%/34  | Total:  6h 53m | Avg: 12m 10s | Max: 54m 35s | Hits:  80%/8924  
      🟩 17                 Pass: 100%/33  | Total:  5h 50m | Avg: 10m 37s | Max:  1h 11m | Hits:  82%/6693  
      🟩 20                 Pass: 100%/21  | Total:  3h 36m | Avg: 10m 18s | Max:  1h 07m | Hits:  86%/4462  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 44s | Avg: 11m 44s | Max: 11m 44s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 250)

# Runner
178 linux-amd64-cpu16
41 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@miscco miscco merged commit 89702de into NVIDIA:main Aug 30, 2024
266 checks passed
@miscco miscco deleted the thrust_dispatch_switch branch August 30, 2024 14:41
@@ -59,6 +59,10 @@ option(THRUST_ENABLE_TESTING "Build Thrust testing suite." "ON")
option(THRUST_ENABLE_EXAMPLES "Build Thrust examples." "ON")
option(THRUST_ENABLE_BENCHMARKS "Build Thrust runtime benchmarks." "${CCCL_ENABLE_BENCHMARKS}")

# Allow the user to optionally select offset type dispatch to fixed 32 or 64 bit types
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch." FORCE)
Contributor

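The FORCE here means that a user can't override the value: a cache entry set with FORCE is overwritten on every configure run, so a value passed with -D on the command line is discarded.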

Collaborator

This option is not visible to users consuming our CMake packages (e.g. CPM, find_package(CCCL), add_subdirectory(...), etc.). It is only visible in the developer build.

Contributor

We need this option to be usable when consuming CCCL from libcudf. That was the purpose of introducing this, in #1958.

set(header_definitions
"THRUST_WRAPPED_NAMESPACE=wrapped_thrust"
"CUB_WRAPPED_NAMESPACE=wrapped_cub"
"THRUST_FORCE_32_BIT_OFFSET_TYPE")
Collaborator

These header tests are linked to the interface targets built in ThrustBuildCompilerTargets.cmake, which already may have these options set, potentially to conflicting values.
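One way to surface such a conflict at compile time, sketched here as an assumption and not something this PR is known to contain, is a preprocessor guard that rejects a translation unit in which both forcing macros are defined. The 32-bit macro is spelled as in the header-test snippet above; the 64-bit spelling is formed by analogy, and `forced_offset_bits` is a hypothetical helper.

```cpp
// Hypothetical guard: refuse to compile if both forcing macros are defined.
#if defined(THRUST_FORCE_32_BIT_OFFSET_TYPE) && defined(THRUST_FORCE_64_BIT_OFFSET_TYPE)
#  error "Only one of the 32-bit and 64-bit offset type macros may be defined."
#endif

// Reports the forced offset width in bits, or 0 when dispatch is dynamic.
constexpr int forced_offset_bits()
{
#if defined(THRUST_FORCE_32_BIT_OFFSET_TYPE)
  return 32;
#elif defined(THRUST_FORCE_64_BIT_OFFSET_TYPE)
  return 64;
#else
  return 0;
#endif
}

static_assert(forced_offset_bits() == 0 || forced_offset_bits() == 32 || forced_offset_bits() == 64,
              "unexpected offset width");
```

With neither macro defined, `forced_offset_bits()` evaluates to 0, which corresponds to the dynamic dispatch default.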

Labels
feature request New feature or request. thrust For all items related to Thrust.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA]: Add Thrust build option to disable dynamic offset type dispatch
7 participants