Ensure that we only use the inline variable trait when it is actually available #2712

miscco · 2024-11-06T08:01:43Z

We were defining _CCCL_TRAIT solely based on the standard version, but the definitions of the inline variables themselves are guarded by __cpp_variable_templates.

We should use that same condition for _CCCL_TRAIT.
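To illustrate the mismatch described above, here is a minimal sketch of a trait macro whose condition matches the guard around the variable template. It is not the actual CCCL header: the names `MY_NO_VARIABLE_TEMPLATES`, `MY_TRAIT`, and the `my` namespace are hypothetical stand-ins, and the real definitions may differ.

```cpp
// Hypothetical sketch; MY_* and namespace my are illustrative names only.
#include <type_traits>

// Detect variable-template support through the feature-test macro instead of
// relying on the language standard version alone.
#if !defined(__cpp_variable_templates) || __cpp_variable_templates < 201304L
#  define MY_NO_VARIABLE_TEMPLATES
#endif

namespace my {
template <class T, class U>
struct is_same : std::false_type {};
template <class T>
struct is_same<T, T> : std::true_type {};

#if !defined(MY_NO_VARIABLE_TEMPLATES)
// Only declared when the compiler reports variable-template support.
template <class T, class U>
constexpr bool is_same_v = is_same<T, U>::value;
#endif
} // namespace my

// The macro uses the same condition that guards the declaration above.
// If it were keyed on the standard version instead, it could expand to a
// name that was never declared.
#if defined(MY_NO_VARIABLE_TEMPLATES)
#  define MY_TRAIT(trait, ...) trait<__VA_ARGS__>::value
#else
#  define MY_TRAIT(trait, ...) trait##_v<__VA_ARGS__>
#endif

static_assert(MY_TRAIT(my::is_same, int, int), "same types compare equal");
static_assert(!MY_TRAIT(my::is_same, int, long), "different types do not");
```

Call sites written as `MY_TRAIT(my::is_same, T, U)` then compile in both configurations, which is the property the PR description is after.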

miscco · 2024-11-06T08:02:58Z

libcudacxx/include/cuda/std/__cccl/dialect.h

@@ -110,6 +104,13 @@
 #  define _CCCL_NO_VARIABLE_TEMPLATES
 #endif // _CCCL_STD_VER <= 2011

+// Variable templates are more efficient most of the time, so we want to use them rather than structs when possible
+#if defined(_CCCL_NO_VARIABLE_TEMPLATES)


This actually relaxes the condition to C++14 and sufficient support for variable templates.

Should we stay with support for inline variables?

If _CCCL_TRAIT is also used to refer to std traits, then this won't work, because variable templates are available in C++14 but the variable templates for the std traits were only added in C++17.

When is _CCCL_TRAIT used? Only for stuff in ::cuda::std? Then it should be fine.

That is one of the reasons we are using our internal traits all the time ;)

I am open to keeping it as is and using _CCCL_HAS_NO_INLINE_VARIABLES as the condition; however, I believe we can do better.

If we only use it for the internal traits, then all is good! I just wondered whether we would cause any breakage.
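The C++14 versus C++17 distinction raised in this thread can be sketched as follows. This is only an illustration, assuming a hypothetical internal trait `my::is_int`. The feature-test macros `__cpp_variable_templates` and `__cpp_lib_type_trait_variable_templates` are the standard ones.

```cpp
// Hypothetical sketch; my::is_int is an illustrative internal trait.
#include <type_traits>

namespace my {
template <class T>
struct is_int : std::is_same<T, int> {};

#if defined(__cpp_variable_templates)
// A library can provide its own _v variable as soon as the language
// feature exists, which is C++14.
template <class T>
constexpr bool is_int_v = is_int<T>::value;
#endif
} // namespace my

// The ::value spelling of a standard trait works in every dialect.
static_assert(std::is_same<int, int>::value, "always available");

#if defined(__cpp_variable_templates)
// Internal trait: usable in C++14 and later.
static_assert(my::is_int_v<int>, "internal _v variable");
#endif

#if defined(__cpp_lib_type_trait_variable_templates)
// Standard trait: std::is_same_v only exists from C++17 on, so a macro that
// appends _v to a std trait would fail to compile in C++14.
static_assert(std::is_same_v<int, int>, "std _v variable");
#endif
```

This is why restricting the macro to internal traits keeps the relaxed C++14 condition safe.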

github-actions · 2024-11-06T11:18:51Z

🟩 CI finished in 2h 36m: Pass: 100%/394 | Total: 6d 08h | Avg: 23m 11s | Max: 2h 15m | Hits: 35%/25847

🟩 libcudacxx: Pass: 100%/118 | Total: 1d 04h | Avg: 14m 26s | Max: 2h 15m | Hits: 37%/9496

🟩 cpu
  🟩 amd64              Pass: 100%/110 | Total:  1d 03h | Avg: 14m 51s | Max:  2h 15m | Hits:  37%/9496  
  🟩 arm64              Pass: 100%/8   | Total:  1h 09m | Avg:  8m 44s | Max: 19m 19s
🟩 ctk
  🟩 11.1               Pass: 100%/15  | Total:  4h 03m | Avg: 16m 15s | Max: 35m 42s | Hits:  33%/2180  
  🟩 11.8               Pass: 100%/3   | Total:  1h 16m | Avg: 25m 33s | Max: 31m 22s
  🟩 12.5               Pass: 100%/4   | Total:  1h 59m | Avg: 29m 50s | Max: 43m 33s
  🟩 12.6               Pass: 100%/96  | Total: 21h 04m | Avg: 13m 10s | Max:  2h 15m | Hits:  38%/7316  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/12  | Total:  2h 28m | Avg: 12m 20s | Max: 20m 33s
  🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 03m | Avg: 16m 15s | Max: 35m 42s | Hits:  33%/2180  
  🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 16m | Avg: 25m 33s | Max: 31m 22s
  🟩 nvcc12.5           Pass: 100%/4   | Total:  1h 59m | Avg: 29m 50s | Max: 43m 33s
  🟩 nvcc12.6           Pass: 100%/84  | Total: 18h 36m | Avg: 13m 17s | Max:  2h 15m | Hits:  38%/7316  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/12  | Total:  2h 28m | Avg: 12m 20s | Max: 20m 33s
  🟩 nvcc               Pass: 100%/106 | Total:  1d 01h | Avg: 14m 40s | Max:  2h 15m | Hits:  37%/9496  
🟩 cxx
  🟩 Clang9             Pass: 100%/6   | Total:  1h 17m | Avg: 12m 59s | Max: 27m 41s
  🟩 Clang10            Pass: 100%/3   | Total:  2h 26m | Avg: 48m 46s | Max:  2h 15m
  🟩 Clang11            Pass: 100%/4   | Total: 47m 57s | Avg: 11m 59s | Max: 24m 40s
  🟩 Clang12            Pass: 100%/4   | Total: 34m 15s | Avg:  8m 33s | Max: 21m 44s
  🟩 Clang13            Pass: 100%/4   | Total: 35m 00s | Avg:  8m 45s | Max: 21m 55s
  🟩 Clang14            Pass: 100%/4   | Total: 36m 03s | Avg:  9m 00s | Max: 22m 32s
  🟩 Clang15            Pass: 100%/4   | Total: 35m 24s | Avg:  8m 51s | Max: 21m 37s
  🟩 Clang16            Pass: 100%/4   | Total: 36m 16s | Avg:  9m 04s | Max: 23m 04s
  🟩 Clang17            Pass: 100%/4   | Total: 35m 50s | Avg:  8m 57s | Max: 22m 07s
  🟩 Clang18            Pass: 100%/18  | Total:  3h 36m | Avg: 12m 01s | Max: 23m 07s
  🟩 GCC6               Pass: 100%/2   | Total: 21m 34s | Avg: 10m 47s | Max: 18m 48s
  🟩 GCC7               Pass: 100%/6   | Total:  1h 35m | Avg: 15m 55s | Max: 24m 32s
  🟩 GCC8               Pass: 100%/6   | Total:  1h 11m | Avg: 11m 51s | Max: 21m 58s
  🟩 GCC9               Pass: 100%/6   | Total:  1h 09m | Avg: 11m 39s | Max: 22m 33s
  🟩 GCC10              Pass: 100%/4   | Total: 32m 54s | Avg:  8m 13s | Max: 21m 26s
  🟩 GCC11              Pass: 100%/7   | Total:  2h 00m | Avg: 17m 14s | Max: 31m 22s
  🟩 GCC12              Pass: 100%/4   | Total: 45m 25s | Avg: 11m 21s | Max: 22m 58s
  🟩 GCC13              Pass: 100%/17  | Total:  3h 35m | Avg: 12m 41s | Max: 36m 32s
  🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 12m | Avg: 24m 02s | Max: 30m 15s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 35m 42s | Avg: 35m 42s | Max: 35m 42s | Hits:  33%/2180  
  🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 06m | Avg: 33m 29s | Max: 35m 17s | Hits:  30%/4723  
  🟩 MSVC14.39          Pass: 100%/1   | Total: 35m 41s | Avg: 35m 41s | Max: 35m 41s | Hits:  52%/2593  
  🟩 NVHPC24.7          Pass: 100%/4   | Total:  1h 59m | Avg: 29m 50s | Max: 43m 33s
🟩 cxx_family
  🟩 Clang              Pass: 100%/55  | Total: 11h 41m | Avg: 12m 45s | Max:  2h 15m
  🟩 GCC                Pass: 100%/52  | Total: 11h 12m | Avg: 12m 56s | Max: 36m 32s
  🟩 Intel              Pass: 100%/3   | Total:  1h 12m | Avg: 24m 02s | Max: 30m 15s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 18m | Avg: 34m 35s | Max: 35m 42s | Hits:  37%/9496  
  🟩 NVHPC              Pass: 100%/4   | Total:  1h 59m | Avg: 29m 50s | Max: 43m 33s
🟩 gpu
  🟩 v100               Pass: 100%/118 | Total:  1d 04h | Avg: 14m 26s | Max:  2h 15m | Hits:  37%/9496  
🟩 jobs
  🟩 Build              Pass: 100%/110 | Total:  1d 01h | Avg: 13m 55s | Max:  2h 15m | Hits:  37%/9496  
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 50m | Avg: 27m 34s | Max: 36m 32s
  🟩 Test               Pass: 100%/3   | Total:  1h 00m | Avg: 20m 18s | Max: 23m 07s
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
🟩 sm
  🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 16m | Avg: 25m 33s | Max: 31m 22s
  🟩 90                 Pass: 100%/4   | Total: 39m 42s | Avg:  9m 55s | Max: 11m 44s
  🟩 90a                Pass: 100%/8   | Total:  1h 10m | Avg:  8m 50s | Max: 16m 25s
🟩 std
  🟩 11                 Pass: 100%/32  | Total:  3h 48m | Avg:  7m 09s | Max: 20m 56s
  🟩 14                 Pass: 100%/32  | Total: 13h 43m | Avg: 25m 44s | Max:  2h 15m | Hits:  32%/4463  
  🟩 17                 Pass: 100%/30  | Total:  6h 16m | Avg: 12m 32s | Max: 43m 33s | Hits:  30%/2440  
  🟩 20                 Pass: 100%/23  | Total:  4h 33m | Avg: 11m 53s | Max: 36m 32s | Hits:  52%/2593

🟩 cub: Pass: 100%/110 | Total: 3d 04h | Avg: 41m 48s | Max: 1h 15m | Hits: 3%/2948

🟩 cpu
  🟩 amd64              Pass: 100%/102 | Total:  2d 22h | Avg: 41m 32s | Max:  1h 15m | Hits:   3%/2948  
  🟩 arm64              Pass: 100%/8   | Total:  6h 01m | Avg: 45m 12s | Max: 56m 16s
🟩 ctk
  🟩 11.1               Pass: 100%/15  | Total: 10h 05m | Avg: 40m 20s | Max: 56m 08s | Hits:   3%/737   
  🟩 11.8               Pass: 100%/3   | Total:  2h 57m | Avg: 59m 01s | Max:  1h 15m
  🟩 12.5               Pass: 100%/4   | Total:  4h 27m | Avg:  1h 06m | Max:  1h 11m
  🟩 12.6               Pass: 100%/88  | Total:  2d 11h | Avg: 40m 19s | Max:  1h 08m | Hits:   3%/2211  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  3h 38m | Avg: 54m 44s | Max:  1h 03m
  🟩 nvcc11.1           Pass: 100%/15  | Total: 10h 05m | Avg: 40m 20s | Max: 56m 08s | Hits:   3%/737   
  🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 57m | Avg: 59m 01s | Max:  1h 15m
  🟩 nvcc12.5           Pass: 100%/4   | Total:  4h 27m | Avg:  1h 06m | Max:  1h 11m
  🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 07h | Avg: 39m 38s | Max:  1h 08m | Hits:   3%/2211  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  3h 38m | Avg: 54m 44s | Max:  1h 03m
  🟩 nvcc               Pass: 100%/106 | Total:  3d 00h | Avg: 41m 18s | Max:  1h 15m | Hits:   3%/2948  
🟩 cxx
  🟩 Clang9             Pass: 100%/6   | Total:  4h 04m | Avg: 40m 45s | Max:  1h 01m
  🟩 Clang10            Pass: 100%/3   | Total:  2h 15m | Avg: 45m 10s | Max: 59m 53s
  🟩 Clang11            Pass: 100%/4   | Total:  2h 40m | Avg: 40m 13s | Max: 53m 15s
  🟩 Clang12            Pass: 100%/4   | Total:  2h 51m | Avg: 42m 54s | Max: 58m 41s
  🟩 Clang13            Pass: 100%/4   | Total:  2h 45m | Avg: 41m 24s | Max: 56m 34s
  🟩 Clang14            Pass: 100%/4   | Total:  2h 43m | Avg: 40m 46s | Max: 55m 15s
  🟩 Clang15            Pass: 100%/4   | Total:  2h 46m | Avg: 41m 43s | Max: 55m 29s
  🟩 Clang16            Pass: 100%/4   | Total:  2h 48m | Avg: 42m 12s | Max: 54m 13s
  🟩 Clang17            Pass: 100%/4   | Total:  2h 49m | Avg: 42m 26s | Max:  1h 00m
  🟩 Clang18            Pass: 100%/11  | Total:  7h 42m | Avg: 42m 03s | Max:  1h 03m
  🟩 GCC6               Pass: 100%/2   | Total:  1h 36m | Avg: 48m 00s | Max: 50m 25s
  🟩 GCC7               Pass: 100%/6   | Total:  3h 54m | Avg: 39m 08s | Max: 56m 54s
  🟩 GCC8               Pass: 100%/6   | Total:  4h 20m | Avg: 43m 26s | Max: 56m 12s
  🟩 GCC9               Pass: 100%/6   | Total:  4h 09m | Avg: 41m 38s | Max: 56m 08s
  🟩 GCC10              Pass: 100%/4   | Total:  2h 44m | Avg: 41m 08s | Max: 54m 50s
  🟩 GCC11              Pass: 100%/7   | Total:  5h 40m | Avg: 48m 36s | Max:  1h 15m
  🟩 GCC12              Pass: 100%/4   | Total:  2h 56m | Avg: 44m 04s | Max:  1h 01m
  🟩 GCC13              Pass: 100%/16  | Total:  6h 21m | Avg: 23m 49s | Max: 56m 16s
  🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 52m | Avg: 57m 38s | Max:  1h 05m
  🟩 MSVC14.16          Pass: 100%/1   | Total: 54m 23s | Avg: 54m 23s | Max: 54m 23s | Hits:   3%/737   
  🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 02m | Hits:   3%/1474  
  🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 08m | Avg:  1h 08m | Max:  1h 08m | Hits:   3%/737   
  🟩 NVHPC24.7          Pass: 100%/4   | Total:  4h 27m | Avg:  1h 06m | Max:  1h 11m
🟩 cxx_family
  🟩 Clang              Pass: 100%/48  | Total:  1d 09h | Avg: 41m 51s | Max:  1h 03m
  🟩 GCC                Pass: 100%/51  | Total:  1d 07h | Avg: 37m 19s | Max:  1h 15m
  🟩 Intel              Pass: 100%/3   | Total:  2h 52m | Avg: 57m 38s | Max:  1h 05m
  🟩 MSVC               Pass: 100%/4   | Total:  4h 05m | Avg:  1h 01m | Max:  1h 08m | Hits:   3%/2948  
  🟩 NVHPC              Pass: 100%/4   | Total:  4h 27m | Avg:  1h 06m | Max:  1h 11m
🟩 gpu
  🟩 v100               Pass: 100%/110 | Total:  3d 04h | Avg: 41m 48s | Max:  1h 15m | Hits:   3%/2948  
🟩 jobs
  🟩 Build              Pass: 100%/102 | Total:  3d 01h | Avg: 43m 01s | Max:  1h 15m | Hits:   3%/2948  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 25m 11s | Avg: 25m 11s | Max: 25m 11s
  🟩 GraphCapture       Pass: 100%/1   | Total: 26m 05s | Avg: 26m 05s | Max: 26m 05s
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 05s | Max: 25m 34s
  🟩 TestGPU            Pass: 100%/3   | Total:  1h 29m | Avg: 29m 55s | Max: 32m 41s
🟩 sm
  🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 57m | Avg: 59m 01s | Max:  1h 15m
  🟩 90a                Pass: 100%/4   | Total: 37m 23s | Avg:  9m 20s | Max: 25m 48s
🟩 std
  🟩 11                 Pass: 100%/30  | Total: 18h 32m | Avg: 37m 05s | Max:  1h 05m
  🟩 14                 Pass: 100%/29  | Total:  1d 03h | Avg: 55m 57s | Max:  1h 15m | Hits:   3%/1474  
  🟩 17                 Pass: 100%/27  | Total: 17h 16m | Avg: 38m 23s | Max:  1h 07m | Hits:   3%/737   
  🟩 20                 Pass: 100%/24  | Total: 13h 46m | Avg: 34m 25s | Max:  1h 11m | Hits:   3%/737

🟩 thrust: Pass: 100%/109 | Total: 1d 18h | Avg: 23m 19s | Max: 1h 16m | Hits: 40%/13165

🟩 cpu
  🟩 amd64              Pass: 100%/101 | Total:  1d 16h | Avg: 23m 51s | Max:  1h 16m | Hits:  40%/13165 
  🟩 arm64              Pass: 100%/8   | Total:  2h 13m | Avg: 16m 42s | Max: 38m 03s
🟩 ctk
  🟩 11.1               Pass: 100%/15  | Total:  5h 27m | Avg: 21m 51s | Max:  1h 09m | Hits:  22%/2633  
  🟩 11.8               Pass: 100%/3   | Total:  1h 24m | Avg: 28m 11s | Max: 50m 49s
  🟩 12.5               Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m
  🟩 12.6               Pass: 100%/87  | Total:  1d 06h | Avg: 21m 10s | Max:  1h 15m | Hits:  45%/10532 
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total: 50m 05s | Avg: 12m 31s | Max: 35m 46s
  🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 27m | Avg: 21m 51s | Max:  1h 09m | Hits:  22%/2633  
  🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 24m | Avg: 28m 11s | Max: 50m 49s
  🟩 nvcc12.5           Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m
  🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 05h | Avg: 21m 35s | Max:  1h 15m | Hits:  45%/10532 
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total: 50m 05s | Avg: 12m 31s | Max: 35m 46s
  🟩 nvcc               Pass: 100%/105 | Total:  1d 17h | Avg: 23m 44s | Max:  1h 16m | Hits:  40%/13165 
🟩 cxx
  🟩 Clang9             Pass: 100%/6   | Total:  1h 56m | Avg: 19m 26s | Max: 40m 04s
  🟩 Clang10            Pass: 100%/3   | Total:  1h 13m | Avg: 24m 33s | Max: 41m 39s
  🟩 Clang11            Pass: 100%/4   | Total:  1h 18m | Avg: 19m 41s | Max: 41m 54s
  🟩 Clang12            Pass: 100%/4   | Total:  1h 20m | Avg: 20m 08s | Max: 40m 35s
  🟩 Clang13            Pass: 100%/4   | Total:  1h 19m | Avg: 19m 59s | Max: 37m 48s
  🟩 Clang14            Pass: 100%/4   | Total:  1h 16m | Avg: 19m 10s | Max: 38m 53s
  🟩 Clang15            Pass: 100%/4   | Total:  1h 21m | Avg: 20m 17s | Max: 37m 18s
  🟩 Clang16            Pass: 100%/4   | Total:  1h 26m | Avg: 21m 31s | Max: 40m 42s
  🟩 Clang17            Pass: 100%/4   | Total:  1h 25m | Avg: 21m 25s | Max: 42m 04s
  🟩 Clang18            Pass: 100%/11  | Total:  2h 30m | Avg: 13m 39s | Max: 35m 46s
  🟩 GCC6               Pass: 100%/2   | Total: 40m 10s | Avg: 20m 05s | Max: 35m 19s
  🟩 GCC7               Pass: 100%/6   | Total:  1h 52m | Avg: 18m 42s | Max: 39m 40s
  🟩 GCC8               Pass: 100%/6   | Total:  2h 05m | Avg: 20m 52s | Max: 42m 41s
  🟩 GCC9               Pass: 100%/6   | Total:  2h 01m | Avg: 20m 12s | Max: 40m 47s
  🟩 GCC10              Pass: 100%/4   | Total:  1h 17m | Avg: 19m 29s | Max: 40m 49s
  🟩 GCC11              Pass: 100%/7   | Total:  2h 43m | Avg: 23m 21s | Max: 50m 49s
  🟩 GCC12              Pass: 100%/4   | Total:  1h 25m | Avg: 21m 21s | Max: 39m 39s
  🟩 GCC13              Pass: 100%/14  | Total:  2h 55m | Avg: 12m 31s | Max: 38m 03s
  🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 12m | Avg: 44m 05s | Max: 52m 58s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:  22%/2633  
  🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 14m | Hits:  26%/5266  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 37m | Avg: 48m 47s | Max:  1h 15m | Hits:  65%/5266  
  🟩 NVHPC24.7          Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m
🟩 cxx_family
  🟩 Clang              Pass: 100%/48  | Total: 15h 09m | Avg: 18m 56s | Max: 42m 04s
  🟩 GCC                Pass: 100%/49  | Total: 15h 01m | Avg: 18m 23s | Max: 50m 49s
  🟩 Intel              Pass: 100%/3   | Total:  2h 12m | Avg: 44m 05s | Max: 52m 58s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 11m | Avg:  1h 02m | Max:  1h 15m | Hits:  40%/13165 
  🟩 NVHPC              Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 16m
🟩 gpu
  🟩 v100               Pass: 100%/109 | Total:  1d 18h | Avg: 23m 19s | Max:  1h 16m | Hits:  40%/13165 
🟩 jobs
  🟩 Build              Pass: 100%/102 | Total:  1d 16h | Avg: 23m 53s | Max:  1h 16m | Hits:  26%/10532 
  🟩 TestCPU            Pass: 100%/4   | Total: 43m 37s | Avg: 10m 54s | Max: 21m 42s | Hits:  99%/2633  
  🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 59s | Max: 27m 07s
🟩 sm
  🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 24m | Avg: 28m 11s | Max: 50m 49s
  🟩 90a                Pass: 100%/4   | Total: 40m 55s | Avg: 10m 13s | Max: 27m 18s
🟩 std
  🟩 11                 Pass: 100%/30  | Total:  5h 16m | Avg: 10m 32s | Max:  1h 00m
  🟩 14                 Pass: 100%/29  | Total: 20h 43m | Avg: 42m 52s | Max:  1h 16m | Hits:  22%/5266  
  🟩 17                 Pass: 100%/27  | Total:  8h 56m | Avg: 19m 52s | Max:  1h 16m | Hits:  30%/2633  
  🟩 20                 Pass: 100%/23  | Total:  7h 26m | Avg: 19m 24s | Max:  1h 15m | Hits:  65%/5266

🟩 cudax: Pass: 100%/54 | Total: 4h 23m | Avg: 4m 53s | Max: 24m 54s | Hits: 65%/238

🟩 cpu
  🟩 amd64              Pass: 100%/50  | Total:  4h 13m | Avg:  5m 04s | Max: 24m 54s | Hits:  65%/238   
  🟩 arm64              Pass: 100%/4   | Total: 10m 35s | Avg:  2m 38s | Max:  3m 03s
🟩 ctk
  🟩 12.0               Pass: 100%/19  | Total:  1h 31m | Avg:  4m 48s | Max: 20m 18s | Hits:  65%/119   
  🟩 12.5               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 35s
  🟩 12.6               Pass: 100%/33  | Total:  2h 39m | Avg:  4m 50s | Max: 24m 54s | Hits:  65%/119   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 31m | Avg:  4m 48s | Max: 20m 18s | Hits:  65%/119   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 35s
  🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 39m | Avg:  4m 50s | Max: 24m 54s | Hits:  65%/119   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/54  | Total:  4h 23m | Avg:  4m 53s | Max: 24m 54s | Hits:  65%/238   
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  6m 10s | Avg:  3m 05s | Max:  3m 17s
  🟩 Clang10            Pass: 100%/2   | Total:  6m 23s | Avg:  3m 11s | Max:  3m 19s
  🟩 Clang11            Pass: 100%/4   | Total: 11m 50s | Avg:  2m 57s | Max:  3m 15s
  🟩 Clang12            Pass: 100%/4   | Total: 11m 44s | Avg:  2m 56s | Max:  3m 09s
  🟩 Clang13            Pass: 100%/4   | Total: 11m 40s | Avg:  2m 55s | Max:  3m 06s
  🟩 Clang14            Pass: 100%/4   | Total: 29m 03s | Avg:  7m 15s | Max: 20m 18s
  🟩 Clang15            Pass: 100%/2   | Total:  6m 14s | Avg:  3m 07s | Max:  3m 13s
  🟩 Clang16            Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  8m 08s
  🟩 Clang17            Pass: 100%/2   | Total:  6m 14s | Avg:  3m 07s | Max:  3m 18s
  🟩 Clang18            Pass: 100%/2   | Total: 27m 57s | Avg: 13m 58s | Max: 24m 54s
  🟩 GCC9               Pass: 100%/2   | Total:  5m 57s | Avg:  2m 58s | Max:  3m 07s
  🟩 GCC10              Pass: 100%/4   | Total: 11m 54s | Avg:  2m 58s | Max:  3m 13s
  🟩 GCC11              Pass: 100%/4   | Total: 11m 40s | Avg:  2m 55s | Max:  3m 02s
  🟩 GCC12              Pass: 100%/7   | Total:  1h 03m | Avg:  9m 02s | Max: 17m 41s
  🟩 GCC13              Pass: 100%/3   | Total:  7m 49s | Avg:  2m 36s | Max:  2m 53s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 08s | Avg:  8m 08s | Max:  8m 08s | Hits:  65%/119   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  8m 09s | Avg:  8m 09s | Max:  8m 09s | Hits:  65%/119   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 35s
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  2h 14m | Avg:  4m 28s | Max: 24m 54s
  🟩 GCC                Pass: 100%/20  | Total:  1h 40m | Avg:  5m 01s | Max: 17m 41s
  🟩 MSVC               Pass: 100%/2   | Total: 16m 17s | Avg:  8m 08s | Max:  8m 09s | Hits:  65%/238   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 35s
🟩 gpu
  🟩 v100               Pass: 100%/54  | Total:  4h 23m | Avg:  4m 53s | Max: 24m 54s | Hits:  65%/238   
🟩 jobs
  🟩 Build              Pass: 100%/49  | Total:  2h 47m | Avg:  3m 24s | Max:  8m 09s | Hits:  65%/238   
  🟩 Test               Pass: 100%/5   | Total:  1h 36m | Avg: 19m 22s | Max: 24m 54s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 33s | Avg:  2m 33s | Max:  2m 33s
  🟩 90a                Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s
🟩 std
  🟩 17                 Pass: 100%/29  | Total:  2h 02m | Avg:  4m 13s | Max: 17m 41s
  🟩 20                 Pass: 100%/25  | Total:  2h 21m | Avg:  5m 39s | Max: 24m 54s | Hits:  65%/238

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 10s | Avg: 5m 05s | Max: 8m 10s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total: 10m 10s | Avg:  5m 05s | Max:  8m 10s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 00s | Avg:  2m 00s | Max:  2m 00s
  🟩 Test               Pass: 100%/1   | Total:  8m 10s | Avg:  8m 10s | Max:  8m 10s

🟩 python: Pass: 100%/1 | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 394)

#	Runner
326	`linux-amd64-cpu16`
28	`linux-arm64-cpu16`
25	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`

ericniebler

i like this change. i think using _CCCL_NO_VARIABLE_TEMPLATES as the condition is the right call. inline variables are not necessary. i don't believe we have code that depends on the address of, e.g., is_same_v<T,U> being the same in all translation units.

miscco · 2024-11-06T17:06:10Z

> i like this change. i think using _CCCL_NO_VARIABLE_TEMPLATES as the condition is the right call. inline variables are not necessary. i don't believe we have code that depends on the address of, e.g., is_same_v<T,U> being the same in all translation units.

There is one difference that might matter: without inline variables, we cannot use a variable template for any trait whose variable is explicitly specialized to a different value.

The reason is that if we then link multiple TUs, we get duplicate symbol warnings. I believe that should be fine, though; in that case we should fall back to using the struct instead of the variable template.
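A minimal sketch of the fallback mentioned here, assuming a hypothetical trait `my::is_fancy` that users are allowed to specialize. The class template stays the customization point, and the variable template is only ever defined generically in terms of it.

```cpp
// Hypothetical sketch; my::is_fancy and my::widget are illustrative names.
#include <type_traits>

namespace my {
// Customization point: specializing a class template in a header does not
// define an object in each translation unit.
template <class T>
struct is_fancy : std::false_type {};

struct widget {};

template <>
struct is_fancy<widget> : std::true_type {};

#if defined(__cpp_variable_templates)
// Primary variable template only. It forwards to the struct and is never
// specialized itself.
template <class T>
constexpr bool is_fancy_v = is_fancy<T>::value;

// The pattern the comment above warns about, left commented out on purpose.
// Without `inline`, such an explicit specialization in a header is reported
// to cause duplicate symbol diagnostics once several TUs are linked:
//
//   template <>
//   constexpr bool is_fancy_v<widget> = true;
#endif
} // namespace my

#if defined(__cpp_variable_templates)
static_assert(my::is_fancy_v<my::widget>, "picked up from the struct");
static_assert(!my::is_fancy_v<int>, "primary template yields false");
#endif
```

Traits that need per-type overrides would keep being queried through `::value` in this scheme, at the cost of the compile-time benefit that motivated the variable templates.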

miscco requested review from a team as code owners November 6, 2024 08:01

miscco requested review from ericniebler and griwes November 6, 2024 08:01

Ensure that we only use the inline variable trait when it is actually available

1643f61

miscco force-pushed the fix_cccl_trait branch from 7fc2fbd to 1643f61 Compare November 6, 2024 08:01

Use the right define for internal traits

558f642

miscco force-pushed the fix_cccl_trait branch from 16b37cf to 558f642 Compare November 6, 2024 08:37

bernhardmgruber approved these changes Nov 6, 2024

ericniebler approved these changes Nov 6, 2024

miscco merged commit 2864f2b into NVIDIA:main Nov 6, 2024
413 checks passed

miscco deleted the fix_cccl_trait branch November 6, 2024 17:06
