Ensure that we only use the inline variable trait when it is actually available #2712
@@ -110,6 +104,13 @@
# define _CCCL_NO_VARIABLE_TEMPLATES
#endif // _CCCL_STD_VER <= 2011

// Variable templates are more efficient most of the time, so we want to use them rather than structs when possible
#if defined(_CCCL_NO_VARIABLE_TEMPLATES)
This actually relaxes the condition to C++14 and sufficient support for variable templates.
Should we stay with support for inline variables?
If `_CCCL_TRAIT` is also used to refer to std traits, then this won't work, because variable templates are available in C++14 but all the variable templates for std traits were added in C++17.
When is `_CCCL_TRAIT` used? Only for stuff in `::cuda::std`? Then it should be fine.
That is one of the reasons we are using our internal traits all the time ;)
I mean, I am open to keeping it as is and using `_CCCL_HAS_NO_INLINE_VARIABLES` as the condition; however, I believe we can do better.
If we only use it for the internal traits, then all is good! I just wondered whether we would cause any breakage.
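The point of this thread can be illustrated with a minimal sketch. It assumes a hypothetical namespace `my`, a hypothetical dispatch macro `MY_TRAIT`, and a hypothetical flag `MY_NO_VARIABLE_TEMPLATES`; none of these are the actual CCCL definitions. Because the `_v` variable lives in our own namespace, it is available as soon as the compiler supports variable templates, whereas `std::is_same_v` would require C++17.

```cpp
// Sketch only: `my`, `MY_TRAIT`, and `MY_NO_VARIABLE_TEMPLATES` are
// hypothetical names chosen for illustration, not CCCL's definitions.
#include <type_traits>

// Derive the flag from the feature-test macro, not from the standard version.
#if !defined(__cpp_variable_templates) || __cpp_variable_templates < 201304L
#  define MY_NO_VARIABLE_TEMPLATES
#endif

namespace my {
// Struct form of the trait: always available.
template <class T, class U>
struct is_same : std::false_type {};
template <class T>
struct is_same<T, T> : std::true_type {};

#if !defined(MY_NO_VARIABLE_TEMPLATES)
// Variable form: usable from C++14 on, unlike std::is_same_v (C++17).
template <class T, class U>
constexpr bool is_same_v = is_same<T, U>::value;
#endif
} // namespace my

// The macro picks whichever form exists, using the same flag that guards
// the definition of the variable template above.
#if defined(MY_NO_VARIABLE_TEMPLATES)
#  define MY_TRAIT(TRAIT, ...) TRAIT<__VA_ARGS__>::value
#else
#  define MY_TRAIT(TRAIT, ...) TRAIT##_v<__VA_ARGS__>
#endif

static_assert(MY_TRAIT(my::is_same, int, int), "same types compare equal");
static_assert(!MY_TRAIT(my::is_same, int, long), "different types do not");
```

Passing a `std::` trait to such a macro in C++14 mode would expand to a `std::..._v` name that does not exist before C++17, which is why restricting the macro to internal traits keeps the relaxed condition safe.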
🟩 CI finished in 2h 36m: Pass: 100%/394 | Total: 6d 08h | Avg: 23m 11s | Max: 2h 15m | Hits: 35%/25847
 | Project
---|---
 | CCCL Infrastructure
+/- | libcu++
 | CUB
 | Thrust
 | CUDA Experimental
 | python
 | CCCL C Parallel Library
 | Catch2Helper

Modifications in project or dependencies?

 | Project
---|---
 | CCCL Infrastructure
+/- | libcu++
+/- | CUB
+/- | Thrust
+/- | CUDA Experimental
+/- | python
+/- | CCCL C Parallel Library
+/- | Catch2Helper
🏃 Runner counts (total jobs: 394)
# | Runner |
---|---|
326 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
25 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
i like this change. i think using `_CCCL_NO_VARIABLE_TEMPLATES` as the condition is the right call. inline variables are not necessary. i don't believe we have code that depends on the address of, e.g., `is_same_v<T,U>` being the same in all translation units.
There is one difference that might matter. We cannot use any variable template that actually specializes the value of an inline variable. The reason is that if we then link multiple TUs we get duplicate symbol warnings. But I believe that should be fine, and in that case we should actually default back to using the struct for the variable template.
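One way to read the fallback described above is sketched here, under stated assumptions: `my::is_widget`, `my::is_widget_v`, and `my::gadget` are hypothetical names, and this is not taken from the CCCL sources. Customization goes through the struct, and the variable template only forwards to it, so the variable template itself never needs an explicit specialization.

```cpp
// Sketch only: all names in namespace `my` are hypothetical.
#include <type_traits>

namespace my {
// Struct form of the trait: this is the customization point.
template <class T>
struct is_widget : std::false_type {};

struct gadget {};

// Opt in by specializing the struct. A class template specialization can
// sit in a header that is included from many translation units.
template <>
struct is_widget<gadget> : std::true_type {};

#if defined(__cpp_variable_templates)
// The variable template is never specialized; it just reads the struct.
template <class T>
constexpr bool is_widget_v = is_widget<T>::value;
#endif
} // namespace my

static_assert(my::is_widget<my::gadget>::value, "gadget opted in");
static_assert(!my::is_widget<int>::value, "int did not");
```

With this layout, code that needs a specialized value keeps working through the struct even when only non-inline variable templates are available.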
… available (NVIDIA#2712) * Ensure that we only use the inline variable trait when it is actually available * Use the right define for internal traits
* copy pasted sample * First draft * Kernel functor and some other things * Clean up and break up long main function * Needs launch fix * Switch to copy_bytes and cleanups * Missing include * Add exception print and waive value * Adjust copy count * Add license and switch benchmark streams * Remove a function left as a mistake * Update copyright date Co-authored-by: Eric Niebler <[email protected]> * Setup cudax examples. (#2697) * Move the sample to new location and fix warning * build fixes and 0 return code on waive * Some new MSVC errors * explicit cast * Rename enable/disable peer access and separate the sample loop * Add `cuda::minimum` and `cuda::maximum` (#2681) * Add cuda::minimum and cuda::maximum * Various fixes to cub::DeviceTransform (#2709) * Workaround non-copyable iterators * Use a named constant for SMEM * Cast to raw reference 2 * Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg * Make `thrust::transform` use `cub::DeviceTransform` (#2389) * Add transform benchmark requiring a stable address * Make thrust::transform use cub::DeviceTransform * Introduces address stability detection and opt-in in libcu++ * Mark lambdas in Thrust BabelStream benchmark address oblivious * Optimize prefetch cub::DeviceTransform for small problems Fixes: #2263 * Ensure that we only use the inline variable trait when it is actually available (#2712) * Ensure that we only use the inline variable trait when it is actually available * Use the right define for internal traits * [CUDAX] Rename memory resource and memory pool from async to device (#2710) * Rename the type * Update tests * Rename async memory pool * Rename the tests * Change name in the docs * Generalise the memory_pool_properties name * Fix docs --------- Co-authored-by: Michael Schellenberger Costa <[email protected]> * Update memory resource name --------- Co-authored-by: Eric Niebler <[email protected]> Co-authored-by: Allison Piper <[email protected]> Co-authored-by: Jacob 
Faibussowitsch <[email protected]> Co-authored-by: Bernhard Manfred Gruber <[email protected]> Co-authored-by: Michael Schellenberger Costa <[email protected]>
We were defining `_CCCL_TRAIT` solely based on the standard version, but guarded the actual definition of the inline variables with `__cpp_variable_templates`.
We should use that as the condition.