Port `thrust::transform` to use `cub::DeviceTransform` #2263

bernhardmgruber · 2024-08-19T18:04:18Z

Once #2086 is merged, thrust::transform should be ported to use cub::DeviceTransform.

Tasks

* Design an opt-in for function objects to express that they do not require address stability
* Port `thrust::transform` to `cub::DeviceTransform`
* Ensure `zip_iterator`s passed to `thrust::transform` are decomposed and optimized
* Ensure we have enough benchmarks for `thrust::transform`, including BabelStream
* Discuss and finalize the design of the address stability opt-in
* [BUG]: CUB device_transform breaks nvc++ -stdpar #2402
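The address stability opt-in from the task list can be illustrated with a small host-only C++17 sketch. Every name in it (`copyable_args`, `proclaim_copyable`, `allows_copied_args`, `sketch_transform`, `last_path`) is hypothetical and only mirrors the idea: a callable is assumed to need stable argument addresses unless it opts in, and only opted-in callables may be fed copies of their inputs (standing in for staging data in shared memory on a GPU). The opt-in that actually landed in libcu++ may differ in names and details.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical opt-in wrapper: wrapping a callable declares that it does not
// care whether it receives the original element or a copy of it.
template <class F>
struct copyable_args {
  F f;
  template <class... Ts>
  decltype(auto) operator()(Ts&&... args) const {
    return f(std::forward<Ts>(args)...);
  }
};

template <class F>
copyable_args<F> proclaim_copyable(F f) {
  return copyable_args<F>{std::move(f)};
}

// Hypothetical trait: false by default (conservative), true only for callables
// that went through proclaim_copyable.
template <class F>
struct allows_copied_args : std::false_type {};
template <class F>
struct allows_copied_args<copyable_args<F>> : std::true_type {};

// Records which path the last call took so the dispatch is observable.
enum class path { staged_copy, in_place };
inline path last_path = path::in_place;

template <class T, class F>
void sketch_transform(const std::vector<T>& in, std::vector<T>& out, F f) {
  out.resize(in.size());
  if constexpr (allows_copied_args<F>::value) {
    // Stand-in for a fast path: copy inputs into a separate buffer first (on a
    // GPU this could be shared memory), then call f on the copies.
    std::vector<T> staged(in.begin(), in.end());
    for (std::size_t i = 0; i < staged.size(); ++i) out[i] = f(staged[i]);
    last_path = path::staged_copy;
  } else {
    // Conservative path: f only ever sees references into the original input.
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
    last_path = path::in_place;
  }
}
```

The default is deliberately conservative: an unmarked callable always sees references into the original input, so code that inspects argument addresses keeps working, while marking a callable only widens the set of strategies the algorithm may choose from.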


* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious

Fixes: NVIDIA#2263
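For reference, the BabelStream kernels map directly onto unary and binary transforms. The host-side sketch below uses `std::transform` rather than the Thrust benchmark code, and the names `Stream` and `bytes_moved` are hypothetical. The lambdas take their arguments by value and never inspect an address, which is the property that makes it safe to mark them address-oblivious.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side sketch of four BabelStream kernels expressed as transforms.
// None of these lambdas looks at the address of its arguments.
struct Stream {
  std::vector<double> a, b, c;
  double scalar;
  explicit Stream(std::size_t n, double s = 0.4)
      : a(n, 0.1), b(n, 0.2), c(n, 0.0), scalar(s) {}

  // Copy: c = a
  void copy() {
    std::transform(a.begin(), a.end(), c.begin(), [](double x) { return x; });
  }
  // Mul: b = scalar * c
  void mul() {
    const double s = scalar;
    std::transform(c.begin(), c.end(), b.begin(), [s](double x) { return s * x; });
  }
  // Add: c = a + b
  void add() {
    std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                   [](double x, double y) { return x + y; });
  }
  // Triad: a = b + scalar * c
  void triad() {
    const double s = scalar;
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                   [s](double x, double y) { return x + s * y; });
  }
};

// Bytes moved per kernel when every touched array element is read or written
// once: copy/mul touch 2 arrays, add/triad touch 3.
constexpr std::size_t bytes_moved(std::size_t arrays_touched, std::size_t n,
                                  std::size_t elem_size) {
  return arrays_touched * n * elem_size;
}
```

Dividing `bytes_moved` by the measured kernel time gives the achieved memory bandwidth, which is the figure a bandwidth-bound transform implementation is judged by.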

* Add transform benchmark requiring a stable address
* Make thrust::transform use cub::DeviceTransform
* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious
* Optimize prefetch cub::DeviceTransform for small problems

Fixes: NVIDIA#2263
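A functor that genuinely requires a stable address is one that derives information from where its argument lives in memory. In this hypothetical sketch (`index_from_address`, `apply_in_place` and `apply_on_copies` are illustrative names, not the benchmark's code), the functor recovers an element's index by comparing its address against the input array. It only works when handed references into that array, which is why such functors must not be silently routed to a copying fast path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A functor that needs a stable address: it finds the position of x by
// comparing &x against each element address of [base, base + n).
struct index_from_address {
  const int* base;
  std::size_t n;
  // Returns the index of x in the input, or -1 if x is not an element of
  // that array (for example because it is a copy living elsewhere).
  int operator()(const int& x) const {
    for (std::size_t j = 0; j < n; ++j)
      if (&x == base + j) return static_cast<int>(j);
    return -1;
  }
};

// Conservative application: the functor sees references into the input.
std::vector<int> apply_in_place(const std::vector<int>& in, index_from_address f) {
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
  return out;
}

// Copying application, standing in for a fast path that stages inputs in a
// separate buffer before calling the functor.
std::vector<int> apply_on_copies(const std::vector<int>& in, index_from_address f) {
  std::vector<int> staged(in), out(in.size());
  for (std::size_t i = 0; i < staged.size(); ++i) out[i] = f(staged[i]);
  return out;
}
```

The address comparison uses `==` rather than pointer subtraction, since subtracting pointers into two different arrays would be undefined behavior.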

* copy pasted sample
* First draft
* Kernel functor and some other things
* Clean up and break up long main function
* Needs launch fix
* Switch to copy_bytes and cleanups
* Missing include
* Add exception print and waive value
* Adjust copy count
* Add license and switch benchmark streams
* Remove a function left as a mistake
* Update copyright date

Co-authored-by: Eric Niebler

* Setup cudax examples. (#2697)
* Move the sample to new location and fix warning
* build fixes and 0 return code on waive
* Some new MSVC errors
* explicit cast
* Rename enable/disable peer access and separate the sample loop
* Add `cuda::minimum` and `cuda::maximum` (#2681)
* Add cuda::minimum and cuda::maximum
* Various fixes to cub::DeviceTransform (#2709)
* Workaround non-copyable iterators
* Use a named constant for SMEM
* Cast to raw reference 2
* Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg
* Make `thrust::transform` use `cub::DeviceTransform` (#2389)
* Add transform benchmark requiring a stable address
* Make thrust::transform use cub::DeviceTransform
* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious
* Optimize prefetch cub::DeviceTransform for small problems

Fixes: #2263

* Ensure that we only use the inline variable trait when it is actually available (#2712)
* Ensure that we only use the inline variable trait when it is actually available
* Use the right define for internal traits
* [CUDAX] Rename memory resource and memory pool from async to device (#2710)
* Rename the type
* Update tests
* Rename async memory pool
* Rename the tests
* Change name in the docs
* Generalise the memory_pool_properties name
* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa

* Update memory resource name

---------

Co-authored-by: Eric Niebler
Co-authored-by: Allison Piper
Co-authored-by: Jacob Faibussowitsch
Co-authored-by: Bernhard Manfred Gruber
Co-authored-by: Michael Schellenberger Costa



bernhardmgruber mentioned this issue Aug 19, 2024 in [EPIC] Optimize thrust::transform for newer architectures #1947 (open, 19 tasks)

github-project-automation bot added this to CCCL Aug 19, 2024

github-project-automation bot moved this to Todo in CCCL Aug 19, 2024

bernhardmgruber added the thrust label (for all items related to Thrust) Aug 19, 2024

bernhardmgruber linked a pull request Sep 6, 2024 that will close this issue: Make thrust::transform use cub::DeviceTransform #2389 (merged, 7 tasks)

cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Sep 8, 2024

bernhardmgruber self-assigned this Sep 9, 2024

cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Sep 9, 2024

bernhardmgruber mentioned this issue Sep 11, 2024 in Automatic address stability detection for thrust::transform #2404 (draft, 2 tasks)

cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Nov 5, 2024

cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Nov 6, 2024

bernhardmgruber closed this as completed in #2389 Nov 6, 2024

bernhardmgruber closed this as completed in c97f2e3 Nov 6, 2024

github-project-automation bot moved this from In Review to Done in CCCL Nov 6, 2024

bernhardmgruber commented Aug 19, 2024 (edited by jrhemstad)

Loading

Port thrust::transform to use cub::DeviceTransform #2263

Port thrust::transform to use cub::DeviceTransform #2263

Comments

bernhardmgruber commented Aug 19, 2024 • edited by jrhemstad Loading

Tasks

Port `thrust::transform` to use `cub::DeviceTransform` #2263

Port `thrust::transform` to use `cub::DeviceTransform` #2263

bernhardmgruber commented Aug 19, 2024 •

edited by jrhemstad

Loading