[CUDAX] Rename memory resource and memory pool from async to device #2710

pciolkosz · 2024-11-06T00:48:11Z

The direction we want to take with memory resources in CUDAX is to use memory pools for all kinds of allocations. Memory resources operating on pools will provide both synchronous and asynchronous allocation functionality.

We will end up with memory pools and memory resources for all kinds of memory. For now we should rename the current async_memory_resource and async_memory_pool to device_*, since the current ones allocate device memory.

This change is a very simple search and replace; let's see if there are any issues in CI.

github-actions · 2024-11-06T01:11:04Z

🟩 CI finished in 21m 09s: Pass: 100%/54 | Total: 4h 37m | Avg: 5m 08s | Max: 16m 37s | Hits: 47%/238

🟩 cudax: Pass: 100%/54 | Total: 4h 37m | Avg: 5m 08s | Max: 16m 37s | Hits: 47%/238

🟩 cpu
  🟩 amd64              Pass: 100%/50  | Total:  4h 23m | Avg:  5m 16s | Max: 16m 37s | Hits:  47%/238   
  🟩 arm64              Pass: 100%/4   | Total: 14m 14s | Avg:  3m 33s | Max:  4m 05s
🟩 ctk
  🟩 12.0               Pass: 100%/19  | Total:  1h 38m | Avg:  5m 11s | Max: 16m 37s | Hits:  47%/119   
  🟩 12.5               Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
  🟩 12.6               Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 15m 50s | Hits:  47%/119   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 38m | Avg:  5m 11s | Max: 16m 37s | Hits:  47%/119   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
  🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 15m 50s | Hits:  47%/119   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/54  | Total:  4h 37m | Avg:  5m 08s | Max: 16m 37s | Hits:  47%/238   
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 18s
  🟩 Clang10            Pass: 100%/2   | Total:  8m 04s | Avg:  4m 02s | Max:  4m 16s
  🟩 Clang11            Pass: 100%/4   | Total: 14m 15s | Avg:  3m 33s | Max:  3m 50s
  🟩 Clang12            Pass: 100%/4   | Total: 14m 52s | Avg:  3m 43s | Max:  4m 01s
  🟩 Clang13            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  3m 58s
  🟩 Clang14            Pass: 100%/4   | Total: 27m 50s | Avg:  6m 57s | Max: 16m 21s
  🟩 Clang15            Pass: 100%/2   | Total:  7m 57s | Avg:  3m 58s | Max:  4m 10s
  🟩 Clang16            Pass: 100%/4   | Total: 15m 13s | Avg:  3m 48s | Max:  4m 05s
  🟩 Clang17            Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  3m 56s
  🟩 Clang18            Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max: 14m 51s
  🟩 GCC9               Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 31s
  🟩 GCC10              Pass: 100%/4   | Total: 15m 05s | Avg:  3m 46s | Max:  4m 01s
  🟩 GCC11              Pass: 100%/4   | Total: 14m 23s | Avg:  3m 35s | Max:  3m 42s
  🟩 GCC12              Pass: 100%/7   | Total:  1h 03m | Avg:  9m 01s | Max: 16m 37s
  🟩 GCC13              Pass: 100%/3   | Total:  9m 48s | Avg:  3m 16s | Max:  3m 31s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 17s | Avg:  9m 17s | Max:  9m 17s | Hits:  47%/119   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 07s | Avg:  9m 07s | Max:  9m 07s | Hits:  47%/119   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  2h 17m | Avg:  4m 34s | Max: 16m 21s
  🟩 GCC                Pass: 100%/20  | Total:  1h 49m | Avg:  5m 28s | Max: 16m 37s
  🟩 MSVC               Pass: 100%/2   | Total: 18m 24s | Avg:  9m 12s | Max:  9m 17s | Hits:  47%/238   
  🟩 NVHPC              Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
🟩 gpu
  🟩 v100               Pass: 100%/54  | Total:  4h 37m | Avg:  5m 08s | Max: 16m 37s | Hits:  47%/238   
🟩 jobs
  🟩 Build              Pass: 100%/49  | Total:  3h 18m | Avg:  4m 03s | Max:  9m 17s | Hits:  47%/238   
  🟩 Test               Pass: 100%/5   | Total:  1h 19m | Avg: 15m 48s | Max: 16m 37s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
  🟩 90a                Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
🟩 std
  🟩 17                 Pass: 100%/29  | Total:  2h 15m | Avg:  4m 40s | Max: 16m 37s
  🟩 20                 Pass: 100%/25  | Total:  2h 22m | Avg:  5m 42s | Max: 16m 21s | Hits:  47%/238

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 54)

#	Runner
43	`linux-amd64-cpu16`
5	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`

github-actions · 2024-11-06T08:50:26Z

🟩 CI finished in 24m 32s: Pass: 100%/54 | Total: 4h 04m | Avg: 4m 32s | Max: 20m 05s | Hits: 89%/238

🟩 cudax: Pass: 100%/54 | Total: 4h 04m | Avg: 4m 32s | Max: 20m 05s | Hits: 89%/238

🟩 cpu
  🟩 amd64              Pass: 100%/50  | Total:  3h 54m | Avg:  4m 41s | Max: 20m 05s | Hits:  89%/238   
  🟩 arm64              Pass: 100%/4   | Total: 10m 01s | Avg:  2m 30s | Max:  2m 31s
🟩 ctk
  🟩 12.0               Pass: 100%/19  | Total:  1h 25m | Avg:  4m 29s | Max: 15m 42s | Hits:  89%/119   
  🟩 12.5               Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
  🟩 12.6               Pass: 100%/33  | Total:  2h 29m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/119   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 25m | Avg:  4m 29s | Max: 15m 42s | Hits:  89%/119   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
  🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 29m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/119   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/54  | Total:  4h 04m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/238   
🟩 cxx
  🟩 Clang9             Pass: 100%/2   | Total:  9m 57s | Avg:  4m 58s | Max:  7m 02s
  🟩 Clang10            Pass: 100%/2   | Total:  6m 11s | Avg:  3m 05s | Max:  3m 22s
  🟩 Clang11            Pass: 100%/4   | Total: 11m 57s | Avg:  2m 59s | Max:  3m 08s
  🟩 Clang12            Pass: 100%/4   | Total: 12m 24s | Avg:  3m 06s | Max:  3m 17s
  🟩 Clang13            Pass: 100%/4   | Total: 11m 44s | Avg:  2m 56s | Max:  3m 13s
  🟩 Clang14            Pass: 100%/4   | Total: 24m 55s | Avg:  6m 13s | Max: 15m 42s
  🟩 Clang15            Pass: 100%/2   | Total:  6m 21s | Avg:  3m 10s | Max:  3m 14s
  🟩 Clang16            Pass: 100%/4   | Total: 11m 25s | Avg:  2m 51s | Max:  3m 19s
  🟩 Clang17            Pass: 100%/2   | Total:  6m 07s | Avg:  3m 03s | Max:  3m 07s
  🟩 Clang18            Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max: 16m 02s
  🟩 GCC9               Pass: 100%/2   | Total:  6m 11s | Avg:  3m 05s | Max:  3m 10s
  🟩 GCC10              Pass: 100%/4   | Total: 11m 34s | Avg:  2m 53s | Max:  3m 06s
  🟩 GCC11              Pass: 100%/4   | Total: 11m 34s | Avg:  2m 53s | Max:  2m 56s
  🟩 GCC12              Pass: 100%/7   | Total:  1h 03m | Avg:  9m 01s | Max: 20m 05s
  🟩 GCC13              Pass: 100%/3   | Total:  7m 35s | Avg:  2m 31s | Max:  2m 36s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 30s | Avg:  7m 30s | Max:  7m 30s | Hits:  89%/119   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 10s | Avg:  7m 10s | Max:  7m 10s | Hits:  89%/119   
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
🟩 cxx_family
  🟩 Clang              Pass: 100%/30  | Total:  2h 00m | Avg:  4m 00s | Max: 16m 02s
  🟩 GCC                Pass: 100%/20  | Total:  1h 40m | Avg:  5m 00s | Max: 20m 05s
  🟩 MSVC               Pass: 100%/2   | Total: 14m 40s | Avg:  7m 20s | Max:  7m 30s | Hits:  89%/238   
  🟩 NVHPC              Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
🟩 gpu
  🟩 v100               Pass: 100%/54  | Total:  4h 04m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/238   
🟩 jobs
  🟩 Build              Pass: 100%/49  | Total:  2h 42m | Avg:  3m 18s | Max:  7m 30s | Hits:  89%/238   
  🟩 Test               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 28s | Max: 20m 05s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s
  🟩 90a                Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s
🟩 std
  🟩 17                 Pass: 100%/29  | Total:  2h 02m | Avg:  4m 13s | Max: 20m 05s
  🟩 20                 Pass: 100%/25  | Total:  2h 02m | Avg:  4m 54s | Max: 16m 02s | Hits:  89%/238

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
+/-	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 54)

#	Runner
43	`linux-amd64-cpu16`
5	`linux-amd64-gpu-v100-latest-1`
4	`linux-arm64-cpu16`
2	`windows-amd64-cpu16`

miscco · 2024-11-06T17:32:22Z

cudax/include/cuda/experimental/__memory_resource/device_memory_pool.cuh

@@ -8,8 +8,8 @@
 //
 //===----------------------------------------------------------------------===//

-#ifndef _CUDAX__MEMORY_RESOURCE_CUDA_MEMORY_POOL


While we are moving those, do we want to just move everything into cudax?

I think we do. For now I would say it's a low-priority item, but if we end up with vector needing `allocate_async`/`deallocate_async`, that would bump up the priority.

…VIDIA#2710)

* Rename the type
* Update tests
* Rename async memory pool
* Rename the tests
* Change name in the docs
* Generalise the memory_pool_properties name
* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* copy pasted sample
* First draft
* Kernel functor and some other things
* Clean up and break up long main function
* Needs launch fix
* Switch to copy_bytes and cleanups
* Missing include
* Add exception print and waive value
* Adjust copy count
* Add license and switch benchmark streams
* Remove a function left as a mistake
* Update copyright date

Co-authored-by: Eric Niebler <[email protected]>

* Setup cudax examples. (#2697)
* Move the sample to new location and fix warning
* build fixes and 0 return code on waive
* Some new MSVC errors
* explicit cast
* Rename enable/disable peer access and separate the sample loop
* Add `cuda::minimum` and `cuda::maximum` (#2681)
* Add cuda::minimum and cuda::maximum
* Various fixes to cub::DeviceTransform (#2709)
* Workaround non-copyable iterators
* Use a named constant for SMEM
* Cast to raw reference 2
* Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg
* Make `thrust::transform` use `cub::DeviceTransform` (#2389)
* Add transform benchmark requiring a stable address
* Make thrust::transform use cub::DeviceTransform
* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious
* Optimize prefetch cub::DeviceTransform for small problems

Fixes: #2263

* Ensure that we only use the inline variable trait when it is actually available (#2712)
* Ensure that we only use the inline variable trait when it is actually available
* Use the right define for internal traits
* [CUDAX] Rename memory resource and memory pool from async to device (#2710)
* Rename the type
* Update tests
* Rename async memory pool
* Rename the tests
* Change name in the docs
* Generalise the memory_pool_properties name
* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* Update memory resource name

---------

Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Jacob Faibussowitsch <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>



pciolkosz added 6 commits November 5, 2024 16:20

e51bd6f Rename the type
8579e7f Update tests
d8c0ecc Rename async memory pool
8dea47c Rename the tests
afec675 Change name in the docs
74ab5f5 Generalise the memory_pool_properties name
9ff3154 Fix docs

pciolkosz marked this pull request as ready for review November 6, 2024 17:16

pciolkosz requested review from a team as code owners November 6, 2024 17:16

pciolkosz requested review from robertmaynard and miscco November 6, 2024 17:16

miscco approved these changes Nov 6, 2024


pciolkosz merged commit 6edb860 into NVIDIA:main Nov 6, 2024
71 checks passed

