[CUDAX] Rename memory resource and memory pool from async to device #2710

Merged: 7 commits, Nov 6, 2024

pciolkosz
Contributor

The direction we want to take with memory resources in CUDAX is to use memory pools for all kinds of allocations. Memory resources operating on pools will provide both synchronous and asynchronous allocation functionality.

We will end up with memory pools and memory resources for all kinds of memory. For now we should rename the current async_memory_resource and async_memory_pool to device_*, since the current ones allocate device memory.

This change is a very simple search and replace; let's see if there are any issues in CI.
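To illustrate the direction described above, here is a minimal sketch of a pool-backed resource that offers both synchronous and stream-ordered allocation. It is not the cudax implementation: `mock_stream`, `mock_device_memory_pool`, and `mock_device_memory_resource` are hypothetical stand-ins, host `malloc`/`free` replaces device allocation, and only the `allocate_async`/`deallocate_async` names are taken from this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a CUDA stream: it only records frees that were
// ordered on it and performs them when the stream is waited on.
struct mock_stream {
  std::vector<void*> pending_frees;

  void wait() {
    for (void* ptr : pending_frees) {
      std::free(ptr);
    }
    pending_frees.clear();
  }
};

// Hypothetical stand-in for a device memory pool: it only tracks usage.
struct mock_device_memory_pool {
  std::size_t bytes_in_use = 0;
};

// Hypothetical resource operating on a pool. The synchronous interface is
// expressed through the stream-ordered one plus a wait.
class mock_device_memory_resource {
  mock_device_memory_pool* pool_;

public:
  explicit mock_device_memory_resource(mock_device_memory_pool& pool)
      : pool_(&pool) {}

  // Stream-ordered allocation (host malloc stands in for device memory).
  void* allocate_async(std::size_t bytes, mock_stream&) {
    pool_->bytes_in_use += bytes;
    return std::malloc(bytes);
  }

  // Stream-ordered deallocation: the free happens when the stream is waited on.
  void deallocate_async(void* ptr, std::size_t bytes, mock_stream& stream) {
    pool_->bytes_in_use -= bytes;
    stream.pending_frees.push_back(ptr);
  }

  // Synchronous allocation: allocate on a temporary stream and wait for it.
  void* allocate(std::size_t bytes) {
    mock_stream stream;
    void* ptr = allocate_async(bytes, stream);
    stream.wait();
    return ptr;
  }

  // Synchronous deallocation: the memory is released before returning.
  void deallocate(void* ptr, std::size_t bytes) {
    mock_stream stream;
    deallocate_async(ptr, bytes, stream);
    stream.wait();
  }
};
```

In this sketch a caller constructs one pool, wraps it in the resource, and can mix `allocate` with `allocate_async` against the same pool. The real types may differ in signatures, alignment handling, and stream semantics.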

Contributor

github-actions bot commented Nov 6, 2024

🟩 CI finished in 21m 09s: Pass: 100%/54 | Total: 4h 37m | Avg: 5m 08s | Max: 16m 37s | Hits: 47%/238
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 37m | Avg: 5m 08s | Max: 16m 37s | Hits: 47%/238

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 23m | Avg:  5m 16s | Max: 16m 37s | Hits:  47%/238   
      🟩 arm64              Pass: 100%/4   | Total: 14m 14s | Avg:  3m 33s | Max:  4m 05s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 38m | Avg:  5m 11s | Max: 16m 37s | Hits:  47%/119   
      🟩 12.5               Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
      🟩 12.6               Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 15m 50s | Hits:  47%/119   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 38m | Avg:  5m 11s | Max: 16m 37s | Hits:  47%/119   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 15m 50s | Hits:  47%/119   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 37m | Avg:  5m 08s | Max: 16m 37s | Hits:  47%/238   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 18s
      🟩 Clang10            Pass: 100%/2   | Total:  8m 04s | Avg:  4m 02s | Max:  4m 16s
      🟩 Clang11            Pass: 100%/4   | Total: 14m 15s | Avg:  3m 33s | Max:  3m 50s
      🟩 Clang12            Pass: 100%/4   | Total: 14m 52s | Avg:  3m 43s | Max:  4m 01s
      🟩 Clang13            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  3m 58s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 50s | Avg:  6m 57s | Max: 16m 21s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 57s | Avg:  3m 58s | Max:  4m 10s
      🟩 Clang16            Pass: 100%/4   | Total: 15m 13s | Avg:  3m 48s | Max:  4m 05s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  3m 56s
      🟩 Clang18            Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max: 14m 51s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 31s
      🟩 GCC10              Pass: 100%/4   | Total: 15m 05s | Avg:  3m 46s | Max:  4m 01s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 23s | Avg:  3m 35s | Max:  3m 42s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 03m | Avg:  9m 01s | Max: 16m 37s
      🟩 GCC13              Pass: 100%/3   | Total:  9m 48s | Avg:  3m 16s | Max:  3m 31s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 17s | Avg:  9m 17s | Max:  9m 17s | Hits:  47%/119   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 07s | Avg:  9m 07s | Max:  9m 07s | Hits:  47%/119   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 17m | Avg:  4m 34s | Max: 16m 21s
      🟩 GCC                Pass: 100%/20  | Total:  1h 49m | Avg:  5m 28s | Max: 16m 37s
      🟩 MSVC               Pass: 100%/2   | Total: 18m 24s | Avg:  9m 12s | Max:  9m 17s | Hits:  47%/238   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 38s | Avg:  6m 19s | Max:  6m 27s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 37m | Avg:  5m 08s | Max: 16m 37s | Hits:  47%/238   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 18m | Avg:  4m 03s | Max:  9m 17s | Hits:  47%/238   
      🟩 Test               Pass: 100%/5   | Total:  1h 19m | Avg: 15m 48s | Max: 16m 37s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
      🟩 90a                Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 15m | Avg:  4m 40s | Max: 16m 37s
      🟩 20                 Pass: 100%/25  | Total:  2h 22m | Avg:  5m 42s | Max: 16m 21s | Hits:  47%/238   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

github-actions bot commented Nov 6, 2024

🟩 CI finished in 24m 32s: Pass: 100%/54 | Total: 4h 04m | Avg: 4m 32s | Max: 20m 05s | Hits: 89%/238
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 04m | Avg: 4m 32s | Max: 20m 05s | Hits: 89%/238

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  3h 54m | Avg:  4m 41s | Max: 20m 05s | Hits:  89%/238   
      🟩 arm64              Pass: 100%/4   | Total: 10m 01s | Avg:  2m 30s | Max:  2m 31s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 25m | Avg:  4m 29s | Max: 15m 42s | Hits:  89%/119   
      🟩 12.5               Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
      🟩 12.6               Pass: 100%/33  | Total:  2h 29m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/119   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 25m | Avg:  4m 29s | Max: 15m 42s | Hits:  89%/119   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 29m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/119   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 04m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/238   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  9m 57s | Avg:  4m 58s | Max:  7m 02s
      🟩 Clang10            Pass: 100%/2   | Total:  6m 11s | Avg:  3m 05s | Max:  3m 22s
      🟩 Clang11            Pass: 100%/4   | Total: 11m 57s | Avg:  2m 59s | Max:  3m 08s
      🟩 Clang12            Pass: 100%/4   | Total: 12m 24s | Avg:  3m 06s | Max:  3m 17s
      🟩 Clang13            Pass: 100%/4   | Total: 11m 44s | Avg:  2m 56s | Max:  3m 13s
      🟩 Clang14            Pass: 100%/4   | Total: 24m 55s | Avg:  6m 13s | Max: 15m 42s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 21s | Avg:  3m 10s | Max:  3m 14s
      🟩 Clang16            Pass: 100%/4   | Total: 11m 25s | Avg:  2m 51s | Max:  3m 19s
      🟩 Clang17            Pass: 100%/2   | Total:  6m 07s | Avg:  3m 03s | Max:  3m 07s
      🟩 Clang18            Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max: 16m 02s
      🟩 GCC9               Pass: 100%/2   | Total:  6m 11s | Avg:  3m 05s | Max:  3m 10s
      🟩 GCC10              Pass: 100%/4   | Total: 11m 34s | Avg:  2m 53s | Max:  3m 06s
      🟩 GCC11              Pass: 100%/4   | Total: 11m 34s | Avg:  2m 53s | Max:  2m 56s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 03m | Avg:  9m 01s | Max: 20m 05s
      🟩 GCC13              Pass: 100%/3   | Total:  7m 35s | Avg:  2m 31s | Max:  2m 36s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 30s | Avg:  7m 30s | Max:  7m 30s | Hits:  89%/119   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 10s | Avg:  7m 10s | Max:  7m 10s | Hits:  89%/119   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 00m | Avg:  4m 00s | Max: 16m 02s
      🟩 GCC                Pass: 100%/20  | Total:  1h 40m | Avg:  5m 00s | Max: 20m 05s
      🟩 MSVC               Pass: 100%/2   | Total: 14m 40s | Avg:  7m 20s | Max:  7m 30s | Hits:  89%/238   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 02s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 04m | Avg:  4m 32s | Max: 20m 05s | Hits:  89%/238   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  2h 42m | Avg:  3m 18s | Max:  7m 30s | Hits:  89%/238   
      🟩 Test               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 28s | Max: 20m 05s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s
      🟩 90a                Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 02m | Avg:  4m 13s | Max: 20m 05s
      🟩 20                 Pass: 100%/25  | Total:  2h 02m | Avg:  4m 54s | Max: 16m 02s | Hits:  89%/238   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@pciolkosz pciolkosz marked this pull request as ready for review November 6, 2024 17:16
@pciolkosz pciolkosz requested review from a team as code owners November 6, 2024 17:16
@@ -8,8 +8,8 @@
//
//===----------------------------------------------------------------------===//

#ifndef _CUDAX__MEMORY_RESOURCE_CUDA_MEMORY_POOL
Collaborator

While we are moving those, do we want to just move everything into cudax?

Contributor Author

I think we do; for now I would say it's a low-priority item. But if we end up with vector needing allocate_async/deallocate_async, that would bump up the priority.

@pciolkosz pciolkosz merged commit 6edb860 into NVIDIA:main Nov 6, 2024
71 checks passed
pciolkosz added a commit to pciolkosz/cccl that referenced this pull request Nov 6, 2024
…VIDIA#2710)

* Rename the type

* Update tests

* Rename async memory pool

* Rename the tests

* Change name in the docs

* Generalise the memory_pool_properties name

* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 9, 2024
pciolkosz added a commit that referenced this pull request Nov 11, 2024
* copy pasted sample

* First draft

* Kernel functor and some other things

* Clean up and break up long main function

* Needs launch fix

* Switch to copy_bytes and cleanups

* Missing include

* Add exception print and waive value

* Adjust copy count

* Add license and switch benchmark streams

* Remove a function left as a mistake

* Update copyright date

Co-authored-by: Eric Niebler <[email protected]>

* Setup cudax examples. (#2697)

* Move the sample to new location and fix warning

* build fixes and 0 return code on waive

* Some new MSVC errors

* explicit cast

* Rename enable/disable peer access and separate the sample loop

* Add `cuda::minimum` and `cuda::maximum` (#2681)

* Add cuda::minimum and cuda::maximum

* Various fixes to cub::DeviceTransform (#2709)

* Workaround non-copyable iterators
* Use a named constant for SMEM
* Cast to raw reference 2
* Fix passing non-copy-assignable iterators to transform_kernel via kernel_arg

* Make `thrust::transform` use `cub::DeviceTransform` (#2389)

* Add transform benchmark requiring a stable address
* Make thrust::transform use cub::DeviceTransform
* Introduces address stability detection and opt-in in libcu++
* Mark lambdas in Thrust BabelStream benchmark address oblivious
* Optimize prefetch cub::DeviceTransform for small problems

Fixes: #2263

* Ensure that we only use the inline variable trait when it is actually available (#2712)

* Ensure that we only use the inline variable trait when it is actually available

* Use the right define for internal traits

* [CUDAX] Rename memory resource and memory pool from async to device (#2710)

* Rename the type

* Update tests

* Rename async memory pool

* Rename the tests

* Change name in the docs

* Generalise the memory_pool_properties name

* Fix docs

---------

Co-authored-by: Michael Schellenberger Costa <[email protected]>

* Update memory resource name

---------

Co-authored-by: Eric Niebler <[email protected]>
Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Jacob Faibussowitsch <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
fbusato pushed a commit to fbusato/cccl that referenced this pull request Nov 12, 2024
fbusato pushed a commit to fbusato/cccl that referenced this pull request Jan 9, 2025