
Add a memory resource based on cudaMallocAsync #4900

Merged — mzient merged 7 commits into NVIDIA:main on Jun 13, 2023

Conversation

@mzient (Contributor) commented on Jun 7, 2023

Category:

New feature (non-breaking change which adds functionality)

Description:

This change adds a new memory resource that uses cudaMallocAsync under the hood.
It can be enabled with an environment variable DALI_USE_CUDA_MALLOC_ASYNC.
Parsing of memory configuration variables is improved and detects contradictory settings.

Additionally, an unnecessary dependency on core/common.h was removed from some very basic header files and, where needed, replaced with the headers that are actually required.
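For illustration, such a resource essentially forwards allocation and deallocation to `cudaMallocAsync`/`cudaFreeAsync` (available since CUDA 11.2). The sketch below is a minimal approximation, not DALI's actual class; the class name and method signatures are illustrative assumptions:

```cpp
#include <cuda_runtime_api.h>
#include <cstddef>
#include <new>  // std::bad_alloc

// Minimal sketch of a cudaMallocAsync-backed device memory resource.
class cuda_malloc_async_sketch {
 public:
  // Stream-ordered allocation requires a device/driver with memory pool support.
  static bool is_supported(int device_id) {
    int supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device_id);
    return supported != 0;
  }

  void *allocate(std::size_t bytes, cudaStream_t stream) {
    void *ptr = nullptr;
    if (cudaMallocAsync(&ptr, bytes, stream) != cudaSuccess)
      throw std::bad_alloc();  // allocation is ordered with respect to `stream`
    return ptr;
  }

  void deallocate(void *ptr, cudaStream_t stream) noexcept {
    cudaFreeAsync(ptr, stream);  // stream-ordered return to the driver's pool
  }
};
```

With the `DALI_USE_CUDA_MALLOC_ASYNC` environment variable set, the default device resource uses an allocator of this kind instead of DALI's internal pool.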

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@dali-automaton (Collaborator): CI MESSAGE: [8557789]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [8557789]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [8557915]: BUILD STARTED

Review thread on the updated documentation (quoted diff context):

VMM by setting ``DALI_USE_VMM=0``. This will cause ``cudaMalloc`` to be used as an upstream memory
resource for the internal memory pool.

Using ``cudaMallocAsync`` results in slightly slower execution, but it enables memory pool sharing
Reviewer (Contributor):

Suggested change
Using ``cudaMallocAsync`` results in slightly slower execution, but it enables memory pool sharing
Using ``cudaMallocAsync`` results in execution time dependent on the CUDA toolkit and driver version, but it enables memory pool sharing

I don't think we can claim for sure that it is slower in all cases.

@mzient (Author):

I can add some weasel word like "typically" and call it a day.
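For context on the pool-sharing point: with `cudaMallocAsync`, allocations are served from a driver-managed memory pool that every consumer in the process can draw from, instead of a private per-library pool. A hedged sketch of what that looks like at the CUDA runtime level (the release-threshold tuning is an assumption for illustration, not something this PR sets):

```cpp
#include <cuda_runtime_api.h>
#include <cstdint>

// Sketch: with cudaMallocAsync, all consumers in the process allocate from
// the device's default, driver-managed memory pool, so free memory is shared
// rather than held captive in per-library pools.
int main() {
  int device_id = 0;
  cudaSetDevice(device_id);

  cudaMemPool_t pool;
  cudaDeviceGetDefaultMemPool(&pool, device_id);

  // Optional tuning (assumption, not part of this PR): keep up to 1 GiB
  // cached in the pool instead of releasing it at stream synchronization.
  std::uint64_t threshold = 1ull << 30;
  cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold);

  cudaStream_t stream = 0;
  void *ptr = nullptr;
  cudaMallocAsync(&ptr, 64 << 20, stream);  // served from the shared pool
  cudaFreeAsync(ptr, stream);               // returned to the pool, stream-ordered
  cudaStreamSynchronize(stream);
  return 0;
}
```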


template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
  return format_time(seconds(time));
Reviewer (Contributor):

Return from void function?

@mzient (Author):

Nice catch.

}

template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
Reviewer (Contributor):

os is unused.

@mzient (Author):

The entire function is unused. Still, it seems quite useful; I'll move it to some test utils so we don't have to reinvent it.
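For reference, a fixed version of the helper presumably looks like this (a sketch; `format_time` and `seconds` are assumed to be the surrounding file's formatting and unit-conversion helpers):

```cpp
template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
  os << format_time(seconds(time));  // write to the stream; no value returned from void
}
```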

std::vector<std::thread> threads;
for (int tid = 0; tid < num_threads; tid++) {
  threads.emplace_back([&, tid]() {
    (void)tid; // Make clang shut up; I prefer to keep this explicitly captured by value, even
Reviewer (Contributor):

Suggested change
(void)tid; // Make clang shut up; I prefer to keep this explicitly captured by value, even
(void)tid; // Silences clang; I prefer to keep this explicitly captured by value, even

@mzient (Author):

~~Kilroy~~ Killjoy was here.

Reviewer (Contributor):

       \|||/
       (o o)
----ooO-(_)-Ooo--------
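As an aside, the pattern under discussion in compilable form (a sketch, not the PR's test code): the explicit by-value capture gives each thread its own `tid`, and the `(void)tid` cast marks it as used, so clang's `-Wunused-lambda-capture` stays quiet when the body doesn't otherwise touch it:

```cpp
#include <thread>
#include <vector>

int main() {
  const int num_threads = 4;
  std::vector<std::thread> threads;
  for (int tid = 0; tid < num_threads; tid++) {
    threads.emplace_back([&, tid]() {
      (void)tid;  // silences clang; keeps the explicit by-value capture
      // ... per-thread work indexed by tid would go here ...
    });
  }
  for (auto &t : threads)
    t.join();
  return 0;
}
```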

dali/core/mm/perf_test.cu — outdated review comment (resolved)
@JanuszL self-assigned this on Jun 7, 2023
@dali-automaton (Collaborator): CI MESSAGE: [8557915]: BUILD FAILED

mzient added 4 commits June 12, 2023 10:26
Signed-off-by: Michal Zientkiewicz <[email protected]>
@mzient force-pushed the cuda_malloc_async_resource branch from be19a11 to a27223d on June 12, 2023 11:00
@dali-automaton (Collaborator): CI MESSAGE: [8610900]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [8610900]: BUILD FAILED

@mzient force-pushed the cuda_malloc_async_resource branch from a27223d to 0dfb84a on June 12, 2023 11:39
@dali-automaton (Collaborator): CI MESSAGE: [8611162]: BUILD STARTED

@mzient force-pushed the cuda_malloc_async_resource branch from 7c1ed20 to c19feaf on June 12, 2023 11:43
dali/test/timing.h — outdated review comment (resolved)
@@ -235,6 +235,9 @@ inline std::shared_ptr<device_async_resource> CreateDefaultDeviceResource() {
  CUDA_CALL(cudaGetDevice(&device_id));
  if (MMEnv::get().use_cuda_malloc_async) {
#if CUDA_VERSION >= 11020
    if (!cuda_malloc_async_memory_resource::is_supported(device_id))
Reviewer (Contributor):

So if the user wants async but it is not supported, we want to fail verbosely, right? No silent fallbacks or the like.

@mzient (Author):

Yes - it's not enabled by default, so I think we can fail if the user specifically requests this method.
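For illustration, the fail-loudly behavior amounts to something like the sketch below (the function name, error type, and message are assumptions, not the PR's actual code):

```cpp
#include <cuda.h>              // CUDA_VERSION
#include <cuda_runtime_api.h>
#include <stdexcept>
#include <string>

// Sketch of the verbose-failure path when the user explicitly opts in.
void validate_cuda_malloc_async(int device_id) {
#if CUDA_VERSION >= 11020
  int supported = 0;
  cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device_id);
  if (!supported) {
    // The user opted in via DALI_USE_CUDA_MALLOC_ASYNC, so a silent
    // fallback would mask a configuration problem.
    throw std::runtime_error("cudaMallocAsync requested, but device " +
                             std::to_string(device_id) +
                             " does not support stream-ordered allocation.");
  }
#else
  throw std::runtime_error("Built with CUDA < 11.2; cudaMallocAsync is not available.");
#endif
}
```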

@dali-automaton (Collaborator): CI MESSAGE: [8611162]: BUILD PASSED

@dali-automaton (Collaborator): CI MESSAGE: [8614458]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [8614458]: BUILD PASSED

@mzient merged commit a805ced into NVIDIA:main on Jun 13, 2023
@dali-automaton (Collaborator): CI MESSAGE: [8625577]: BUILD STARTED

@JanuszL mentioned this pull request on Sep 6, 2023
JanuszL pushed a commit to JanuszL/DALI that referenced this pull request Oct 13, 2023
* Performance test DALI allocator vs cudaMallocAsync.
* Add cuda_malloc_resource. Remove some unnecessary header dependency.
* Improve environment variable handling in default_resources.cc
* Update docs.

---------

Signed-off-by: Michal Zientkiewicz <[email protected]>