Add a memory resource based on cudaMallocAsync #4900
CI MESSAGE: [8557789]: BUILD STARTED |
CI MESSAGE: [8557789]: BUILD FAILED |
CI MESSAGE: [8557915]: BUILD STARTED |
VMM by setting ``DALI_USE_VMM=0``. This will cause ``cudaMalloc`` to be used as an upstream memory
resource for the internal memory pool.

Using ``cudaMallocAsync`` results in slightly slower execution, but it enables memory pool sharing
- Using ``cudaMallocAsync`` results in slightly slower execution, but it enables memory pool sharing
+ Using ``cudaMallocAsync`` results in execution time dependent on the CUDA toolkit and driver version, but it enables memory pool sharing
I don't think we can claim for certain that it is slower in all cases.
I can add some weasel word like "typically" and call it a day.
dali/core/mm/perf_test.cu
Outdated

template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
  return format_time(seconds(time));
Return from void function?
Nice catch.
dali/core/mm/perf_test.cu
Outdated
}

template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
os is unused.
The entire function is unused. Still, it seems quite useful; I'll move it to some test utils so we don't have to reinvent it.
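The two remarks above (a `return` inside a `void` function, and an unused `os` parameter) suggest what a corrected helper might look like. The sketch below is only an illustration: `format_time` here is a hypothetical stand-in written for this example, since the real helper and the `seconds` conversion used in the diff are not shown in this conversation.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the format_time helper referenced in the diff.
// It picks a unit (s, ms, us, ns) and prints three decimal places.
inline std::string format_time(double seconds) {
  assert(seconds >= 0 && "this sketch only handles non-negative durations");
  const char *unit = "s";
  double value = seconds;
  if (seconds < 1e-6) {
    value = seconds * 1e9;
    unit = "ns";
  } else if (seconds < 1e-3) {
    value = seconds * 1e6;
    unit = "us";
  } else if (seconds < 1.0) {
    value = seconds * 1e3;
    unit = "ms";
  }
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.3f %s", value, unit);
  return buf;
}

// Writes the formatted duration to `os` instead of returning a value,
// so the function is genuinely void and `os` is actually used.
template <typename Rep, typename Period>
void print_time(std::ostream &os, std::chrono::duration<Rep, Period> time) {
  // Convert any duration type to floating-point seconds.
  double sec = std::chrono::duration<double>(time).count();
  os << format_time(sec);
}
```

Calling `print_time(std::cout, std::chrono::milliseconds(1500))` with this stand-in would write the duration in seconds; shorter durations switch to smaller units.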
dali/core/mm/perf_test.cu
Outdated
std::vector<std::thread> threads;
for (int tid = 0; tid < num_threads; tid++) {
  threads.emplace_back([&, tid]() {
    (void)tid;  // Make clang shut up; I prefer to keep this explicitly captured by value, even
- (void)tid;  // Make clang shut up; I prefer to keep this explicitly captured by value, even
+ (void)tid;  // Silences clang; I prefer to keep this explicitly captured by value, even
Kilroy Killjoy was here.
\|||/
(o o)
----ooO-(_)-Ooo--------
CI MESSAGE: [8557915]: BUILD FAILED |
Signed-off-by: Michal Zientkiewicz <[email protected]>
be19a11 to a27223d
CI MESSAGE: [8610900]: BUILD STARTED |
CI MESSAGE: [8610900]: BUILD FAILED |
Signed-off-by: Michal Zientkiewicz <[email protected]>
a27223d to 0dfb84a
CI MESSAGE: [8611162]: BUILD STARTED |
7c1ed20 to c19feaf
@@ -235,6 +235,9 @@ inline std::shared_ptr<device_async_resource> CreateDefaultDeviceResource() {
   CUDA_CALL(cudaGetDevice(&device_id));
   if (MMEnv::get().use_cuda_malloc_async) {
 #if CUDA_VERSION >= 11020
     if (!cuda_malloc_async_memory_resource::is_supported(device_id))
So if the user wants async but it is not supported, we want to fail verbosely, right? No silent fallback or anything like that.
Yes - it's not enabled by default, so I think we can fail if the user specifically requests this method.
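The policy agreed on in this exchange (opt-in feature, hard error when it is explicitly requested but unavailable) can be sketched as follows. This is a simplified illustration and not the code from the PR: `DeviceAllocKind` and `select_device_allocator` are hypothetical names, and the `is_supported` callback merely models a per-device probe like the `is_supported(device_id)` call visible in the diff.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical allocator kinds used only in this sketch.
enum class DeviceAllocKind { DefaultPool, CudaMallocAsync };

// `user_requested_async` models the opt-in setting.
// `is_supported` models a per-device capability probe (an assumption here).
inline DeviceAllocKind select_device_allocator(
    bool user_requested_async, int device_id,
    const std::function<bool(int)> &is_supported) {
  assert(is_supported && "a capability probe must be provided");
  if (!user_requested_async)
    return DeviceAllocKind::DefaultPool;  // opt-in feature: default path is unchanged
  if (!is_supported(device_id)) {
    // The user asked for this explicitly, so report the problem
    // instead of silently falling back to another allocator.
    throw std::runtime_error(
        "cudaMallocAsync was requested but is not supported on device " +
        std::to_string(device_id));
  }
  return DeviceAllocKind::CudaMallocAsync;
}
```

Because the feature is off by default, the throwing branch can only be reached by users who asked for it, which is the argument made in the reply above.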
Signed-off-by: Michal Zientkiewicz <[email protected]>
CI MESSAGE: [8611162]: BUILD PASSED |
CI MESSAGE: [8614458]: BUILD STARTED |
CI MESSAGE: [8614458]: BUILD PASSED |
CI MESSAGE: [8625577]: BUILD STARTED |
* Performance test DALI allocator vs cudaMallocAsync.
* Add cuda_malloc_resource. Remove some unnecessary header dependency.
* Improve environment variable handling in default_resources.cc
* Update docs.
---------
Signed-off-by: Michal Zientkiewicz <[email protected]>
Category:
New feature (non-breaking change which adds functionality)
Description:
This change adds a new memory resource that uses cudaMallocAsync under the hood. It can be enabled with the environment variable DALI_USE_CUDA_MALLOC_ASYNC.
Parsing of memory configuration variables is improved and detects contradictory settings.
Additionally, an unnecessary dependence on core/common.h was removed from some very basic header files, sometimes replaced with headers actually required (if any).
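To illustrate what detecting contradictory settings could mean, here is a minimal sketch of flag parsing with a conflict check. It is not the code from default_resources.cc. The variable names DALI_USE_VMM and DALI_USE_CUDA_MALLOC_ASYNC come from this PR, but everything else is an assumption made for the example: the `Flag`, `MemSettings`, and `EnvLookup` types, the rule that explicitly enabling both variables is a conflict, and the defaults chosen when a variable is unset.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>

// Tri-state so that "explicitly set" can be told apart from "left at default".
enum class Flag { Unset, Off, On };

// Hypothetical result type for this sketch.
struct MemSettings {
  bool use_vmm;
  bool use_cuda_malloc_async;
};

// The lookup is injected so the logic can be exercised without touching
// the real process environment.
using EnvLookup = std::function<const char *(const char *)>;

inline Flag parse_flag(const EnvLookup &env, const char *name) {
  const char *value = env(name);
  if (!value) return Flag::Unset;
  std::string s(value);
  if (s == "1") return Flag::On;
  if (s == "0") return Flag::Off;
  throw std::invalid_argument(std::string(name) + " must be 0 or 1, got: " + s);
}

inline MemSettings parse_mem_settings(const EnvLookup &env) {
  assert(env && "an environment lookup function is required");
  Flag vmm = parse_flag(env, "DALI_USE_VMM");
  Flag async = parse_flag(env, "DALI_USE_CUDA_MALLOC_ASYNC");
  // ASSUMPTION: explicitly enabling both is treated as contradictory.
  if (vmm == Flag::On && async == Flag::On)
    throw std::invalid_argument(
        "DALI_USE_VMM=1 and DALI_USE_CUDA_MALLOC_ASYNC=1 cannot both be set");
  MemSettings s;
  s.use_cuda_malloc_async = (async == Flag::On);
  // ASSUMPTION: VMM stays on by default unless the async allocator is requested.
  s.use_vmm = (vmm == Flag::Unset) ? !s.use_cuda_malloc_async : (vmm == Flag::On);
  return s;
}

// Convenience wrapper that reads the real process environment.
inline MemSettings parse_mem_settings_from_process_env() {
  return parse_mem_settings(
      [](const char *name) -> const char * { return std::getenv(name); });
}
```

Keeping the three states separate is what makes the conflict check possible: a default value and an explicit user choice would otherwise look identical.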
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A