Add alignment to cuda_malloc_async_memory_resource. #4923
Signed-off-by: Michał Zientkiewicz <[email protected]>
CI MESSAGE: [8730704]: BUILD STARTED
class aligned_alloc_helper {
 public:
  static constexpr size_t kCudaMallocAlignment = 256;
Is this value fixed once and for all, or may it change depending on the version?
I think it's fixed. I asked about it on the CUDA channel, and all functions that allocate global memory return data aligned to at least 256B.
51b93c5 to f99fdc5
Signed-off-by: Michał Zientkiewicz <[email protected]>
CI MESSAGE: [8731034]: BUILD STARTED
CI MESSAGE: [8731034]: BUILD FAILED
CI MESSAGE: [8731034]: BUILD PASSED
* Overallocate and align allocations with alignment > 256B
* Store a mapping from aligned to original addresses in a global map
* Update tests

Signed-off-by: Michał Zientkiewicz <[email protected]>
Category:
Bug fix
New feature
Description:
cudaMallocAsync allocates memory aligned to a multiple of 256 bytes; however, DALI's memory_resource interface can be used to request overaligned memory. Before this PR, DALI would neither return properly aligned memory nor raise an error if cudaMallocAsync returned an insufficiently aligned pointer.
This PR adds overalignment support by requesting larger blocks for allocations with alignment >256B and aligning the pointer accordingly. The aligned->original pointer mapping is kept in global state; upon deletion, the pointer being deleted is looked up in the map and, if found, replaced with the original pointer.
Additionally, a tiny performance bug is fixed in AsyncPool tests.
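The overalignment scheme described above can be sketched roughly as follows. This is a host-only illustration, not the code from this PR: `base_alloc` and `base_free` are hypothetical stand-ins for `cudaMallocAsync` / `cudaFreeAsync`, `overaligned_allocator` and `tracked()` are made-up names, the map is a class member rather than global state, and the requested alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Alignment assumed to be guaranteed by the underlying allocator
// (256 B for cudaMallocAsync, per the discussion in this PR).
constexpr std::size_t kBaseAlignment = 256;

// Hypothetical host stand-ins for cudaMallocAsync / cudaFreeAsync.
inline void *base_alloc(std::size_t bytes) {
  // std::aligned_alloc expects the size to be a multiple of the alignment.
  std::size_t padded = (bytes + kBaseAlignment - 1) / kBaseAlignment * kBaseAlignment;
  return std::aligned_alloc(kBaseAlignment, padded);
}
inline void base_free(void *ptr) { std::free(ptr); }

// Illustrative wrapper; the name and layout are assumptions, not DALI's API.
class overaligned_allocator {
 public:
  // `alignment` is assumed to be a power of two.
  void *allocate(std::size_t bytes, std::size_t alignment) {
    if (alignment <= kBaseAlignment)
      return base_alloc(bytes);  // the base allocator is already good enough

    // The base pointer is 256 B-aligned, so the aligned address lies at most
    // (alignment - kBaseAlignment) bytes past it; pad the request by that much.
    void *base = base_alloc(bytes + alignment - kBaseAlignment);
    if (!base)
      return nullptr;

    auto addr = reinterpret_cast<std::uintptr_t>(base);
    auto mask = static_cast<std::uintptr_t>(alignment) - 1;
    void *aligned = reinterpret_cast<void *>((addr + mask) & ~mask);

    if (aligned != base) {
      // Remember which original pointer this aligned pointer came from.
      std::lock_guard<std::mutex> lock(mtx_);
      original_[aligned] = base;
    }
    return aligned;
  }

  void deallocate(void *ptr) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      auto it = original_.find(ptr);
      if (it != original_.end()) {
        ptr = it->second;  // swap in the pointer the base allocator returned
        original_.erase(it);
      }
    }
    base_free(ptr);
  }

  // Number of aligned->original entries currently tracked (for inspection only).
  std::size_t tracked() {
    std::lock_guard<std::mutex> lock(mtx_);
    return original_.size();
  }

 private:
  std::mutex mtx_;
  std::unordered_map<void *, void *> original_;
};
```

With this sketch, `allocate(1000, 4096)` returns a pointer aligned to 4096 bytes, and passing that pointer to `deallocate` frees the underlying block and leaves the map empty. Requests with alignment of 256B or less bypass the map entirely.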
Additional information:
Affected modules and functionalities:
cuda_malloc_async_memory_resource
AsyncPool tests.
Key points relevant for the review:
Tests:
New tests added
Test adjusted
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A