
Add caching allocator for pinned (page-locked) memory #618

Merged
merged 2 commits into torch:master on Dec 1, 2016

Conversation

colesbury
Contributor

Adds a caching allocator for CUDA pinned (page-locked) memory. This avoids the synchronization caused by cudaFreeHost (or cudaHostUnregister) calls.

To ensure read-after-write and write-after-read consistency, a CUDA event is recorded after every cudaMemcpyAsync between host and device involving pinned memory created by this allocator. Memory allocations are only re-used after they're freed and all associated CUDA events have completed.
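The reuse rule above can be sketched in plain C++. This is a hypothetical illustration, not the PR's actual code: CUDA events are simulated by a pending-event counter, where real code would use cudaEventRecord and cudaEventQuery, and buffers in this sketch are never released (a real allocator would eventually call cudaFreeHost).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A pinned-memory block with a count of events recorded against it.
struct Block {
    std::size_t size;
    int pending_events = 0;  // events recorded but not yet completed
    bool freed = false;
};

class CachingPinnedAllocator {
public:
    Block* allocate(std::size_t size) {
        // Reuse any cached (freed, event-free) block large enough
        // for the request; otherwise create a new one.
        for (auto it = cache_.begin(); it != cache_.end(); ++it) {
            if ((*it)->size >= size) {
                Block* b = *it;
                cache_.erase(it);
                b->freed = false;
                return b;
            }
        }
        blocks_.push_back(new Block{size});
        return blocks_.back();
    }

    // Called after a cudaMemcpyAsync touches the block.
    void record_event(Block* b) { b->pending_events++; }

    // Called when one outstanding event on the block completes.
    void event_completed(Block* b) {
        b->pending_events--;
        maybe_cache(b);
    }

    void free(Block* b) {
        b->freed = true;
        maybe_cache(b);
    }

private:
    void maybe_cache(Block* b) {
        // A block becomes reusable only when it is freed AND has no
        // outstanding events: this is what preserves read-after-write
        // and write-after-read consistency.
        if (b->freed && b->pending_events == 0) cache_.push_back(b);
    }
    std::vector<Block*> cache_;
    std::vector<Block*> blocks_;  // leaked in this sketch
};
```

A freed block with a pending event is therefore invisible to `allocate` until the event completes, at which point it moves into the cache.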

Unlike the caching device allocator, allocations are never split. This means that requests for small allocations may be filled by much larger cached buffers. I think this should be OK in practice.
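The never-split policy can be pictured as a pool ordered by size, where an allocation takes the smallest cached buffer at least as large as the request. The sketch below is an assumed illustration (buffers are identified by an integer id; real code would store the host pointer from cudaHostAlloc):

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Pool of freed pinned buffers, ordered by size: (size, buffer id).
using Pool = std::multiset<std::pair<std::size_t, int>>;

// Returns the id of the reused buffer, or -1 on a cache miss (where
// the real allocator would fall back to cudaHostAlloc).
int take_from_pool(Pool& pool, std::size_t request) {
    auto it = pool.lower_bound({request, -1});  // smallest size >= request
    if (it == pool.end()) return -1;
    int id = it->second;
    pool.erase(it);  // the buffer is handed out whole, never split
    return id;
}
```

Because the buffer is returned whole, a 1 KB request can be served by a cached 1 MB buffer, which is the "small requests may be filled by much larger cached buffers" behavior described above.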

Also, CUDA events are processed in the order in which they're recorded, even though events may occur out-of-order between devices or streams. This does not affect correctness, but means that cached allocations may not be considered "ready" for re-use until a little later. In practice, I don't think this should matter.
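The in-order processing can be sketched as a FIFO queue polled from the front, stopping at the first incomplete event. Here `query` is a stand-in for cudaEventQuery; in this assumed sketch it is a callable returning whether the event has completed:

```cpp
#include <deque>
#include <functional>
#include <utility>

struct Blk { bool reusable = false; };

// Events paired with the block they guard, in recording order.
using EventQueue = std::deque<std::pair<std::function<bool()>, Blk*>>;

void process_events(EventQueue& q) {
    while (!q.empty()) {
        auto& [query, blk] = q.front();
        if (!query()) break;   // head event still pending: stop here,
                               // even if later events already completed
        blk->reusable = true;  // head event done: release its block
        q.pop_front();
    }
}
```

A completed event behind a pending one is not processed, which is exactly the "a little later" re-use delay the description mentions; correctness is unaffected because blocks are only ever released after their own events complete.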

To enable the caching pinned-memory allocator and the caching device allocator, set the environment variable THC_CACHING_ALLOCATOR=1.
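For example, from a shell (train.lua is a placeholder for whatever script you launch):

```shell
# Enable both caching allocators for this session:
export THC_CACHING_ALLOCATOR=1
# ...then launch Torch as usual, e.g.:  th train.lua
```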

Adds a CUDA "sleep" kernel which spins for the given number of
iterations. This is useful for testing correct synchronization with
streams.

Adds a caching allocator for CUDA pinned (page-locked) memory. This
avoids synchronization due to cudaFreeHost or cudaHostUnregister at the
expense of potentially higher host memory usage.

Correctness is preserved by recording CUDA events after each
cudaMemcpyAsync involving the pinned memory. The pinned memory
allocations are not reused until all events associated with them have
completed.
@soumith soumith merged commit 0267dae into torch:master Dec 1, 2016
@colesbury colesbury deleted the cached_pinned_memory branch December 2, 2016 03:06