Add caching allocator for pinned (page-locked) memory #618
Adds a caching allocator for CUDA pinned (page-locked) memory. This avoids the synchronization caused by cudaFreeHost (or cudaHostUnregister) calls.
To ensure read-after-write and write-after-read consistency, a CUDA event is recorded after every cudaMemcpyAsync between host and device involving pinned memory created by this allocator. Memory allocations are only re-used after they're freed and all associated CUDA events have completed.
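To illustrate the mechanism, here is a minimal sketch (not the PR's actual code; `PinnedBlock`, `recordCopyEvent`, and `outstanding_events` are hypothetical names): after each host/device `cudaMemcpyAsync` involving a buffer from this allocator, a CUDA event is recorded on the copy stream and queued in recording order, and the buffer remembers how many of its events are still outstanding.

```cpp
// Hypothetical sketch: track in-flight copies on a pinned buffer with CUDA
// events; the buffer becomes reusable only once all its events have completed.
#include <cuda_runtime.h>
#include <deque>
#include <utility>

struct PinnedBlock {
  void*  ptr = nullptr;
  size_t size = 0;
  int    event_count = 0;   // copies still in flight for this buffer
  bool   allocated = false; // freed blocks wait for event_count == 0
};

// Events in the order they were recorded, paired with the block they guard.
static std::deque<std::pair<cudaEvent_t, PinnedBlock*>> outstanding_events;

// Called right after a host<->device cudaMemcpyAsync involving block->ptr.
void recordCopyEvent(PinnedBlock* block, cudaStream_t stream) {
  cudaEvent_t event;
  cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
  cudaEventRecord(event, stream);   // completes once the copy has finished
  block->event_count++;
  outstanding_events.emplace_back(event, block);
}
```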
Unlike the caching device allocator, allocations are never split. This means that requests for small allocations may be filled by much larger cached buffers. I think this should be OK in practice.
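Continuing the sketch above, the no-splitting policy could look like the following: the free pool is ordered by size, a request takes the smallest cached block that is at least as large as the request, and the block is handed out whole (`BySize`, `free_blocks`, and `tryReuse` are hypothetical names, not the PR's code).

```cpp
// Hypothetical sketch of the no-splitting policy: a small request may be
// satisfied by a much larger cached buffer, which is never split.
#include <set>

struct BySize {
  bool operator()(const PinnedBlock* a, const PinnedBlock* b) const {
    return a->size != b->size ? a->size < b->size : a->ptr < b->ptr;
  }
};

// Free (reusable) pinned blocks, ordered by size for best-fit lookup.
static std::set<PinnedBlock*, BySize> free_blocks;

// Returns a cached block of at least `size` bytes, or nullptr on a cache miss
// (in which case the caller would fall back to cudaHostAlloc).
PinnedBlock* tryReuse(size_t size) {
  PinnedBlock key;
  key.size = size;
  auto it = free_blocks.lower_bound(&key);  // smallest block with size >= request
  if (it == free_blocks.end())
    return nullptr;
  PinnedBlock* block = *it;                 // may be much larger than requested
  free_blocks.erase(it);
  block->allocated = true;
  return block;
}
```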
Also, CUDA events are processed in the order in which they're recorded, even though events may complete out-of-order between devices or streams. This does not affect correctness, but it means that a cached allocation may not be considered "ready" for re-use until slightly later than strictly necessary. In practice, I don't think this should matter.
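Continuing the same sketch, in-order event processing might look like this (`processEvents` is a hypothetical name): events are polled from the front of the queue and processing stops at the first incomplete event, even if later events on other streams or devices have already finished.

```cpp
// Hypothetical sketch of FIFO event processing: a freed block returns to the
// cache once its outstanding event count drops to zero.
void processEvents() {
  while (!outstanding_events.empty()) {
    auto& entry = outstanding_events.front();
    if (cudaEventQuery(entry.first) == cudaErrorNotReady)
      break;                                 // strictly in-order: stop at the
                                             // first event that isn't done yet
    // cudaSuccess means the copy finished; other errors would be surfaced
    // in a real implementation.
    cudaEventDestroy(entry.first);
    PinnedBlock* block = entry.second;
    if (--block->event_count == 0 && !block->allocated)
      free_blocks.insert(block);             // freed and fully synchronized
    outstanding_events.pop_front();
  }
}
```

Polling only the head of the queue keeps each check cheap (typically one `cudaEventQuery` per call) at the cost of slightly delayed reuse, which is the trade-off described above.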
To enable the caching pinned memory allocator and the caching device allocator, set the environment variable `THC_CACHING_ALLOCATOR=1`.