
Cuda Memory Leak #7300

Closed

AmesingFlank opened this issue Feb 4, 2023 · 3 comments

@AmesingFlank (Collaborator)

I noticed a bug introduced by PR #7008, which was supposed to fix #6924.

Consider this code snippet:

import taichi as ti

ti.init(ti.cuda, device_memory_GB = 5)

# this takes 2GB
f1 = ti.field(float, shape = (2,1000,1000,250))

# this takes 2GB of temporary memory, supposedly
f1.to_numpy()

# this takes 2GB
f2 = ti.field(float, shape = (2,1000,1000,250))

# ensure f2 is materialized
print(f2[0,0,0,0])

This should work, because the 2GB of temporary memory should be freed immediately after use.

However, it doesn't, and we get a crash:

Taichi JIT:0: allocate_from_buffer: block: [0,0,0], thread: [0,0,0] Assertion `Out of CUDA pre-allocated memory.

I think I figured out the reason: in #7008, we made it so that temporary memory is allocated from the pre-allocated CUDA memory; however, when it is freed, we are still calling CUDADriver::get_instance().mem_free(...). This means all of the pre-allocated memory we use for temporary allocations gets returned to the driver instead of back to the pool.

Since we haven't implemented a way to de-allocate memory from the pre-allocated pool, I think we should just revert #7008.
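To make the mismatch concrete, here is a minimal sketch of the kind of pool-aware release path that is currently missing. This is illustration only: Pool and its alloc/release methods are invented names for this sketch, not Taichi's actual classes. Blocks carved out of a pre-allocated buffer have to go back onto the pool's own free list; handing them to the driver's mem_free (the current behavior) means the pool's accounting never sees those bytes again, which is exactly the OOM above.

// Illustrative only: a toy pre-allocated pool. Pool, alloc and release are
// invented names for this sketch, not Taichi's real API.
#include <cstdio>
#include <utility>
#include <vector>

struct Pool {
  std::vector<char> buffer;                          // stands in for the pre-allocated CUDA buffer
  size_t used = 0;                                   // bump pointer for fresh allocations
  std::vector<std::pair<size_t, size_t>> free_list;  // (offset, size) of released blocks

  explicit Pool(size_t bytes) : buffer(bytes) {}

  char *alloc(size_t bytes) {
    for (auto it = free_list.begin(); it != free_list.end(); ++it) {
      if (it->second >= bytes) {                     // reuse a previously released block
        char *p = buffer.data() + it->first;
        free_list.erase(it);
        return p;
      }
    }
    if (used + bytes > buffer.size())
      return nullptr;                                // "Out of CUDA pre-allocated memory."
    char *p = buffer.data() + used;
    used += bytes;
    return p;
  }

  // The release path that is missing today: hand the block back to the pool's
  // own free list. Calling the driver's mem_free instead bypasses this, so the
  // pool can never reuse the bytes.
  void release(char *p, size_t bytes) {
    free_list.emplace_back(static_cast<size_t>(p - buffer.data()), bytes);
  }
};

int main() {
  Pool pool(5);               // pretend-GB units: device_memory_GB = 5
  char *f1 = pool.alloc(2);   // field f1
  char *tmp = pool.alloc(2);  // temporary buffer for f1.to_numpy()
  pool.release(tmp, 2);       // released into the pool, not to the driver
  char *f2 = pool.alloc(2);   // f2 fits by reusing tmp's block
  std::printf("f1=%p f2=%p\n", static_cast<void *>(f1), static_cast<void *>(f2));
  return f2 ? 0 : 1;
}

In this toy, the second 2GB allocation only succeeds because the temporary block is released back into the pool; swapping pool.release for a driver-level free reproduces the failure described above.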

cc @jim19930609

@jim19930609 (Contributor) commented Feb 6, 2023

Looks like CudaDevice::dealloc_memory() does support returning memory to the pre-allocated pool:

  if (info.use_cached) {
    if (caching_allocator_ == nullptr) {
      TI_ERROR("the CudaCachingAllocator is not initialized");
    }
    caching_allocator_->release(info.size, (uint64_t *)info.ptr);
  }

Likely there's some implementation bug. Assigning this to myself for further investigation.

@jim19930609 jim19930609 self-assigned this Feb 6, 2023
@ailzhang ailzhang moved this from Untriaged to Todo in Taichi Lang Feb 10, 2023
@jim19930609 (Contributor) commented Mar 10, 2023

Okay, this is caused by a legacy design issue in the CUDA runtime. Basically, Taichi has been using separate memory managers for different data types: for instance, Fields are managed by SnodeTreeBufferManager, whereas Ndarrays are managed by CacheAllocator.

The stupid part is that these separate managers don't share the same memory pool: memory released from Ndarrays can only be reused when allocating new Ndarrays, and likewise for Fields. The evidence is that if you change the code a little bit:

import taichi as ti

ti.init(ti.cuda, device_memory_GB = 5)

# this takes 2GB
f1 = ti.field(float, shape = (2,1000,1000,250))

# this takes 2GB of temporary memory, supposedly
f1.to_numpy()

# this takes 2GB
f2 = ti.field(float, shape = (2,1000,1000,250))
#f2 = ti.ndarray(float, shape = (2,1000,1000,250))

# ensure f2 is materialized
print(f2[0,0,0,0])

It's gonna work (with the Ndarray line uncommented), because the temporary memory released by f1.to_numpy() can be reused to create the 2GB Ndarray, but not a same-sized Field.

Let's leave this issue open, and I'll start by unifying SNodeTreeBufferManager and CacheAllocator.
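As a rough model of this diagnosis (again illustration only; BumpPool and Manager are invented names, not Taichi's classes), think of one pre-allocated buffer shared for fresh allocations, but with each manager keeping its own free list, so bytes released through one manager are invisible to the other:

// Illustrative only: one pre-allocated buffer, two managers with separate free
// lists. BumpPool and Manager are invented names, not Taichi's classes.
#include <cstdio>

struct BumpPool {
  long capacity;
  long used;
  // Fresh allocations bump a cursor shared by everyone.
  bool bump(long bytes) {
    if (used + bytes > capacity)
      return false;  // out of pre-allocated memory
    used += bytes;
    return true;
  }
};

struct Manager {
  BumpPool *pool;
  long cached;  // bytes released back to *this* manager only
  bool alloc(long bytes) {
    if (cached >= bytes) {  // reuse our own released memory
      cached -= bytes;
      return true;
    }
    return pool->bump(bytes);
  }
  void release(long bytes) { cached += bytes; }
};

int main() {
  BumpPool pool{5, 0};         // device_memory_GB = 5 (pretend-GB units)
  Manager fields{&pool, 0};    // stands in for SnodeTreeBufferManager
  Manager ndarrays{&pool, 0};  // stands in for CacheAllocator

  fields.alloc(2);             // f1
  ndarrays.alloc(2);           // temporary buffer used by f1.to_numpy()
  ndarrays.release(2);         // the temporary is freed, but only ndarrays sees it

  // f2 as a Field: the released 2GB sits in ndarrays' free list, invisible to
  // fields, so this falls through to the bump cursor and fails (2+2+2 > 5).
  std::printf("f2 as field:   %s\n", fields.alloc(2) ? "ok" : "OOM");
  // f2 as an Ndarray: CacheAllocator reuses its own released 2GB.
  std::printf("f2 as ndarray: %s\n", ndarrays.alloc(2) ? "ok" : "OOM");
  return 0;
}

With these numbers the Field path reports OOM while the Ndarray path succeeds, matching the behavior above; unifying the managers effectively means giving both front ends one shared free list.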

@jim19930609 jim19930609 moved this from Todo to In Progress in Taichi Lang Mar 10, 2023
@jim19930609 jim19930609 added this to the v1.6.0 milestone Mar 10, 2023
ailzhang pushed a commit that referenced this issue Mar 15, 2023: …cator (#7531)

jim19930609 added a commit that referenced this issue Mar 21, 2023: …LlvmRuntimeExecutor (#7544)
@jim19930609 (Contributor)

No longer an issue after: #7795

@github-project-automation github-project-automation bot moved this from In Progress to Done in Taichi Lang Apr 21, 2023
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023: …cator (taichi-dev#7531)

quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023: …LlvmRuntimeExecutor (taichi-dev#7544)