
Cuda Memory Leak #7300

Closed

AmesingFlank opened this issue Feb 4, 2023 · 3 comments

@AmesingFlank (Collaborator)

I noticed a bug introduced by PR #7008, which was supposed to fix #6924.

Consider this code snippet:

import taichi as ti

ti.init(ti.cuda, device_memory_GB = 5)

# this takes 2GB
f1 = ti.field(float, shape = (2,1000,1000,250))

# this takes 2GB of temporary memory, supposedly
f1.to_numpy()

# this takes 2GB
f2 = ti.field(float, shape = (2,1000,1000,250))

# ensure f2 is materialized
print(f2[0,0,0,0])

This should work, because the 2GB of temporary memory should be freed immediately after use.

However, it doesn't, and we get a crash:

Taichi JIT:0: allocate_from_buffer: block: [0,0,0], thread: [0,0,0] Assertion `Out of CUDA pre-allocated memory.

I think I figured out the reason: in #7008, we made it so that temporary memory is allocated from the pre-allocated CUDA memory; however, when it is freed, we are still calling CUDADriver::get_instance().mem_free(...). This means all of the pre-allocated memory we use for temporary allocations gets returned to the driver instead of back to the pool.

Since we haven't implemented a way to de-allocate memory from the pre-allocated pool, I think we should just revert #7008.
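To make the mismatch concrete, here is a minimal sketch of the kind of pool-aware release path that is currently missing. This is illustration only: Pool and its alloc/release methods are invented names for this sketch, not Taichi's actual classes. Blocks carved out of a pre-allocated buffer have to go back onto the pool's own free list; handing them to the driver's mem_free (the current behavior) means the pool's accounting never sees those bytes again, which is exactly the OOM above.

// Illustrative only: a toy pre-allocated pool. Pool, alloc and release are
// invented names for this sketch, not Taichi's real API.
#include <cstdio>
#include <utility>
#include <vector>

struct Pool {
  std::vector<char> buffer;                          // stands in for the pre-allocated CUDA buffer
  size_t used = 0;                                   // bump pointer for fresh allocations
  std::vector<std::pair<size_t, size_t>> free_list;  // (offset, size) of released blocks

  explicit Pool(size_t bytes) : buffer(bytes) {}

  char *alloc(size_t bytes) {
    for (auto it = free_list.begin(); it != free_list.end(); ++it) {
      if (it->second >= bytes) {                     // reuse a previously released block
        char *p = buffer.data() + it->first;
        free_list.erase(it);
        return p;
      }
    }
    if (used + bytes > buffer.size())
      return nullptr;                                // "Out of CUDA pre-allocated memory."
    char *p = buffer.data() + used;
    used += bytes;
    return p;
  }

  // The release path that is missing today: hand the block back to the pool's
  // own free list. Calling the driver's mem_free instead bypasses this, so the
  // pool can never reuse the bytes.
  void release(char *p, size_t bytes) {
    free_list.emplace_back(static_cast<size_t>(p - buffer.data()), bytes);
  }
};

int main() {
  Pool pool(5);               // pretend-GB units: device_memory_GB = 5
  char *f1 = pool.alloc(2);   // field f1
  char *tmp = pool.alloc(2);  // temporary buffer for f1.to_numpy()
  pool.release(tmp, 2);       // released into the pool, not to the driver
  char *f2 = pool.alloc(2);   // f2 fits by reusing tmp's block
  std::printf("f1=%p f2=%p\n", static_cast<void *>(f1), static_cast<void *>(f2));
  return f2 ? 0 : 1;
}

In this toy, the second 2GB allocation only succeeds because the temporary block is released back into the pool; swapping pool.release for a driver-level free reproduces the failure described above.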

cc @jim19930609

@jim19930609 (Contributor) commented Feb 6, 2023

Looks like CudaDevice::dealloc_memory() does support returning memory to the pre-allocated pool:

  if (info.use_cached) {
    if (caching_allocator_ == nullptr) {
      TI_ERROR("the CudaCachingAllocator is not initialized");
    }
    caching_allocator_->release(info.size, (uint64_t *)info.ptr);
  }

Likely there's some implementation bug. Assigning this to myself for further investigation.

@jim19930609 jim19930609 self-assigned this Feb 6, 2023
@ailzhang ailzhang moved this from Untriaged to Todo in Taichi Lang Feb 10, 2023
@jim19930609 (Contributor) commented Mar 10, 2023

Okay, this is caused by a legacy design issue in the CUDA runtime. Basically, Taichi has been using separate memory managers for different data types: for instance, Fields are managed by SnodeTreeBufferManager, whereas Ndarrays are managed by CacheAllocator.

The stupid part is that these separate managers don't share the same memory pool: memory released from Ndarrays can only be reused when allocating new Ndarrays, and likewise for Fields. The evidence is that if you change the code a little bit:

import taichi as ti

ti.init(ti.cuda, device_memory_GB = 5)

# this takes 2GB
f1 = ti.field(float, shape = (2,1000,1000,250))

# this takes 2GB of temporary memory, supposedly
f1.to_numpy()

# this takes 2GB
f2 = ti.field(float, shape = (2,1000,1000,250))
#f2 = ti.ndarray(float, shape = (2,1000,1000,250))

# ensure f2 is materialized
print(f2[0,0,0,0])

It's gonna work (with the Ndarray line uncommented), because the temporary memory released by f1.to_numpy() can be reused to create the 2GB Ndarray, but not a same-sized Field.

Let's leave this issue open, and I'll start by unifying SNodeTreeBufferManager and CacheAllocator.
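As a rough model of this diagnosis (again illustration only; BumpPool and Manager are invented names, not Taichi's classes), think of one pre-allocated buffer shared for fresh allocations, but with each manager keeping its own free list, so bytes released through one manager are invisible to the other:

// Illustrative only: one pre-allocated buffer, two managers with separate free
// lists. BumpPool and Manager are invented names, not Taichi's classes.
#include <cstdio>

struct BumpPool {
  long capacity;
  long used;
  // Fresh allocations bump a cursor shared by everyone.
  bool bump(long bytes) {
    if (used + bytes > capacity)
      return false;  // out of pre-allocated memory
    used += bytes;
    return true;
  }
};

struct Manager {
  BumpPool *pool;
  long cached;  // bytes released back to *this* manager only
  bool alloc(long bytes) {
    if (cached >= bytes) {  // reuse our own released memory
      cached -= bytes;
      return true;
    }
    return pool->bump(bytes);
  }
  void release(long bytes) { cached += bytes; }
};

int main() {
  BumpPool pool{5, 0};         // device_memory_GB = 5 (pretend-GB units)
  Manager fields{&pool, 0};    // stands in for SnodeTreeBufferManager
  Manager ndarrays{&pool, 0};  // stands in for CacheAllocator

  fields.alloc(2);             // f1
  ndarrays.alloc(2);           // temporary buffer used by f1.to_numpy()
  ndarrays.release(2);         // the temporary is freed, but only ndarrays sees it

  // f2 as a Field: the released 2GB sits in ndarrays' free list, invisible to
  // fields, so this falls through to the bump cursor and fails (2+2+2 > 5).
  std::printf("f2 as field:   %s\n", fields.alloc(2) ? "ok" : "OOM");
  // f2 as an Ndarray: CacheAllocator reuses its own released 2GB.
  std::printf("f2 as ndarray: %s\n", ndarrays.alloc(2) ? "ok" : "OOM");
  return 0;
}

With these numbers the Field path reports OOM while the Ndarray path succeeds, matching the behavior above; unifying the managers effectively means giving both front ends one shared free list.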

@jim19930609 jim19930609 moved this from Todo to In Progress in Taichi Lang Mar 10, 2023
@jim19930609 jim19930609 added this to the v1.6.0 milestone Mar 10, 2023
ailzhang pushed a commit that referenced this issue Mar 15, 2023: …cator (#7531)

jim19930609 added a commit that referenced this issue Mar 21, 2023: …LlvmRuntimeExecutor (#7544)
@jim19930609 (Contributor)

No longer an issue after: #7795

@github-project-automation github-project-automation bot moved this from In Progress to Done in Taichi Lang Apr 21, 2023
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023: …cator (taichi-dev#7531)

quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023: …LlvmRuntimeExecutor (taichi-dev#7544)