[BUG] HuggingFace/PyTorch with dask-cuda-worker does not free memory #383
I am seeing something similar. Oddly, I see two CUDA context creations happening on Device 0 (two processes). Note, this only happens with …
I think the second context is the client process initializing as well, so I don't think this is a big concern.
I suspect torch things are not being cleaned up nicely with dask-cuda, but I don't know why. As a test I re-ran …
Is it possible this is related to the same Numba issue (numba/numba#6147)? Thinking about the multiple contexts on the same device. If so, could you please try downgrading to …? Also, Peter made a fix yesterday (#379) that we should make sure we are getting.
Will try it, thanks for the tip.
So I was on … Let me update my env and rerun it. Thanks a lot for the support, guys.
Thanks @VibhuJawa! Please let us know how it goes 🙂
Were you able to make any progress here, Vibhu, or are you still stuck?
Still stuck on it. Sadly, this does not seem to fix it for the workflow at rapidsai/gpu-bdb#84. I will try to take some time to get you guys a better repro. Sorry for the delay on this; I was pulled into other things.
No worries. Thanks for the update 🙂
As an update on this, the below cleans up extra memory:

```python
client.run(torch.cuda.empty_cache)
```

But there are still some cleanup issues happening (without …).

Gist: https://gist.github.com/VibhuJawa/bd06afceef8960ce5b99026c14ecac8e

Example:

```python
from transformers import AutoModelForTokenClassification
import gc
import torch

model_path = 'bert-base-cased'
model = AutoModelForTokenClassification.from_pretrained(model_path)
model = model.cuda()
model = model.eval()

with torch.no_grad():
    token_tensor = torch.randint(high=1000, size=(200, 256)).long().cuda()
    output = model(token_tensor)

del model
del token_tensor
del output
gc.collect()
torch.cuda.empty_cache()
```

```
!nvidia-smi | head -n 10
```
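For context, this is roughly how the `client.run(torch.cuda.empty_cache)` workaround is applied from the client side. A minimal sketch; the scheduler address and cluster setup here are assumptions, not taken from the gist:

```python
# Minimal sketch: free PyTorch's cached CUDA memory on every dask-cuda worker.
# Assumes an existing dask-cuda cluster; the scheduler address is hypothetical.
import torch
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # hypothetical address

# client.run executes the function once on every worker process, releasing
# the CUDA blocks cached by PyTorch's allocator on each GPU.
client.run(torch.cuda.empty_cache)
```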
Below OOMs:

```python
import rmm
rmm.reinitialize(pool_allocator=True, initial_pool_size=15e+9)
```

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-53991fa7fde5> in <module>
      1 import rmm
      2 rmm.reinitialize(pool_allocator=True,
----> 3                  initial_pool_size=15e+9)

/nvme/0/vjawa/conda/envs/tpcxbb-aug-31-pytorch/lib/python3.7/site-packages/rmm/rmm.py in reinitialize(pool_allocator, managed_memory, initial_pool_size, maximum_pool_size, devices, logging, log_file_name)
     75         devices=devices,
     76         logging=logging,
---> 77         log_file_name=log_file_name,
     78     )
     79

rmm/_lib/memory_resource.pyx in rmm._lib.memory_resource._initialize()
rmm/_lib/memory_resource.pyx in rmm._lib.memory_resource._initialize()
rmm/_lib/memory_resource.pyx in rmm._lib.memory_resource.PoolMemoryResource.__cinit__()

RuntimeError: RMM failure at: ../include/rmm/mr/device/pool_memory_resource.hpp:100: Initial pool size exceeds the maximum pool size!
```

I am closing this issue here and will raise something on …
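One way to avoid this particular failure is to size the pool from the memory that is actually still free, rather than a fixed 15 GB. A rough sketch, assuming pynvml is installed:

```python
# Hypothetical workaround sketch: size the RMM pool from the memory that is
# actually free (after PyTorch's caching allocator has taken its share),
# instead of hard-coding 15 GB.
import pynvml
import rmm

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
free_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).free

# Leave some headroom and round down to a 256-byte multiple.
pool_size = int(free_bytes * 0.9) // 256 * 256
rmm.reinitialize(pool_allocator=True, initial_pool_size=pool_size)
```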
@VibhuJawa my guess here is that things blow up because PyTorch isn't using RMM. Folks have filed an issue for using external allocators (like RMM) with PyTorch: pytorch/pytorch#43144
Yup. FWIW, I wouldn't need memory to be freed this aggressively if PyTorch were working with RMM. The current workflow is as follows:

And we want to do the above without restarting workers/clients, as these restarts are often finicky on our lab machines, especially at scale, and can take 2+ minutes (if they work correctly). If we had the RMM pool working as an external allocator with PyTorch, we could have just one pool that gets re-used, making workflows like the above much more straightforward.
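As a side note, and an assumption about later releases rather than anything available in this thread's timeframe: newer RMM and PyTorch versions expose a pluggable allocator hook, so a single RMM pool can back both libraries. A rough sketch, assuming `rmm.allocators.torch.rmm_torch_allocator` and `torch.cuda.memory.change_current_allocator` exist in the installed versions:

```python
import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Create one RMM pool up front (15 GB here, matching the example above).
rmm.reinitialize(pool_allocator=True, initial_pool_size=15 * 2**30)

# Route PyTorch's CUDA allocations through RMM; must be called before
# PyTorch allocates anything on the GPU.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

x = torch.ones(1000, device="cuda")  # served from the shared RMM pool
```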
When I run the same exact example within a `dask-cuda` worker, memory is not freed, but it is freed when I run it without it.

Below Works:

Below Fails:

Minimal Gists

CC: @jakirkham / @randerzander
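A hedged sketch of the kind of reproducer described above; the exact snippets live in the linked gists, and the cluster setup and function body here are reconstructed from the example earlier in the thread:

```python
# Hypothetical sketch of the reported behaviour: the same function frees GPU
# memory when called directly, but not when executed on a dask-cuda worker.
import gc

import torch
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
from transformers import AutoModelForTokenClassification

def run_and_cleanup():
    model = AutoModelForTokenClassification.from_pretrained("bert-base-cased").cuda().eval()
    with torch.no_grad():
        tokens = torch.randint(high=1000, size=(200, 256)).long().cuda()
        output = model(tokens)
    del model, tokens, output
    gc.collect()
    torch.cuda.empty_cache()

# Works: GPU memory is released after the call returns.
run_and_cleanup()

# Fails: GPU memory stays allocated on the worker after the task completes.
cluster = LocalCUDACluster()
client = Client(cluster)
wait(client.submit(run_and_cleanup))
```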