
Out of Memory error while doing Inference on GPU #9

Open
np-n opened this issue Nov 27, 2024 · 0 comments


np-n commented Nov 27, 2024

Hello there,

Thank you for publishing the model on physionet.org. I downloaded the Me-LLaMA-13b-chat model to my machine, which has a 24 GB GPU, and tried to run inference on it, but I couldn't even load the model. The documentation says it can be fine-tuned on at least a 24 GB GPU, which confuses me a little: if the model doesn't even load on a 24 GB GPU, how is fine-tuning possible? I also tried reducing the model to 16-bit precision (roughly sketched further below), but it still wouldn't load in 24 GB. Is there any other way to load the model on a 24 GB GPU without performing quantization?

I tried to run inference on the model with the following source code:

from transformers import AutoTokenizer, AutoModelForCausalLM 
import torch
torch.cuda.empty_cache()

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"
prompt = "I am suffering from flu, give me home remedies?"

# Check if GPU is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and model from the local model directory
# (no dtype is specified, so the weights load in full float32 precision)
tokenizer = AutoTokenizer.from_pretrained(model_file)
model = AutoModelForCausalLM.from_pretrained(model_file).to(device)

It throws an out-of-memory error while loading the model, and my GPU memory becomes fully utilized.

OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 81.69 MiB is free. Including non-PyTorch memory, this process has 23.61 GiB memory in use. Of the allocated memory 23.36 GiB is allocated by PyTorch, and 1.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The following screenshot shows the GPU memory usage statistics:
[screenshot: GPU memory fully utilized]

Note: on CPU, it takes around 49 GB of RAM on my PC.
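
For completeness, the 16-bit attempt mentioned above looked roughly like the sketch below. I am reconstructing it from memory, so the exact arguments may have differed slightly; even in float16, a 13B model needs roughly 26 GB just for the weights, which already exceeds 24 GB.

from transformers import AutoModelForCausalLM
import torch

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"

# torch_dtype=torch.float16 halves the weight memory versus the default float32,
# but ~13B parameters * 2 bytes is still about 26 GB of weights
model = AutoModelForCausalLM.from_pretrained(
    model_file,
    torch_dtype=torch.float16,
).to("cuda")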

Please help me if you have any idea how to run the model efficiently on a 24 GB GPU; one direction I have not tried yet is sketched below. Thank you.
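
The idea would be to let accelerate split the half-precision weights between the GPU and CPU RAM with device_map="auto", which involves no quantization. This sketch is untested, and the max_memory limits are just guesses to leave headroom for activations:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"
prompt = "I am suffering from flu, give me home remedies?"

tokenizer = AutoTokenizer.from_pretrained(model_file)

# device_map="auto" (requires `pip install accelerate`) places as many layers as fit
# on GPU 0 and offloads the rest to CPU RAM; generation is slower but should not OOM
model = AutoModelForCausalLM.from_pretrained(
    model_file,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "48GiB"},  # guessed limits, adjust for your machine
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))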
