Thank you for publishing the model on physionet.org. I downloaded the Me-LLaMA-13b-chat model to my machine, which has a 24GB GPU, and tried to run inference on it, but I could not even load the model. The documentation says it can be fine-tuned on a GPU with at least 24GB, which confuses me: if the model does not even load on a 24GB GPU, how is fine-tuning possible? I also tried reducing the model to 16-bit precision (that attempt is sketched further below), but it still would not fit in 24GB. Is there any other way to load the model on a 24GB GPU without quantization?

I tried to run inference on the model with the following code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

torch.cuda.empty_cache()

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"
prompt = "I am suffering from flu, give me home remedies?"

# Check if a GPU is available and set the device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and model from the local model directory.
# Note: with no torch_dtype argument, the weights load in fp32 by default.
tokenizer = AutoTokenizer.from_pretrained(model_file)
model = AutoModelForCausalLM.from_pretrained(model_file).to(device)
```
This throws an out-of-memory error while loading the model, and my GPU memory is fully used:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 81.69 MiB is free. Including non-PyTorch memory, this process has 23.61 GiB memory in use. Of the allocated memory 23.36 GiB is allocated by PyTorch, and 1.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
The following image shows the GPU memory usage statistics:
Note: on CPU, the model takes around 49GB of RAM on my PC.
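The 16-bit attempt I mentioned above looked roughly like this (a minimal sketch of what I ran; `torch_dtype=torch.float16` is the half-precision loading option I used):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_file)
# Load the weights directly in half precision instead of the fp32 default,
# halving the memory needed for the weights.
model = AutoModelForCausalLM.from_pretrained(
    model_file,
    torch_dtype=torch.float16,
).to("cuda")
```

Even in half precision, 13B parameters at 2 bytes each come to roughly 26GB of weights, which is already more than the 24GB of VRAM, so this load also runs out of memory.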
Please let me know if you have any idea how to run the model efficiently on a 24GB GPU.
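One idea I am considering, in case it counts as "another way" without quantization, is letting Hugging Face accelerate split the model between GPU and CPU with `device_map="auto"` (a sketch only; it assumes `accelerate` is installed and that CPU offloading is acceptable despite slower inference):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_file = "./physionet.org/files/me-llama/1.0.0/MeLLaMA-13B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_file)
# device_map="auto" places as many layers as fit on the GPU and
# offloads the remaining layers to CPU RAM (requires accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_file,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "I am suffering from flu, give me home remedies?"
# Inputs go to the GPU, where the first layers execute.
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Would this be a reasonable way to run the model on a 24GB GPU, or is there a better approach? Thank you.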