Replies: 3 comments
-
Prefer float16 to reduce VRAM usage.
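For example, with openai-whisper this is just a matter of the fp16 option on transcribe (a minimal sketch; the model name and audio path are placeholders):

```python
import whisper

# Model name and audio path are placeholders; substitute your own.
model = whisper.load_model("medium", device="cuda")

# fp16=True (already the default) keeps inference in half precision,
# which needs noticeably less VRAM than running in float32.
result = model.transcribe("audio.mp3", fp16=True)
print(result["text"])
```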
-
Having a similar problem when running (loading) more than one Whisper model (e.g. for switching between different inference models). It would be helpful if the VRAM / GPU memory could be freed between two model loads! A possible solution like the sketch below does not have a big effect on the VRAM. Does anyone know how to free the VRAM occupied by a Whisper model that is no longer needed?
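(This is roughly the cleanup pattern I mean; the model names and audio path are just placeholders:)

```python
import gc
import torch
import whisper

model = whisper.load_model("medium", device="cuda")
result = model.transcribe("audio.mp3")

# Drop every Python reference to the model, collect the garbage,
# then ask PyTorch to release its cached CUDA blocks.
del model
gc.collect()
torch.cuda.empty_cache()

# Load the next model into the (hopefully) freed memory.
model = whisper.load_model("large-v2", device="cuda")
```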
-
Any ideas here? Does no one else have the need to do garbage collection or something similar?
-
Hi! I'm trying to transcribe audio on my 3060, with 12GB VRAM, running on pop OS.
I tried using this fine-tuned model for transcribing Cantonese, but I kept running out of memory. I watched the memory stats go from 500MB to 11GB before the crash, so I just assumed I didn't have enough memory for it.
But then I tried the original medium, large, and large-v2 models (which this model is based on), and they all worked perfectly fine without a problem. I've also tried the fine-tuned model on CPU and it runs without a problem (though slower, of course).
As far as I know, this model should be the same size as large-v2, so I have no idea what went wrong there.
Any suggestions would be very welcome. Thanks!
The error in question, though I'm sure you've seen it before:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 11.74 GiB of which 40.25 MiB is free. Including non-PyTorch memory, this process has 11.06 GiB memory in use. Of the allocated memory 10.68 GiB is allocated by PyTorch, and 289.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
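For completeness, the error's max_split_size_mb hint and the float16 advice from the first reply would look roughly like this (a sketch assuming the fine-tuned checkpoint is loaded through Hugging Face transformers; the model id and audio path are placeholders):

```python
import os

# Follow the hint in the error message: cap the allocator's split size
# to reduce fragmentation. Must be set before CUDA is initialised.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import pipeline

# Placeholder model id -- substitute the actual fine-tuned Cantonese checkpoint.
asr = pipeline(
    "automatic-speech-recognition",
    model="someuser/whisper-large-v2-cantonese",
    torch_dtype=torch.float16,  # load weights in half precision to save VRAM
    device="cuda:0",
)

print(asr("audio.mp3")["text"])
```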