Running with 12GB RAM (not VRAM)? #14
Comments
You mean CPU inference? I'll mark this as a duplicate of #10 and track them as one feature request.
No, not CPU: GPU inference, but with low RAM (I have 16GB VRAM, so that's fine; it's the RAM that is the problem).
Ah right, I see what you mean. I think once the reload bug is fixed it should make the model smaller, although the T5 text encoder will still take up a lot of RAM, so until we can get gguf quants of that working it probably won't work (#5). I assume the regular ComfyUI flux model doesn't work for you either without OOMing or swapping, right?
Actually, it seems the model does not load into my VRAM, but it does load into my RAM.
It happens during the gguf_sd_loader function. It seems the model is first loaded into RAM and would then be transferred to VRAM?
That's expected. We could technically load into VRAM directly, but there's no easy way to check how much free VRAM we have/need, since we don't know the size of the checkpoint until we load it (at least I don't think that's easy to check? gguf can use mmap, so it might be possible to lazy-load them as well.)
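As a rough sketch of the "how much VRAM do we need" question (not the node's actual code; `fits_in_vram` and the safety margin are made-up names), the on-disk size of the checkpoint plus `torch.cuda.mem_get_info()` gives at least a coarse go/no-go check before loading:

```python
# Hypothetical helper: decide whether a GGUF checkpoint could plausibly fit in
# VRAM before loading it. File size is only a proxy for the real footprint,
# since quantized tensors may later be dequantized or joined by activations.
import os
import torch

def fits_in_vram(gguf_path: str, safety_margin: float = 1.2) -> bool:
    ckpt_bytes = os.path.getsize(gguf_path)
    free_bytes, _total_bytes = torch.cuda.mem_get_info()  # (free, total) on the current device
    return ckpt_bytes * safety_margin < free_bytes
```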
@GXcells Okay, so I think I got numpy's mmap to play ball and load it directly onto the GPU, or at least reduce memory pressure by a lot. Could you git pull and retest?
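For readers following along, the mmap idea discussed here looks roughly like the sketch below. It assumes the `gguf` Python package's `GGUFReader` (which memory-maps the file), and `load_gguf_to_gpu` is a made-up name rather than what gguf_sd_loader actually does:

```python
# Sketch only: GGUFReader memory-maps the checkpoint, so tensor data is paged
# in lazily. Copying tensor-by-tensor to the GPU avoids materializing the whole
# (possibly still-quantized) state dict in system RAM at once.
import torch
from gguf import GGUFReader

def load_gguf_to_gpu(path: str, device: str = "cuda") -> dict:
    reader = GGUFReader(path)  # mmaps the file; nothing is eagerly read into RAM
    state_dict = {}
    for tensor in reader.tensors:
        # tensor.data is a numpy view backed by the mmap; .to(device) copies it
        # straight to VRAM, after which the mapped pages can be evicted by the OS.
        state_dict[tensor.name] = torch.from_numpy(tensor.data).to(device)
    return state_dict
```

The trade-off is that peak RAM usage drops to roughly one tensor at a time, at the cost of many small host-to-device copies instead of one big transfer.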
Okay, so I have 12GB of RAM (not VRAM). Will I be able to run this model? Has anyone tried it?
The minimum seems to be 13GB of system RAM even with the crappy FP8 T5 model :( Maybe if you close everything else and add some swap it could manage.
Working like a charm, but now I am stuck with a T4 GPU that throws a CUDA OOM when running the function sd.load_diffusion_model_state_dict, even with a 4-bit quantized model, in a version of Comfy that runs without the UI.
You're not using the CUDA override node with a single GPU, right?
I am using the "totoro4" branch from https://github.com/camenduru/ComfyUI to use it in a Jupyter notebook.
Hmm, not familiar with that. A T4 should have plenty of VRAM for that, unless there's a memory leak or a bug in whatever CUDA/numpy/etc. versions you have? Or maybe that repository changes the way the function works and tries to move the state dict that we return to CUDA? Not too sure; I might look into it, although there are other, more pressing things that need adding/fixing/optimizing for now.
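One way to narrow down whether that fork is moving everything to CUDA at once would be a small diagnostic like the sketch below (purely hypothetical, not part of either repo), run right after the GGUF state dict is produced:

```python
# Diagnostic sketch: report where the state dict's tensors live and how much
# memory they occupy per device, plus what PyTorch has already allocated on CUDA.
import torch

def summarize_state_dict(state_dict: dict) -> None:
    per_device = {}
    for name, t in state_dict.items():
        if torch.is_tensor(t):
            key = str(t.device)
            per_device[key] = per_device.get(key, 0) + t.numel() * t.element_size()
    for device, nbytes in per_device.items():
        print(f"{device}: {nbytes / 1e9:.2f} GB")
    if torch.cuda.is_available():
        print(f"cuda allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```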
Yes, a bit weird, because I could run the 4-bit quantized models on an RTX 3050 with 4GB VRAM thanks to your node in ComfyUI.
Is there a way to run these models with 12GB of RAM?
With fp8 models it works, but with GGUF models it always fails.