
Running with 12GB RAM (not VRAM)? #14

Open · GXcells opened this issue Aug 15, 2024 · 14 comments

GXcells commented Aug 15, 2024

Is there a way to run these models with 12 GB of RAM?
With fp8 models it works, but with GGUF models it always fails.

city96 (Owner) commented Aug 15, 2024

You mean CPU inference? I'll mark this as a duplicate of #10 and track them as one feature request.

city96 closed this as completed Aug 15, 2024
GXcells (Author) commented Aug 15, 2024

No, not CPU: GPU inference, but with low RAM (I have 16 GB of VRAM so that is fine, but the RAM is problematic).

city96 (Owner) commented Aug 15, 2024

Ah right, I see what you mean. I think once the reload bug is fixed the model should take up less memory, although the T5 text encoder will still take up a lot of RAM, so until we can get GGUF quants of that working it probably won't work. See #5.

I assume the regular ComfyUI flux model doesn't work for you either without OOMing or swapping, right?

GXcells (Author) commented Aug 15, 2024

Actually, it seems the model does not load into my VRAM but into my RAM.
Not sure why.

GXcells (Author) commented Aug 15, 2024

It happens during the gguf_sd_loader function. It seems the model is first loaded into RAM and then transferred to VRAM?
Is there a way to do this directly in VRAM?

city96 (Owner) commented Aug 15, 2024

That's expected. We can technically load into VRAM directly, but there's no easy way to check how much free VRAM we have/need, since we don't know the size of the checkpoint until we load it (at least I don't think that's easy to check? GGUF can use mmap, so it might be possible to lazy-load them as well).
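For context, the kind of check I mean would look roughly like this (a hypothetical sketch that uses the on-disk file size as a stand-in for the loaded size, which is not what the node currently does):

```python
import os

import torch

def pick_load_device(ckpt_path, headroom_bytes=1 << 30):
    # Rough guess: treat the on-disk size as a proxy for how much memory the
    # checkpoint will need once loaded (not exact, especially for quantized files).
    file_size = os.path.getsize(ckpt_path)
    # torch.cuda.mem_get_info() returns (free, total) bytes for the current device.
    free_vram, _total = torch.cuda.mem_get_info()
    # Only load straight into VRAM if it fits with roughly 1 GB of headroom.
    return "cuda" if file_size + headroom_bytes < free_vram else "cpu"
```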

city96 reopened this Aug 15, 2024
city96 (Owner) commented Aug 16, 2024

@GXcells Okay so I think I got numpy's mmap to play ball and load it directly onto the GPU, or at least reduce memory pressure by a lot. Could you git pull and retest?
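Roughly the idea, in simplified form (an illustrative sketch, not the actual gguf_sd_loader code):

```python
import numpy as np
import torch

def mmap_tensor_to_gpu(path, offset, shape, dtype=np.float16, device="cuda"):
    # Memory-map the raw tensor bytes instead of reading the whole file into RAM.
    arr = np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape)
    # torch.tensor() copies the mapped data; the OS only faults in the pages
    # that actually get read and can drop them from the page cache afterwards,
    # so peak system-RAM usage stays much lower than a full read() would be.
    return torch.tensor(arr, device=device)
```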

Meshwa428 commented

Okay, so I have 12 GB of RAM (not VRAM). Will I be able to run this model? Has anyone tried it?

city96 (Owner) commented Aug 16, 2024

Minimum seems to be 13 GB of system RAM even with the crappy FP8 T5 model :(

Maybe if you close everything else and add some swap, it could manage.

GXcells (Author) commented Aug 16, 2024

> @GXcells Okay so I think I got numpy's mmap to play ball and load it directly onto the GPU, or at least reduce memory pressure by a lot. Could you git pull and retest?

Working like a charm, but now I am stuck with the T4 GPU, which throws a CUDA OOM when running the sd.load_diffusion_model_state_dict function, even with a 4-bit quantized model, in a version of Comfy that runs without a UI.

city96 (Owner) commented Aug 16, 2024

You're not using the CUDA override node with a single GPU, right?

GXcells (Author) commented Aug 16, 2024

I am using the "totoro4" branch from https://github.com/camenduru/ComfyUI so I can use it in a Jupyter notebook.

city96 (Owner) commented Aug 16, 2024

Hmm, not familiar with that. A T4 should have plenty of VRAM for that, unless there's a memory leak or a bug in whatever CUDA/numpy/etc. versions you have? Or maybe that repository changes the way the function works and tries to load the state dict that we return to CUDA? Not too sure; I might look into it, although there are other, more pressing things that need adding/fixing/optimizing for now.
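If you want to sanity-check that, something like this quick diagnostic (a hypothetical helper, not part of either repo) run on the state dict right after loading would show whether the tensors are being pushed to CUDA early:

```python
import collections

import torch

def summarize_state_dict_devices(state_dict):
    # Count how many tensors of the loaded state dict sit on each device; if
    # they already show up on "cuda" right after loading, something upstream
    # is moving them to the GPU earlier than this loader expects.
    counts = collections.Counter(
        t.device.type for t in state_dict.values() if torch.is_tensor(t)
    )
    return dict(counts)
```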

GXcells (Author) commented Aug 16, 2024

> pressing

Yes, a bit weird, because I could run the 4-bit quantized models on an RTX 3050 with 4 GB of VRAM thanks to your node in ComfyUI.
I'll post an issue on the Camenduru GitHub, but I also believe they will probably implement your node soon.
I will stay with fp8 models for the time being on the T4 GPU.
Thanks a lot for your super support :)
