It would be nice to be able to run GGUFs on the CPU, like you can with llama.cpp GGUFs. I don't know what the speed would look like, but it could be better for people with low-VRAM GPUs.
Also, I haven't looked at the code, but I believe GGUFs have more efficient memory allocation built in, i.e. if you choose to split the model between GPU and CPU, it won't be as bad as the typical memory overflow you get from PyTorch. If it's possible to implement, that would also be a nice feature for those with low-VRAM GPUs.
We're only using GGUF as a storage medium here, without the surrounding llama.cpp library, so we would have to rely on ComfyUI's lowvram mode (which will need some extra changes to work).
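For context, a minimal sketch of what "GGUF as a storage medium" means: the file can be read entirely into system RAM with the `gguf` Python package (the reader from llama.cpp's gguf-py), without any GPU involvement. This is not this repo's actual loader, just an illustration; the function name is made up, and quantized tensor types would still need to be dequantized before PyTorch could use them.

```python
# Sketch only: read unquantized tensors from a GGUF file onto the CPU.
# Assumes the gguf Python package (gguf-py) is installed.
import numpy as np
import torch
from gguf import GGUFReader, GGMLQuantizationType

def load_unquantized_tensors_cpu(path):
    """Load only F16/F32 tensors from a GGUF file into CPU torch tensors."""
    reader = GGUFReader(path)  # memory-maps the file; nothing touches the GPU
    tensors = {}
    for t in reader.tensors:
        if t.tensor_type in (GGMLQuantizationType.F32, GGMLQuantizationType.F16):
            # t.data is a numpy view over the memory-mapped file; copy it so
            # the resulting torch tensor owns writable CPU memory.
            # (Shapes may still need reordering for PyTorch's dimension order.)
            tensors[t.name] = torch.from_numpy(np.array(t.data))
    return tensors
```

Quantized tensors (Q4_K, Q8_0, etc.) are the interesting part here, and handling those on the CPU is exactly where the speed question above comes in, since dequantization would happen in Python/PyTorch rather than in llama.cpp's optimized kernels.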