quantizeHFmodel

Accepts a Hugging Face model ID, automatically downloads the model, and quantizes it with Bits and Bytes (BNB). The whole process can run on CPU and RAM alone with acceptable performance (about 10 minutes to quantize a 90 GB model on my machine).

Generates q4_k_m, q5_k_m, and q8_0 quantizations by default.
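
For context, q4_k_m, q5_k_m, and q8_0 are the names of llama.cpp's GGUF quantization types, which are usually produced with a two-step convert-then-quantize pipeline. The sketch below is an illustration of that typical pipeline only; the tool paths, file names, and the llama.cpp route itself are my assumptions, not a description of this script's internals.

import subprocess

MODEL_DIR = "./firefunction-v1"          # a downloaded HF checkpoint (assumed path)
F16_GGUF = "firefunction-v1-f16.gguf"    # intermediate full-precision GGUF file

# 1) Convert the Hugging Face checkpoint to GGUF (llama.cpp's converter script).
subprocess.run(
    ["python3", "convert_hf_to_gguf.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2) Quantize the GGUF file once per target type (llama.cpp's quantize binary).
for qtype in ["q4_k_m", "q5_k_m", "q8_0"]:
    subprocess.run(
        ["./llama-quantize", F16_GGUF, f"firefunction-v1-{qtype}.gguf", qtype],
        check=True,
    )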

Remember to export your Hugging Face token like so:

export HUGGING_FACE_HUB_TOKEN="YOUR_TOKEN"
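
The download step itself isn't shown in this README, but huggingface_hub reads this environment variable automatically when fetching gated or private models. A minimal sketch, assuming snapshot_download from huggingface_hub is the download mechanism (the script may do this differently):

import os
from huggingface_hub import snapshot_download

# huggingface_hub picks up HUGGING_FACE_HUB_TOKEN from the environment on
# its own; passing it explicitly, as here, is equivalent.
local_dir = snapshot_download(
    repo_id="fireworks-ai/firefunction-v1",
    token=os.environ.get("HUGGING_FACE_HUB_TOKEN"),
)
print(local_dir)  # local path of the downloaded checkpoint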

Example usage:

python3 quantizeHFmodel.py fireworks-ai/firefunction-v1

I'm also hosting quantizeHQQ here. It does the same thing, except it quantizes with HQQ (https://github.com/mobiusml/hqq), which in theory yields a higher-quality quant. However, it takes a huge amount of VRAM, on the order of >100 GB. I don't have that, but if this is useful to you, more power to you.
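
For reference, here is a minimal sketch of HQQ quantization using the transformers integration (HqqConfig); quantizeHQQ itself may drive the hqq library directly, and the nbits/group_size values are illustrative, not taken from the script.

import torch
from transformers import AutoModelForCausalLM, HqqConfig

# Illustrative HQQ settings; tune nbits/group_size for your quality target.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "fireworks-ai/firefunction-v1",
    torch_dtype=torch.float16,
    device_map="cuda",  # HQQ quantizes on the GPU as the model loads,
                        # consistent with the large VRAM requirement above
    quantization_config=quant_config,
)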
