LoftQ does not seem to quantize the base model #1525
Comments
Thanks a lot for reporting. Indeed, what is written in the documentation about how to initialize with LoftQ is incorrect, my bad. The correct way is unfortunately a bit more complicated. Please follow the instructions here: https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning In the meantime, I'll be working on updating the docs and unit tests.
Thank you very much! I will try the script immediately ~
Relates to huggingface#1525 Unfortunately, the docs I wrote about how to use LoftQ were incorrect, based on a misunderstanding I had. In reality, getting LoftQ to work is quite a bit more involved, requiring a complete round trip: first loading a non-quantized model with LoftQ, saving the LoRA weights and the modified base model, loading the just-stored base model again but this time with quantization, and finally loading the LoftQ-initialized adapter on top. The docs now link to the example, which demonstrates how to move through these steps.
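For anyone landing here before the docs update, a minimal sketch of that round trip might look roughly like the following (the model id, target modules, LoRA hyperparameters, and save paths are placeholders, not taken from the example script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoftQConfig, LoraConfig, PeftModel, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder

# Step 1: load the *non-quantized* base model and create a LoftQ-initialized adapter.
base_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4),
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
peft_model = get_peft_model(base_model, lora_config)

# Step 2: save both the LoRA adapter and the LoftQ-modified base model.
peft_model.save_pretrained("loftq-init/adapter")
peft_model.get_base_model().save_pretrained("loftq-init/base")

# Step 3: reload the just-saved base model, this time quantized with bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
quantized_base = AutoModelForCausalLM.from_pretrained(
    "loftq-init/base", quantization_config=bnb_config
)

# Step 4: load the LoftQ-initialized adapter on top of the quantized base model.
model = PeftModel.from_pretrained(quantized_base, "loftq-init/adapter", is_trainable=True)
```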
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Note that the docs are updated and should now be correct. Also, in PEFT v0.10.0, we released a more convenient way to initialize with LoftQ (docs).
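For readers finding this later, a rough sketch of that more convenient path, assuming the `replace_lora_weights_loftq` helper from PEFT v0.10.0 and a bitsandbytes 4-bit base model (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder

# Load the base model already quantized to 4 bit with bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Attach a plain LoRA adapter first...
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# ...then replace its weights in place using LoftQ, so no round trip through
# a separately saved full-precision base model is needed.
replace_lora_weights_loftq(peft_model)
```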
System Info
transformers version: 4.37.2
Platform: Ubuntu 18.04.6 LTS
GPU: GeForce RTX 3090 x 2
Python version: 3.10.13
Huggingface_hub version: 0.20.3
Safetensors version: 0.4.2
Accelerate version: 0.26.1
Accelerate config: not found
PyTorch version (GPU?): 2.2.1+cu121 (True)
Tensorflow version (GPU?): not found
Flax version (CPU?/GPU?/TPU?): not found
Jax version: not found
JaxLib version: not found
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no
Peft version: 0.9.0
Trl version: 0.7.11
Who can help?
@pacman100 @stevhliu
Information
Tasks
examples folder
Reproduction
I am attempting to fine-tune my model (Baichuan2-chat-7B) using LoftQ, but the results seem to differ from my expectations. Due to the computational resources available in our lab, I am using two 3090 GPUs for fine-tuning. I followed the method described at https://huggingface.co/docs/peft/en/developer_guides/lora#loftq:
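A simplified sketch of my setup, following that doc (the model id and hyperparameters here are stand-ins, not my exact training script):

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

model_id = "baichuan-inc/Baichuan2-7B-Chat"  # stand-in for my local checkpoint

base_model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4),
    task_type="CAUSAL_LM",
)

# The LoRA weights are initialized with LoftQ, but the base model itself is
# never quantized here, which is consistent with the memory usage I observe.
model = get_peft_model(base_model, lora_config)
```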
However, I have found that LoftQ does not seem to quantize my LLM as expected. Actual GPU memory usage is roughly the same as when loading the model directly (about 8 GB on each GPU), with only an additional ~2 GB used for LoRA.
Could you please help me understand why this is happening? Is there something I am doing incorrectly?
Additionally, there is a discussion thread with a similar issue encountered by someone using the Llama-2-7b model:
https://discuss.huggingface.co/t/fine-tuning-for-llama2-based-model-with-loftq-quantization/66737/7
Expected behavior
After setting up LoftQ, GPU memory usage should drop significantly, comparable to the effect of using load_in_4bit.