LoftQ does not seem to quantize the base model #1525
Comments
Thanks a lot for reporting. Indeed, what is written in the documentation about how to initialize with LoftQ is incorrect, my bad. The correct way is unfortunately a bit more complicated. Please follow the instructions here: https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning In the meantime, I'll be working on updating the docs and unit tests.
Thank you very much! I will try the script immediately ~
Relates to huggingface#1525 Unfortunately, the docs I wrote about how to use LoftQ were incorrect, based on a misunderstanding I had. In reality, getting LoftQ to work is quite a bit more involved, requiring a complete round trip: first loading a non-quantized model with LoftQ, saving the LoRA weights and the modified base model, loading the just-stored base model again but this time with quantization, and finally loading the LoftQ-initialized adapter on top. The docs now link to the example, which demonstrates how to move through these steps.
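For anyone landing here before the docs update, a minimal sketch of that round trip might look roughly like the following (the model id, target modules, LoRA hyperparameters, and save paths are placeholders, not taken from the example script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoftQConfig, LoraConfig, PeftModel, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder

# Step 1: load the *non-quantized* base model and create a LoftQ-initialized adapter.
base_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4),
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
peft_model = get_peft_model(base_model, lora_config)

# Step 2: save both the LoRA adapter and the LoftQ-modified base model.
peft_model.save_pretrained("loftq-init/adapter")
peft_model.get_base_model().save_pretrained("loftq-init/base")

# Step 3: reload the just-saved base model, this time quantized with bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
quantized_base = AutoModelForCausalLM.from_pretrained(
    "loftq-init/base", quantization_config=bnb_config
)

# Step 4: load the LoftQ-initialized adapter on top of the quantized base model.
model = PeftModel.from_pretrained(quantized_base, "loftq-init/adapter", is_trainable=True)
```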
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Note that the docs are updated and should now be correct. Also, in PEFT v0.10.0, we released a more convenient way to initialize with LoftQ (docs).
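For readers finding this later, a rough sketch of that more convenient path, assuming the `replace_lora_weights_loftq` helper from PEFT v0.10.0 and a bitsandbytes 4-bit base model (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder

# Load the base model already quantized to 4 bit with bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Attach a plain LoRA adapter first...
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# ...then replace its weights in place using LoftQ, so no round trip through
# a separately saved full-precision base model is needed.
replace_lora_weights_loftq(peft_model)
```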
System Info
transformers version: 4.37.2
Platform: Ubuntu 18.04.6 LTS
GPU: GeForce RTX 3090 x 2
Python version: 3.10.13
Huggingface_hub version: 0.20.3
Safetensors version: 0.4.2
Accelerate version: 0.26.1
Accelerate config: not found
PyTorch version (GPU?): 2.2.1+cu121 (True)
Tensorflow version (GPU?): not found
Flax version (CPU?/GPU?/TPU?): not found
Jax version: not found
JaxLib version: not found
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no
Peft version: 0.9.0
Trl version: 0.7.11
Who can help?
@pacman100 @stevhliu
Information
Tasks
examples folder
Reproduction
I am attempting to fine-tune my model (Baichuan2-chat-7B) using LoftQ, but the results seem to differ from my expectations. Due to the computational resources available in our lab, I am using two 3090 GPUs for fine-tuning. I followed the method described at https://huggingface.co/docs/peft/en/developer_guides/lora#loftq:
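A simplified sketch of my setup, following that doc (the model id and hyperparameters here are stand-ins, not my exact training script):

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

model_id = "baichuan-inc/Baichuan2-7B-Chat"  # stand-in for my local checkpoint

base_model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4),
    task_type="CAUSAL_LM",
)

# The LoRA weights are initialized with LoftQ, but the base model itself is
# never quantized here, which is consistent with the memory usage I observe.
model = get_peft_model(base_model, lora_config)
```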
However, I have found that LoftQ does not seem to quantize my LLM as expected. Actual GPU memory usage is roughly the same as when loading the model directly (about 8 GB on each GPU), with only an additional ~2 GB used for LoRA.
Could you please help me understand why this is happening? Is there something I am doing incorrectly?
Additionally, there is a discussion thread with a similar issue encountered by someone using the Llama-2-7b model:
https://discuss.huggingface.co/t/fine-tuning-for-llama2-based-model-with-loftq-quantization/66737/7
Expected behavior
After setting up LoftQ, GPU memory usage should drop significantly, comparable to the effect of using load_in_4bit.