
Should base model be dequantized when merging LoRA weights with base model? #254

Open
jinyongyoo opened this issue Aug 24, 2023 · 6 comments

Comments

@jinyongyoo

Hi, I have a question about merging LoRA weights with a quantized base model. When we want to merge the LoRA weights back into the original model for inference, we can use the merge_and_unload method. However, this obviously isn't possible when the base model is quantized (as seen in #28).

A common workaround I've seen is loading the base model without quantization and then merging the LoRA weights into it. But shouldn't this cause a training/inference mismatch, since the LoRA weights were trained against the quantized model, which differs from the unquantized base model (quantization is inherently a noisy process)? I was wondering whether such a workaround would degrade the performance of the final model.
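For reference, that workaround usually looks something like the sketch below (a minimal, assumed example using transformers and peft; the model and adapter paths are placeholders):

```python
# Minimal sketch of the common workaround: load the base model unquantized
# (here fp16) and merge the LoRA adapter trained with QLoRA into it.
# "base-model" and "lora-adapter" are placeholder paths.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "lora-adapter")
merged = model.merge_and_unload()  # folds the LoRA deltas into the unquantized weights
merged.save_pretrained("merged-model")
```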

Another workaround I can think of is to dequantize the quantized base model and then add the LoRA weights. That would remove the training/inference mismatch. Has there been any attempt to dequantize the base model and add the LoRA weights this way?

@eugene-yh

There is a mathematical hack to dequantize the base model. See my post here: #28 (comment)

@jinyongyoo
Author

Thanks! I ended up using the dequantize_4bit method from bitsandbytes to dequantize the linear weights, but I think this approach is neat.

@ChrisHayduk

@jinyongyoo Would you mind sharing the code that you used to dequantize the model? How did you apply dequantize_4bit?

@jinyongyoo
Author

Not sure if this is the 100% correct way to do it.

Call dequantize_4bit(module.weight.data, quant_state=module.weight.quant_state), where module is an instance of bnb.nn.Linear4bit. That should give you the weight you can use to create a torch.nn.Linear.
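Per layer, that looks roughly like the sketch below (an assumed reconstruction, not necessarily the exact snippet; module is a bnb.nn.Linear4bit taken from the quantized model):

```python
import torch
from bitsandbytes.functional import dequantize_4bit

# module is an instance of bnb.nn.Linear4bit from the quantized model.
# dequantize_4bit reconstructs the (approximate) fp16 weight from the packed
# 4-bit tensor and its stored quantization state.
weight = dequantize_4bit(module.weight.data, quant_state=module.weight.quant_state)

# Build an ordinary torch.nn.Linear holding the dequantized weight.
out_features, in_features = weight.shape
linear = torch.nn.Linear(in_features, out_features, bias=module.bias is not None)
linear.weight.data = weight.to(torch.float16)
if module.bias is not None:
    linear.bias.data = module.bias.data
```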

@ChrisHayduk

@jinyongyoo Awesome, thank you! And you just looped through every module of the model, checked whether it was of type bnb.nn.Linear4bit, and if it was, replaced that module with the dequantized version?

@jinyongyoo
Author

yes
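For anyone landing here later, a rough sketch of such a loop (an assumed reconstruction of the approach described above, not code taken verbatim from this thread) might look like:

```python
import torch
import bitsandbytes as bnb
from bitsandbytes.functional import dequantize_4bit

def dequantize_base_model(model):
    """Replace every bnb.nn.Linear4bit with a torch.nn.Linear holding the
    dequantized (fp16) weight, so the LoRA weights can be merged afterwards."""
    for name, module in list(model.named_modules()):
        if isinstance(module, bnb.nn.Linear4bit):
            # Reconstruct the fp16 weight from the packed 4-bit data.
            weight = dequantize_4bit(module.weight.data,
                                     quant_state=module.weight.quant_state)
            out_features, in_features = weight.shape
            linear = torch.nn.Linear(in_features, out_features,
                                     bias=module.bias is not None)
            linear.weight.data = weight.to(torch.float16)
            if module.bias is not None:
                linear.bias.data = module.bias.data
            # Swap the quantized module for the plain Linear on its parent.
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, linear)
    return model
```

After the replacement the model contains only regular torch.nn.Linear layers, so the usual PeftModel.from_pretrained(...).merge_and_unload() merge can proceed as normal.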
