Why is it recommended to set load_in_8bit: true for LoRA fine-tuning?
#1611
-
I started experimenting with LoRA fine-tuning. I have enough memory, yet the model always gets worse after LoRA fine-tuning, so I am wondering why this is the case. I saw a warning in the logs that it is recommended to quantize to 8-bit. Why is this recommended? Shouldn't the model lose performance through quantisation?
-
Hey, this reply is a bit late, but I hope I can clarify this for future readers. The reason is that with LoRA fine-tuning you are training a separate, small set of adapter weights while the base model stays frozen. Loading that frozen base model in 8-bit is recommended to save VRAM, which lets you use a larger batch size and train faster; the quantization only affects the frozen base weights, not the adapter weights you are actually training.
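As a rough illustration of the idea, independent of whatever training framework the warning came from, here is a minimal Hugging Face transformers/peft sketch; the model id, target modules, and LoRA hyperparameters are placeholders, not values from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 8-bit to cut VRAM usage.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # placeholder model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# The LoRA adapters are a small extra set of weights trained in higher precision.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable
```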
As for the model getting worse after fine-tuning: could this be an issue with the dataset?
Once merged, your model will be output as fp16.
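For reference, merging the trained adapter back into an fp16 copy of the base model can be sketched with peft's merge_and_unload; the model id and paths below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16 and fold the LoRA adapter into its weights.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # placeholder model id
    torch_dtype=torch.float16,
)
merged = PeftModel.from_pretrained(base_model, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("merged-fp16-model")
```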