
Fix LoftQ docs and tests #1532

Merged
2 changes: 2 additions & 0 deletions docs/source/developer_guides/lora.md
@@ -44,6 +44,8 @@ config = LoraConfig(init_lora_weights=False, ...)

When quantizing the base model for QLoRA training, consider using the [LoftQ initialization](https://arxiv.org/abs/2310.08659), which has been shown to improve performance when training quantized models. The idea is that the LoRA weights are initialized such that the quantization error is minimized. To use LoftQ, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).

In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since layers that are not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, use `nf4` as the quant type in your quantization config when using 4-bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
Contributor:

Nice point to note in the docs.
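
For illustration, a minimal sketch of this initialization, assuming a causal LM and a placeholder model id (the linked LoftQ instructions remain the authoritative workflow):

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Load the base model without quantization; LoftQ accounts for the quantization
# error when initializing the LoRA weights.
base_model = AutoModelForCausalLM.from_pretrained("your-model-id")  # placeholder model id

loftq_config = LoftQConfig(loftq_bits=4)  # minimize the error of 4-bit quantization
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    target_modules="all-linear",  # target as many layers as possible, per the advice above
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
```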


<Tip>

Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.
2 changes: 2 additions & 0 deletions docs/source/developer_guides/quantization.md
@@ -97,6 +97,8 @@ You're all set for training with whichever training method you prefer!

[LoftQ](https://hf.co/papers/2310.08659) initializes LoRA weights such that the quantization error is minimized, and it can improve performance when training quantized models. To get started, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).

In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since layers that are not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, use `nf4` as the quant type in your quantization config when using 4-bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
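
As a sketch of the quantization side of this advice (the model id is a placeholder), loading the 4-bit base model with `nf4` could look like this:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# nf4 is the recommended 4-bit quant type when combining quantization with LoftQ
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",  # placeholder
    quantization_config=bnb_config,
)
```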

### QLoRA-style training

QLoRA adds trainable weights to all the linear layers in the transformer architecture. Since the attribute names for these linear layers can vary across architectures, set `target_modules` to `"all-linear"` to add LoRA to all the linear layers:
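
A minimal sketch of such a config (rank, alpha, and task type are illustrative choices, not values from this PR):

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # match every linear layer regardless of its attribute name
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, config)  # `model` is the quantized base model loaded above
```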
7 changes: 5 additions & 2 deletions tests/test_gpu_examples.py
@@ -1357,7 +1357,10 @@ class TestLoftQ:
Tests for LoftQ to ensure that it reduces the quantization error compared to normal LoRA quantization.
"""

error_factor = 1 # FIXME should be > 1
# The error factor indicates by how much the quantization error should decrease when using LoftQ compared to
# quantization without LoftQ. Thus 1.03 means that the error should decrease by at least 3%. This is a very
# conservative value to prevent flakiness; in practice, most gains are > 1.5.
error_factor = 1.03

def get_input(self, model_id, device):
tokenizer = AutoTokenizer.from_pretrained(model_id)
@@ -1567,7 +1570,7 @@ def test_t5_loftq_8bit(self, device, tmp_path):
assert mse_loftq < (mse_quantized / self.error_factor)
assert mae_loftq < (mae_quantized / self.error_factor)

@pytest.mark.xfail # failing for now, see discussion in #1532
@pytest.mark.xfail # failing for now, but having DoRA pass is only a nice-to-have, not a must, so we're good
@pytest.mark.parametrize("device", ["cuda", "cpu"])
def test_bloomz_loftq_4bit_dora(self, device, tmp_path):
# same as test_bloomz_loftq_4bit but with DoRA