
Fix LoftQ docs and tests #1532

Merged
2 changes: 2 additions & 0 deletions docs/source/developer_guides/lora.md
@@ -44,6 +44,8 @@ config = LoraConfig(init_lora_weights=False, ...)

When quantizing the base model for QLoRA training, consider using the [LoftQ initialization](https://arxiv.org/abs/2310.08659), which has been shown to improve performance when training quantized models. The idea is that the LoRA weights are initialized such that the quantization error is minimized. To use LoftQ, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).

In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since layers that are not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, use `nf4` as the quant type in your quantization config when using 4-bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
Contributor:

Nice point to note in the docs.
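
For illustration, a minimal sketch of this initialization, assuming a causal LM and a placeholder model id (the linked LoftQ instructions remain the authoritative workflow):

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Load the base model without quantization; LoftQ accounts for the quantization
# error when initializing the LoRA weights.
base_model = AutoModelForCausalLM.from_pretrained("your-model-id")  # placeholder model id

loftq_config = LoftQConfig(loftq_bits=4)  # minimize the error of 4-bit quantization
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    target_modules="all-linear",  # target as many layers as possible, per the advice above
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
```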


<Tip>

Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.
2 changes: 2 additions & 0 deletions docs/source/developer_guides/quantization.md
@@ -97,6 +97,8 @@ You're all set for training with whichever training method you prefer!

[LoftQ](https://hf.co/papers/2310.08659) initializes LoRA weights such that the quantization error is minimized, and it can improve performance when training quantized models. To get started, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).

In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since layers that are not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, use `nf4` as the quant type in your quantization config when using 4-bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
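
As a sketch of the quantization side of this advice (the model id is a placeholder), loading the 4-bit base model with `nf4` could look like this:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# nf4 is the recommended 4-bit quant type when combining quantization with LoftQ
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",  # placeholder
    quantization_config=bnb_config,
)
```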

### QLoRA-style training

QLoRA adds trainable weights to all the linear layers in the transformer architecture. Since the attribute names for these linear layers can vary across architectures, set `target_modules` to `"all-linear"` to add LoRA to all the linear layers:
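
A minimal sketch of such a config (rank, alpha, and task type are illustrative choices, not values from this PR):

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # match every linear layer regardless of its attribute name
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, config)  # `model` is the quantized base model loaded above
```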
7 changes: 5 additions & 2 deletions tests/test_gpu_examples.py
@@ -1357,7 +1357,10 @@ class TestLoftQ:
Tests for LoftQ to ensure that it reduces the quantization error compared to normal LoRA quantization.
"""

error_factor = 1 # FIXME should be > 1
# The error factor indicates by how much the quantization error should decrease when using LoftQ compared to
# quantization without LoftQ. Thus 1.03 means that the error should decrease by at least 3%. This is a very
# conservative value to prevent flakiness; in practice, most gains are > 1.5.
error_factor = 1.03

def get_input(self, model_id, device):
tokenizer = AutoTokenizer.from_pretrained(model_id)
@@ -1567,7 +1570,7 @@ def test_t5_loftq_8bit(self, device, tmp_path):
assert mse_loftq < (mse_quantized / self.error_factor)
assert mae_loftq < (mae_quantized / self.error_factor)

@pytest.mark.xfail # failing for now, see discussion in #1532
@pytest.mark.xfail # failing for now, but having DoRA pass is only a nice-to-have, not a must, so we're good
@pytest.mark.parametrize("device", ["cuda", "cpu"])
def test_bloomz_loftq_4bit_dora(self, device, tmp_path):
# same as test_bloomz_loftq_4bit but with DoRA