
[RWKV] Final fix RWKV 4bit #26134

Merged
4 commits merged into huggingface:main from fix-rwkv-4bit on Sep 13, 2023
Conversation

younesbelkada (Contributor)

What does this PR do?

Fixes #23848

Double quantization was not working properly for RWKV models, as stated in the issue above, leading to an error. This PR proposes a global fix for RWKV models so that they can be run in 4-bit bitsandbytes without any problem.

The approach here is the following (a minimal sketch of the round-trip is shown after the list):

  • For each target layer, de-quantize the 4-bit weights using bnb.functional.dequantize_4bit
  • Perform the weight scaling
  • Re-quantize the weights

That way it is possible to cover both double quantization and classic 4-bit quantization and make their results match.
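
For illustration, here is a minimal sketch of that round-trip. The helper name rescale_4bit_weight and the scale argument are illustrative, not the exact code added in this PR:

import bitsandbytes as bnb

def rescale_4bit_weight(param, scale):
    # 1. De-quantize the 4-bit weight back to full precision, using the
    #    quantization state stored on the Params4bit parameter.
    dequant = bnb.functional.dequantize_4bit(param.data, param.quant_state)
    # 2. Apply the RWKV rescaling in full precision.
    dequant.div_(scale)
    # 3. Re-quantize: Params4bit quantizes the data on the CPU -> GPU transfer,
    #    so we round-trip through the CPU (this adds some overhead).
    return bnb.nn.Params4bit(dequant.to("cpu"), requires_grad=False).to(param.device)

The reproduction snippet below first generates with double quantization enabled, then with classic 4-bit quantization for comparison: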

import torch
from transformers import RwkvForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "RWKV/rwkv-4-169m-pile"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True
)

model = RwkvForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
tok = AutoTokenizer.from_pretrained(model_id)

text = "Hello my name is"
input_ids = tok.encode(text, return_tensors="pt").to(0)

out = model.generate(input_ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))

# Classic 4-bit quantization (no double quant) for comparison
model_non_dequant = RwkvForCausalLM.from_pretrained(model_id, load_in_4bit=True)

text = "Hello my name is"
input_ids = tok.encode(text, return_tensors="pt").to(0)

out = model_non_dequant.generate(input_ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))

cc @amyeroberts and @SunMarc for your information!

HuggingFaceDocBuilderDev commented Sep 13, 2023

The documentation is not available anymore as the PR was closed or merged.

amyeroberts (Collaborator) left a comment


Thanks for adding this fix!

Could you add a test that would have failed before this change and passes now? Ideally it should be applied to all eligible models.

# re-quantize the model:
# we need to put it first on CPU then back to the device
# this will create an overhead :/
quant_weight = bnb.nn.Params4bit(dequant_weights.to("cpu"), requires_grad=False).to(dequant_weights.device)
amyeroberts (Collaborator)

Why are we setting requires_grad=False here?

younesbelkada (Contributor, Author)

This seems to be a requirement from bnb: all quantized parameters need to have that value set to False, whereas the default is True. I can open an issue on bnb if this is a bug.

Otherwise you get:

  File "/home/younes_huggingface_co/miniconda3/envs/fix-test/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 179, in to
    return self.cuda(device)
  File "/home/younes_huggingface_co/miniconda3/envs/fix-test/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 158, in cuda
    self.data = w_4bit
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype
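
For context, a minimal sketch of the constraint (assuming a CUDA device is available; the tensor shape is arbitrary):

import torch
import bitsandbytes as bnb

w = torch.randn(16, 16)

# With the default requires_grad=True, moving to the GPU replaces the parameter's
# data with the packed, non-floating-point 4-bit tensor, which triggers the
# RuntimeError shown above:
# bnb.nn.Params4bit(w).to("cuda")

# With requires_grad=False, the quantization on .to("cuda") succeeds.
quantized = bnb.nn.Params4bit(w, requires_grad=False).to("cuda")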

amyeroberts (Collaborator)

Yes, even if it's not a bug, it would be good to get clarification about this.

Will this effectively set these layers to non-trainable even if they were trainable before?

younesbelkada (Contributor, Author)

Hm, I am not sure here; in any case, quantized layers cannot be trained, as this is not supported.
I have added more clarifications here: ba1b10f

younesbelkada (Contributor, Author)

Thanks for the review! I added a test that should be applicable to other checkpoints as well, as they use the same architecture. A sketch of what such a test might look like is shown below.
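
What follows is only an illustrative sketch of such a test, not the exact test added in this PR; the test name, checkpoint, and assertion are assumptions, and a CUDA GPU is required:

from transformers import RwkvForCausalLM, AutoTokenizer, BitsAndBytesConfig

def test_rwkv_generation_with_double_quant():
    model_id = "RWKV/rwkv-4-169m-pile"
    tok = AutoTokenizer.from_pretrained(model_id)
    input_ids = tok.encode("Hello my name is", return_tensors="pt").to(0)

    config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=True)
    model = RwkvForCausalLM.from_pretrained(model_id, quantization_config=config)

    # Before this fix, running a double-quantized RWKV model raised an error;
    # now generation should simply produce new tokens.
    output = model.generate(input_ids, max_new_tokens=10)
    assert output.shape[-1] > input_ids.shape[-1]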

amyeroberts (Collaborator) left a comment

Thanks for fixing and iterating!

younesbelkada merged commit 7ccac73 into huggingface:main on Sep 13, 2023
younesbelkada deleted the fix-rwkv-4bit branch on September 13, 2023 at 14:30
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* Final fix RWMV 4bit

* fixup

* add a test

* add more clarifications
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
* Final fix RWMV 4bit

* fixup

* add a test

* add more clarifications
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023
* Final fix RWMV 4bit

* fixup

* add a test

* add more clarifications
Development

Successfully merging this pull request may close these issues.

RWKV - Inference NF4 quantization broken, also Int8 quantization weirdness.
3 participants