[RWKV] Final fix RWKV 4bit
#26134
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for adding this fix!
Could you add a test that would have failed before this change and now passes? Ideally it should be applied to all eligible models.
# re-quantize the model:
# we need to put it first on CPU then back to the device
# this will create an overhead :/
quant_weight = bnb.nn.Params4bit(dequant_weights.to("cpu"), requires_grad=False).to(dequant_weights.device)
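For context, here is a minimal sketch of the dequantize → modify → re-quantize pattern this diff belongs to. It assumes a CUDA device and the bitsandbytes 4-bit API (bnb.nn.Params4bit and bnb.functional.dequantize_4bit; the module path can vary across bitsandbytes versions), and the layer / rescale_factor names are illustrative rather than the exact ones used in the modeling code:

import bitsandbytes as bnb

def requantize_with_rescale(layer, rescale_factor):
    # 1) dequantize the 4-bit weight back to a floating-point tensor
    dequant_weights = bnb.functional.dequantize_4bit(layer.weight.data, layer.weight.quant_state)
    # 2) apply the (illustrative) modification in full precision
    dequant_weights.div_(rescale_factor)
    # 3) re-quantize: Params4bit must be created on CPU with requires_grad=False;
    #    the actual quantization happens when the parameter is moved back to the GPU
    layer.weight = bnb.nn.Params4bit(dequant_weights.to("cpu"), requires_grad=False).to(dequant_weights.device)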
Why are we setting requires_grad=False here?
This seems to be a requirement from bnb: all quantized parameters need to have that value set to False, whereas the default is True :/ I can open an issue on bnb if this is a bug.
Otherwise you get:
File "/home/younes_huggingface_co/miniconda3/envs/fix-test/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 179, in to
return self.cuda(device)
File "/home/younes_huggingface_co/miniconda3/envs/fix-test/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 158, in cuda
self.data = w_4bit
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype
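For illustration, a hypothetical minimal reproduction of that error (assuming a CUDA device and bitsandbytes installed): with requires_grad=True the move to GPU fails because the quantized uint8 storage cannot require gradients.

import torch
import bitsandbytes as bnb

w = torch.randn(64, 64)
ok = bnb.nn.Params4bit(w, requires_grad=False).to("cuda")  # quantizes fine
bad = bnb.nn.Params4bit(w, requires_grad=True).to("cuda")  # RuntimeError: data set to a tensor that requires gradients ...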
Yes, even if it's not a bug, it would be good to get clarification about this.
Will this effectively set these layers to non-trainable even if they were trainable before?
Hm, I am not sure here. In any case, quantized layers cannot be trained, as this is not supported.
I have added more clarifications here: ba1b10f
Thanks for the review, I added a test that should be applicable to other checkpoints as well, as they use the same arch.
Thanks for fixing and iterating!
* Final fix RWKV 4bit
* fixup
* add a test
* add more clarifications
What does this PR do?
Fixes #23848
Double quantization was not working properly for RWKV models, as stated in #23848, leading to an error. This PR proposes a global fix for RWKV models so that they can be run in 4-bit with bitsandbytes without any problem.
The approach followed here is to dequantize the weights with bnb.nn.functional.dequantize_4bit and re-quantize them afterwards. That way it is possible to cover both double quantization and classic 4-bit quantization and have the results match.
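As an illustration (not the exact test added in this PR), a hedged sketch of such a test: load a small RWKV checkpoint in 4-bit with and without double quantization and check that generation runs in both cases. The checkpoint id and prompt are assumptions.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "RWKV/rwkv-4-169m-pile"  # assumption: any small RWKV checkpoint should work
tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Hello my name is", return_tensors="pt")

for double_quant in (False, True):
    config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=double_quant)
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config, device_map="auto")
    out = model.generate(**inputs.to(model.device), max_new_tokens=10)
    print(f"double_quant={double_quant}:", tok.decode(out[0], skip_special_tokens=True))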
cc @amyeroberts and @SunMarc for your information!