[bnb] Add fp4 support for dispatch #1505
Conversation
# quantize only if necessary
device_index = torch.device(device).index if torch.device(device).type == "cuda" else None
if not getattr(module.weight, "quant_state", None) and device_index is not None:
    module.weight = module.weight.cuda(device_index)
The `.cuda` function is very, very deprecated. You should use `.to`.
Hmm, I think the way bitsandbytes has designed its `Linear4bit` layers, we need to call `cuda`: https://github.com/TimDettmers/bitsandbytes/blob/ac5550a0238286377ee3f58a85aeba1c40493e17/bitsandbytes/nn/modules.py#L152 It seems to be the only way to quantize the weights :/ I tried it with `to` and it didn't work. (Note that at that point `module.weight` is a `bnb.nn.Params4bit` parameter.)
Oh ok. Not very PyTorch-ic then.
yeah! :/
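As an aside, a minimal sketch of the behaviour described above, assuming the `Linear4bit`/`Params4bit` API at the linked bitsandbytes commit, an available CUDA device, and arbitrary layer sizes (this is not code from the PR): quantization happens when the `Params4bit` weight is moved with `.cuda()`, which is why the dispatch code checks `quant_state` before calling `.cuda(device_index)`.

# Minimal sketch, assuming the bitsandbytes API at the linked commit and an
# available CUDA device; layer sizes are arbitrary.
import bitsandbytes as bnb

linear = bnb.nn.Linear4bit(64, 64, quant_type="fp4")  # weight is a bnb.nn.Params4bit

# Mirrors the check in the dispatch code above: not quantized yet on CPU.
assert getattr(linear.weight, "quant_state", None) is None

# Moving the weight with .cuda() is what triggers the 4-bit quantization.
linear.weight = linear.weight.cuda(0)
assert linear.weight.quant_state is not None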
The documentation is not available anymore as the PR was closed or merged.
tests/test_big_modeling.py (Outdated)
"""Tests that `dispatch_model` quantizes int8 layers""" | ||
from huggingface_hub import hf_hub_download | ||
from transformers import AutoConfig, AutoModel, BitsAndBytesConfig | ||
from transformers.utils.bitsandbytes import replace_8bit_linear |
Isn't this function renamed to something else?
ah yes, let me modify that
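For reference, a sketch of what the updated import might look like, assuming the helper was renamed to `replace_with_bnb_linear` and still lives in `transformers.utils.bitsandbytes` (both are assumptions to verify against the installed transformers version):

# Assumed rename: replace_8bit_linear -> replace_with_bnb_linear.
# Verify the name and module path against the transformers version in use.
from transformers.utils.bitsandbytes import replace_with_bnb_linear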
tests/test_big_modeling.py (Outdated)
@slow
@unittest.skip("Un-skip in the next transformers release")
def test_dipatch_model_fp4_simple(self):
    """Tests that `dispatch_model` quantizes int8 layers"""
To adapt: the docstring still mentions int8 layers.
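A minimal sketch of the adapted skeleton, assuming `slow` comes from `accelerate.test_utils` and using a hypothetical test-class name; the function name is kept exactly as in the diff above, only the docstring changes, and the test body is elided:

# Sketch of the adapted test skeleton (docstring now matches the fp4 case).
import unittest

from accelerate.test_utils import slow


class DispatchFP4Test(unittest.TestCase):  # hypothetical class name
    @slow
    @unittest.skip("Un-skip in the next transformers release")
    def test_dipatch_model_fp4_simple(self):
        """Tests that `dispatch_model` quantizes fp4 layers"""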
What does this PR do?
Fixes #1504
This PR applies a similar enhancement to the one in #1228, this time for FP4 layers.
Now the script below outputs the desired dtype:
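The script itself is not included in this excerpt; purely for illustration, here is a sketch in the spirit of the test added in this PR (the checkpoint name, module path, and the `replace_with_bnb_linear` signature are assumptions, not taken from the PR):

# Illustrative sketch, not the PR's actual script: build an empty bloom-560m,
# swap in un-quantized bnb 4-bit linears, then let accelerate's dispatch
# quantize them when placing the weights on GPU. Checkpoint name, module path,
# and helper signature are assumptions.
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModel, BitsAndBytesConfig
from transformers.utils.bitsandbytes import replace_with_bnb_linear

model_id = "bigscience/bloom-560m"  # small checkpoint chosen for illustration

with init_empty_weights():
    model = AutoModel.from_config(AutoConfig.from_pretrained(model_id))

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="fp4")
model = replace_with_bnb_linear(model, quantization_config=quant_config)

checkpoint = hf_hub_download(model_id, "pytorch_model.bin")
model = load_checkpoint_and_dispatch(model, checkpoint=checkpoint, device_map={"": 0})

# With this PR, dispatch quantizes the fp4 layers, so the 4-bit weights are
# stored as torch.uint8 rather than staying in fp16/fp32.
assert model.h[0].self_attention.query_key_value.weight.dtype == torch.uint8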
cc @sgugger @BlackSamorez