GPTQ Quantization via from_pretrained: why enforcing fp16? #25888
Comments
Hi!
Why override it if …?
If we set it to float32 by default, it creates a lot of overhead from the non-linear modules being in fp32 (such as the embedding layer), making it impossible to fit some models on Google Colab, for example. Therefore, for bnb we set them to half precision, with a logger.info explaining what is happening under the hood.
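A minimal sketch of the behaviour described here, assuming a bitsandbytes 8-bit load; the checkpoint name is a placeholder and not taken from this issue:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hedged sketch (placeholder checkpoint): with 8-bit quantization the Linear
# layers are handled by bitsandbytes, while non-quantized modules such as
# embeddings and norms follow torch_dtype. Passing torch.float16 explicitly
# mirrors the half-precision default mentioned above; leaving those modules
# in fp32 is the memory overhead being described.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                                    # placeholder model
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,
)
```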
Understood -- thanks for the explanation! So just to confirm, there are no correctness issues with using …? If that's the case, would it be more effective to add a warning when …?
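For illustration only, a hypothetical sketch of the kind of warning being suggested; none of these names are real transformers internals:

```python
import logging
import torch

logger = logging.getLogger(__name__)

# Hypothetical helper, not actual transformers code: warn instead of silently
# replacing a user-supplied dtype with the one the quantization backend expects.
def resolve_quantization_dtype(user_dtype, enforced_dtype=torch.float16):
    if user_dtype is not None and user_dtype != enforced_dtype:
        logger.warning(
            "torch_dtype=%s was requested, but %s is enforced for this "
            "quantization backend; overriding.",
            user_dtype,
            enforced_dtype,
        )
    return enforced_dtype
```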
Hi @HanGuo97, the backend in the auto_gptq library has always used fp16.
Interesting, thanks for the clarification! I briefly looked into the …
Yeah, I must have forgotten to deactivate …
Oh yes, you are right, I missed this :) (In hindsight, it's a bit odd that they set this to True by default when it clearly depends on the model.) Edit: …
Thanks again for looking into that!
Feature request
Hi, I noticed in the following line that the model has to be in fp16 format before GPTQ quantization. I'm curious whether this condition can be dropped?

transformers/src/transformers/modeling_utils.py, line 2504 in ccb92be
Motivation
My use case runs into trouble with fp16 but works with bf16, and I noticed that if I simply remove this line and keep torch_dtype=None, everything runs fine.
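To make the request concrete, here is a hedged sketch of the call being discussed, assuming the documented GPTQ integration in from_pretrained; the checkpoint name and calibration dataset are placeholders, and torch_dtype=torch.bfloat16 is the combination that the enforced fp16 currently rules out:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Sketch under stated assumptions: quantize a model on the fly through
# from_pretrained. The checkpoint ("facebook/opt-125m") and the "c4"
# calibration dataset are illustrative choices, not taken from the issue.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    device_map="auto",
    quantization_config=gptq_config,
    torch_dtype=torch.bfloat16,   # what this feature request would like to allow
)
```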
Your contribution
NA