High CPU memory usage as bf16 model is auto-loaded as fp32 #34743
Comments
Hey @Qubitium, the model was indeed serialized as bf16, but here you're not specifying in which dtype you would like to load it. We follow torch's default loading mechanism, which is to automatically load it in the default dtype (`torch.float32`). In order to update the dtype in which it should be loaded, please change this line:

```diff
- model = AutoModelForCausalLM.from_pretrained(model_file)
+ model = AutoModelForCausalLM.from_pretrained(model_file, torch_dtype=torch.bfloat16)
```

You can also use `torch_dtype='auto'`:

```diff
- model = AutoModelForCausalLM.from_pretrained(model_file)
+ model = AutoModelForCausalLM.from_pretrained(model_file, torch_dtype='auto')
```

You can read more about this in the `from_pretrained` documentation.
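As a quick sanity check (a minimal sketch, not part of the original comment; the model path is a placeholder), you can confirm which dtype the model actually ended up in after loading:

```python
from transformers import AutoModelForCausalLM

# Placeholder for the same local path or Hub id used above.
model_file = "path/to/bf16-model"

# torch_dtype="auto" picks up the dtype from config.json (or from the first
# floating-point weight in the checkpoint) instead of defaulting to fp32.
model = AutoModelForCausalLM.from_pretrained(model_file, torch_dtype="auto")

# For a bf16 checkpoint this should report torch.bfloat16, not torch.float32.
print(model.dtype)

# Rough parameter memory: 2 bytes/element in bf16 vs. 4 bytes/element in fp32.
num_params = sum(p.numel() for p in model.parameters())
bytes_per_param = next(model.parameters()).element_size()
print(f"~{num_params * bytes_per_param / 1e9:.2f} GB of parameter memory")
```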
@LysandreJik It's 2024 and I would like to propose that the default float32 be modified. Please read the below with a light heart. Reasons:

Overall, accept the config.json default as truth unless there is an override, or the default is really incompatible with the GPU/CPU, i.e. when a device does not physically support the model's specified dtype.

For reference, the current `torch_dtype` docstring:

```
torch_dtype (`str` or `torch.dtype`, *optional*):
    Override the default `torch.dtype` and load the model under a specific `dtype`. The different options
    are:

    1. `torch.float16` or `torch.bfloat16` or `torch.float`: load in a specified
       `dtype`, ignoring the model's `config.torch_dtype` if one exists. If not specified
       - the model will get loaded in `torch.float` (fp32).

    2. `"auto"` - A `torch_dtype` entry in the `config.json` file of the model will be
       attempted to be used. If this entry isn't found then next check the `dtype` of the first weight in
       the checkpoint that's of a floating point type and use that as `dtype`. This will load the model
       using the `dtype` it was saved in at the end of the training. It can't be used as an indicator of how
       the model was trained. Since it could be trained in one of half precision dtypes, but saved in fp32.

    3. A string that is a valid `torch.dtype`. E.g. "float32" loads the model in `torch.float32`, "float16" loads in `torch.float16` etc.

    <Tip>

    For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or
    reach out to the authors and ask them to add this information to the model's card and to insert the
    `torch_dtype` entry in `config.json` on the hub.

    </Tip>
```
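To make option 2 above concrete, here is a small sketch (not from the thread; the file paths are assumptions) showing where the `"auto"` dtype would come from for a typical checkpoint: first the `torch_dtype` entry in `config.json`, then the first floating-point weight in the safetensors file:

```python
import json
from safetensors import safe_open

# Hypothetical local checkpoint directory; adjust paths for your model.
config_path = "path/to/model/config.json"
weights_path = "path/to/model/model.safetensors"

# 1) torch_dtype recorded in config.json, if the entry exists.
with open(config_path) as f:
    config = json.load(f)
print("config torch_dtype:", config.get("torch_dtype"))  # e.g. "bfloat16"

# 2) Fallback: dtype of the first floating-point tensor in the checkpoint.
with safe_open(weights_path, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        if tensor.is_floating_point():
            print("first float weight dtype:", tensor.dtype)  # e.g. torch.bfloat16
            break
```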
Thanks for your feedback @Qubitium! If we were to change the default here we would do it when passing from major 4 to 5 as it's a very significant change. Something we can do right now however is to make the `torch_dtype='auto'` option more prominent. @stevhliu, would it be possible to make this much more visible in the docs? There are many areas where we could showcase the `torch_dtype='auto'` usage.
For sure, I'll open a PR to make `torch_dtype='auto'` more visible. In the next version of the docs, this will also be given more visibility.
Completely agree with you @Qubitium on the motivations; we are kind of stuck with this because of how big of a change it is.
System Info
Ubuntu 24.04
Transformers 4.46.2
Accelerate 1.1.1
Safetensors 0.4.5
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Unexpected 2x CPU memory usage due to a bf16 safetensors checkpoint being loaded as float32 on `device=cpu`.

Manually passing `torch_dtype=torch.bfloat16` has no such issue, but this should not be necessary since both `model.config` and the safetensors files have the proper bfloat16 dtype.
Sample reproducing code:
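The original snippet was not captured here; a minimal sketch of the reproduction described above (the model path is a placeholder, and using `psutil` to read process memory is an assumption) might look like:

```python
import os

import psutil
from transformers import AutoModelForCausalLM

# Hypothetical bf16 checkpoint; any model whose config.json has "torch_dtype": "bfloat16" works.
model_file = "path/to/bf16-model"

def rss_gb() -> float:
    # Resident memory of the current process, in GB.
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

print(f"before load: {rss_gb():.2f} GB")

# No torch_dtype passed: the bf16 weights are upcast to the torch default (float32),
# roughly doubling CPU memory compared to loading in bfloat16.
model = AutoModelForCausalLM.from_pretrained(model_file)

print(f"after load:  {rss_gb():.2f} GB")
print(f"model dtype: {model.dtype}")
```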
Code output:
Expected behavior

Modify the above code to pass `torch_dtype=torch.bfloat16` to `from_pretrained` and memory usage is normal/expected.

There are two related issues here:
Manually passing `dtype=bfloat16` to `from_pretrained` fixes this issue.