Hugging Face allows the use of DeepSpeed to accelerate the training of a model on one GPU or more (read DeepSpeed Integration).
DeepSpeed can be used from a notebook or from the command line (a *.py script). For example, here is an example from Hugging Face: transformers + deepspeed CLI. Instead of using the example notebooks or scripts of Hugging Face transformers, it is also possible to use the scripts updated by adapter-transformers.
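For concreteness, here is a minimal sketch of what enabling the integration looks like; the config file name and output path are assumptions, not taken from this issue:

```python
# From the command line, the launch typically looks like:
#   deepspeed run_mlm.py --deepspeed ds_config.json --fp16 ...
# From Python, the integration is enabled through TrainingArguments:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output_dir",     # hypothetical output path
    fp16=True,                   # mixed precision training with float16
    deepspeed="ds_config.json",  # hypothetical DeepSpeed config file
)
```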
Great... but there is a problem when using mixed precision training with float16 within DeepSpeed: the resulting *.bin files (model and adapter) are not saved in fp32.
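A quick way to see the problem is to inspect the dtype of the saved tensors; a minimal sketch, assuming the file name from the folder listing further below:

```python
import torch

# load the saved state dict on CPU and print the dtype of a few tensors
state_dict = torch.load("pytorch_adapter.bin", map_location="cpu")
for name, tensor in list(state_dict.items())[:3]:
    print(name, tensor.dtype)  # torch.float16 after fp16 training, not torch.float32
```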
In the case of a DeepSpeed training with a Hugging Face transformers script, it is possible, thanks to the zero_to_fp32.py script (check the links at the bottom of this message), to reconstruct the fp32 weights into pytorch_model.bin.
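Besides running the script directly, DeepSpeed also exposes the reconstruction programmatically; a minimal sketch, assuming the checkpoint was written to a hypothetical checkpoint_dir:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# gather the ZeRO-partitioned weights back into a single fp32 state dict
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoint_dir")
torch.save(state_dict, "checkpoint_dir/pytorch_model.bin")
```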
However, how can this be done after a DeepSpeed training with an adapter-transformers script (with mixed precision training using float16), for the pytorch_adapter.bin and pytorch_model_head.bin files?
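One direction that might work (a speculative sketch, not a confirmed solution: the "adapters" and "heads" key substrings are pure assumptions about how adapter-transformers names these parameters) is to reconstruct the full fp32 state dict and filter it by parameter name:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# reconstruct the full fp32 state dict, then split it by (assumed) key names
full = get_fp32_state_dict_from_zero_checkpoint("checkpoint_dir")
adapter_sd = {k: v for k, v in full.items() if "adapters" in k}
head_sd = {k: v for k, v in full.items() if "heads" in k}
torch.save(adapter_sd, "pytorch_adapter.bin")
torch.save(head_sd, "pytorch_model_head.bin")
```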
Note: I ran the run_mlm.py script updated by adapter-transformers within DeepSpeed (with mixed precision training using float16), with a change at line 490 of the Trainer in order to save the model at the end of training (new code: do_save_full_model=adapter_args.train_adapter). As expected, the pytorch_adapter.bin size was half of its fp32 value. Then I ran zero_to_fp32.py as explained above and in the listed links, loaded my saved model with the following code, and checked whether the embeddings and layer weights had remained unchanged (as expected): but they had changed.
```python
from transformers import BertForMaskedLM, AutoTokenizer

# reload the fp32-reconstructed checkpoint from disk
model = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))
```
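For reference, the unchanged-weights check can be sketched like this (a sketch of the check described above; using bert-base-uncased as the reference checkpoint is an assumption):

```python
import torch
from transformers import BertForMaskedLM

# the base model whose weights should have stayed frozen during adapter training
reference = BertForMaskedLM.from_pretrained("bert-base-uncased")
reloaded = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))

# element-wise comparison of the word-embedding matrices
print(torch.equal(
    reference.bert.embeddings.word_embeddings.weight,
    reloaded.bert.embeddings.word_embeddings.weight,
))  # expected True, but it comes out False here
```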
Furthermore, I checked the content of the adapter folder, which was as follows:
adapter_config.json -- 632 B
head_config.json -- 231 B
pytorch_adapter.bin -- 232 MB
pytorch_model_head.bin -- 232 MB
More than 230 MB for each bin file (the trained model was a BERT base)? I expected a small value with fp16 weights... strange. In the case of a training without DeepSpeed, the sizes of these bin files are as follows:
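As a rough sanity check on the 232 MB figures above (a sketch, assuming BERT base has about 110M parameters):

```python
# back-of-the-envelope: 232 MB per file is about the size of a full fp16
# BERT-base model, not of a small adapter
params = 110_000_000
print(f"full model in fp32: ~{params * 4 / 1e6:.0f} MB")  # ~440 MB
print(f"full model in fp16: ~{params * 2 / 1e6:.0f} MB")  # ~220 MB
```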
Environment info
- adapter-transformers version: 2.0.1
- DeepSpeed version: 0.4.0
Links to read about fp32 weights reconstruction: