Hugging Face allows the use of DeepSpeed to accelerate the training of a model on one GPU or more (read DeepSpeed Integration).
DeepSpeed can be used from a notebook or from the command line (a *.py script). For example, here is an example from Hugging Face: transformers + deepspeed CLI. Instead of using the example notebooks or scripts of Hugging Face transformers, it is also possible to use the scripts updated by adapter-transformers.
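For concreteness, here is a minimal sketch of what enabling the integration looks like; the config file name and output path are assumptions, not taken from this issue:

```python
# From the command line, the launch typically looks like:
#   deepspeed run_mlm.py --deepspeed ds_config.json --fp16 ...
# From Python, the integration is enabled through TrainingArguments:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output_dir",     # hypothetical output path
    fp16=True,                   # mixed precision training with float16
    deepspeed="ds_config.json",  # hypothetical DeepSpeed config file
)
```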
Great... but there is a problem when using mixed precision training with float16 within DeepSpeed: the resulting *.bin files (model and adapter) are not saved in fp32.
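A quick way to see the problem is to inspect the dtype of the saved tensors; a minimal sketch, assuming the file name from the folder listing further below:

```python
import torch

# load the saved state dict on CPU and print the dtype of a few tensors
state_dict = torch.load("pytorch_adapter.bin", map_location="cpu")
for name, tensor in list(state_dict.items())[:3]:
    print(name, tensor.dtype)  # torch.float16 after fp16 training, not torch.float32
```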
In the case of a DeepSpeed training with a Hugging Face transformers script, it is possible, thanks to the zero_to_fp32.py script (check the links at the bottom of this message), to reconstruct the fp32 weights into pytorch_model.bin.
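Besides running the script directly, DeepSpeed also exposes the reconstruction programmatically; a minimal sketch, assuming the checkpoint was written to a hypothetical checkpoint_dir:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# gather the ZeRO-partitioned weights back into a single fp32 state dict
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoint_dir")
torch.save(state_dict, "checkpoint_dir/pytorch_model.bin")
```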
However, how can this be done after a DeepSpeed training with an adapter-transformers script (with mixed precision training using float16), for the pytorch_adapter.bin and pytorch_model_head.bin files?
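One direction that might work (a speculative sketch, not a confirmed solution: the "adapters" and "heads" key substrings are pure assumptions about how adapter-transformers names these parameters) is to reconstruct the full fp32 state dict and filter it by parameter name:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# reconstruct the full fp32 state dict, then split it by (assumed) key names
full = get_fp32_state_dict_from_zero_checkpoint("checkpoint_dir")
adapter_sd = {k: v for k, v in full.items() if "adapters" in k}
head_sd = {k: v for k, v in full.items() if "heads" in k}
torch.save(adapter_sd, "pytorch_adapter.bin")
torch.save(head_sd, "pytorch_model_head.bin")
```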
Note: I ran the run_mlm.py script updated by adapter-transformers within DeepSpeed (with mixed precision training using float16), with a change at line 490 of the Trainer in order to save the model at the end of training (new code: do_save_full_model=adapter_args.train_adapter). As expected, the pytorch_adapter.bin size was half of its fp32 value. Then I ran zero_to_fp32.py as explained above and in the listed links, loaded my saved model with the following code, and checked whether the embeddings and layer weights had remained unchanged (as expected): but they had changed.
```python
from transformers import BertForMaskedLM, AutoTokenizer

# reload the fp32-reconstructed checkpoint from disk
model = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))
```
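For reference, the unchanged-weights check can be sketched like this (a sketch of the check described above; using bert-base-uncased as the reference checkpoint is an assumption):

```python
import torch
from transformers import BertForMaskedLM

# the base model whose weights should have stayed frozen during adapter training
reference = BertForMaskedLM.from_pretrained("bert-base-uncased")
reloaded = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))

# element-wise comparison of the word-embedding matrices
print(torch.equal(
    reference.bert.embeddings.word_embeddings.weight,
    reloaded.bert.embeddings.word_embeddings.weight,
))  # expected True, but it comes out False here
```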
Furthermore, I checked the content of the adapter folder, which was as follows:
adapter_config.json -- 632 B
head_config.json -- 231 B
pytorch_adapter.bin -- 232 MB
pytorch_model_head.bin -- 232 MB
More than 230 MB for each bin file (the trained model was a BERT base)? I expected a small value with fp16 weights... strange. In the case of a training without DeepSpeed, the sizes of these bin files are as follows:
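As a rough sanity check on the 232 MB figures above (a sketch, assuming BERT base has about 110M parameters):

```python
# back-of-the-envelope: 232 MB per file is about the size of a full fp16
# BERT-base model, not of a small adapter
params = 110_000_000
print(f"full model in fp32: ~{params * 4 / 1e6:.0f} MB")  # ~440 MB
print(f"full model in fp16: ~{params * 2 / 1e6:.0f} MB")  # ~220 MB
```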
Environment info
- adapter-transformers version: 2.0.1
- DeepSpeed version: 0.4.0
Links to read about fp32 weights reconstruction: