
Adapter-transformers & DeepSpeed: how to get fp32 weights reconstruction? #192

Closed
piegu opened this issue Jun 21, 2021 · 2 comments
Labels: question (Further information is requested), Stale

piegu commented Jun 21, 2021

Environment info

  • adapter-transformers version: 2.0.1
  • DeepSpeed version: 0.4.0

Details

Hugging Face allows the use of DeepSpeed in order to accelerate the training of a model on one GPU (or more) (see DeepSpeed Integration).

DeepSpeed can be used in a notebook or from the command line (a *.py script). For example, here is one from Hugging Face: transformers + deepspeed CLI.

Instead of using the example notebooks or scripts from Hugging Face transformers, it is possible to use the scripts updated by adapter-transformers.

Great... but there is a problem when using mixed precision training with float16 within DeepSpeed: the resulting *.bin files (model and adapter) are not saved in fp32.

In the case of a DeepSpeed training with a Hugging Face transformers script, it is possible, thanks to the zero_to_fp32.py script (see the links at the bottom of this message), to reconstruct the fp32 weights of pytorch_model.bin.
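
Here is a minimal sketch of that reconstruction step using DeepSpeed's Python helper instead of the standalone script; the helper name may differ between DeepSpeed versions, and the checkpoint path below is hypothetical.

# Sketch (assumption): rebuild fp32 weights from a ZeRO checkpoint directory
# via DeepSpeed's zero_to_fp32 helper; the path is a placeholder.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

checkpoint_dir = "output_dir/checkpoint-1000"  # hypothetical checkpoint directory
fp32_state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
torch.save(fp32_state_dict, "pytorch_model.bin")  # fp32 reconstruction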

However, how can this be done after a DeepSpeed training with an adapter-transformers script (with mixed precision training using float16), for the pytorch_adapter.bin and pytorch_model_head.bin files?

Note: I ran the run_mlm.py script updated by adapter-transformers within DeepSpeed (with mixed precision training using float16), with a change at line 490 of the Trainer in order to save the model at the end of training (new code: do_save_full_model=adapter_args.train_adapter). As expected, the pytorch_adapter.bin size was half of its fp32 value. Then I ran zero_to_fp32.py as explained above and in the listed links, loaded my saved model with the following code, and checked whether the embedding and layer weights had remained unchanged (as expected): but they had changed.

from transformers import BertForMaskedLM, AutoTokenizer
model = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))
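
For the check itself, here is a minimal sketch of the comparison, assuming the training started from bert-base-uncased (only an assumption; use the actual base checkpoint) and reusing path_to_awesome_name_you_picked from above.

# Sketch: the word embeddings of the reconstructed model should match the base
# model, since only the adapter weights were supposed to be trained.
import torch
from transformers import BertForMaskedLM

base = BertForMaskedLM.from_pretrained("bert-base-uncased")
reconstructed = BertForMaskedLM.from_pretrained(str(path_to_awesome_name_you_picked))

print(torch.allclose(base.bert.embeddings.word_embeddings.weight,
                     reconstructed.bert.embeddings.word_embeddings.weight))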

Moreover, I checked the content of the adapter folder, which was as follows:

adapter_config.json -- 632 B
head_config.json -- 231 B
pytorch_adapter.bin -- 232 MB
pytorch_model_head.bin -- 232 MB

More than 230 MB for each bin file (the trained model was a BERT base)? I was expecting a smaller value with fp16 weights... strange. In the case of training without DeepSpeed, the sizes of these bin files are as follows:

pytorch_adapter.bin -- 29.6 MB
pytorch_model_head.bin -- 94 MB
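
As a rough back-of-the-envelope check (the parameter counts below are my assumptions, not measured values): file size ≈ number of parameters × bytes per parameter.

# Rough arithmetic sketch: expected file size = parameter count * bytes per parameter.
def approx_size_mb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1e6

print(approx_size_mb(110e6, 4))  # full BERT base in fp32 -> ~440 MB
print(approx_size_mb(7.4e6, 4))  # ~7.4M adapter parameters in fp32 -> ~29.6 MB
print(approx_size_mb(7.4e6, 2))  # same adapter in fp16 -> ~14.8 MB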

Links to read about fp32 weights reconstruction:

piegu added the question label on Jun 21, 2021
adapter-hub-bert (Member) commented

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

adapter-hub-bert (Member) commented

This issue was closed because it was stale for 14 days without any activity.

adapter-hub-bert closed this as not planned (won't fix, can't repro, duplicate, stale) on Oct 29, 2022