
fix_mbart_tied_weights #26422

Merged 2 commits into huggingface:main on Sep 28, 2023
Conversation

SunMarc (Member) commented Sep 26, 2023

What does this PR do?

Fixes #26266. This PR fixes the tied weights for the MBart model. Before this PR, only lm_head was tied to model.shared. Now we also make sure to tie model.encoder.embed_tokens and model.decoder.embed_tokens to model.shared by defining the _tie_weights method, which is called when we do model.tie_weights(). I've checked that we get the same weights at the end. This issue only happens when we load with safetensors + device_map, because the shared tensors are not saved and the weights are left on the meta device.
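In other words, the fix amounts to something along these lines (a sketch only, assuming the standard PreTrainedModel._tie_or_clone_weights helper; the exact merged code may differ):

# Sketch of the approach described above, not the exact merged diff.
# MBartModel keeps a single shared embedding table in self.shared, and
# _tie_weights is picked up by PreTrainedModel.tie_weights() after loading.
def _tie_weights(self):
    # Re-point the encoder and decoder embeddings at the shared table so that
    # weights left on the meta device also receive the loaded values.
    self._tie_or_clone_weights(self.encoder.embed_tokens, self.shared)
    self._tie_or_clone_weights(self.decoder.embed_tokens, self.shared)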

HuggingFaceDocBuilderDev commented Sep 26, 2023

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten (Contributor)

Hmm, I'm not sure that all mBART checkpoints share all of these weight matrices with each other.

We should make sure that, at least for all of the following models:
https://huggingface.co/models?other=mbart&sort=trending&search=facebook
all three embedding matrices are identical (I'm not sure this is always the case, e.g. for the multilingual ones).

LysandreJik (Member)

Yes, I think we should look at the config.tie_word_embeddings value and adapt accordingly. See the recent PR on FSMT: #26292

SunMarc (Member, Author) commented Sep 27, 2023

Thanks for the link @LysandreJik. I've updated the code and added a test.
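Concretely, the tying is now conditioned on that flag, roughly along these lines (a sketch only, reusing the helper assumed above; the exact code may differ):

def _tie_weights(self):
    # Only tie when the checkpoint's config requests shared word embeddings.
    if self.config.tie_word_embeddings:
        self._tie_or_clone_weights(self.encoder.embed_tokens, self.shared)
        self._tie_or_clone_weights(self.decoder.embed_tokens, self.shared)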

LysandreJik (Member)

As @patrickvonplaten was saying, could you also quickly verify that it works with the most downloaded mBART models on the Hub? When doing the FSMT change I ended up breaking a few FSMT models on the Hub, so let's try to prevent that here 😁

Thanks for your help @SunMarc

SunMarc (Member, Author) commented Sep 28, 2023

Hi @LysandreJik, I confirm that for the most downloaded mBART models on the Hub, all three embedding matrices are identical. Here's the snippet I used:

from transformers import AutoModelForSeq2SeqLM

models = [
    "facebook/mbart-large-50-many-to-many-mmt",
    "facebook/mbart-large-50-many-to-one-mmt",
    "facebook/mbart-large-50-one-to-many-mmt",
    "facebook/mbart-large-50",
    "facebook/mbart-large-cc25",
    "facebook/mbart-large-en-ro",
    "facebook/mgenre-wiki",
]
for model_id in models:
    for safetensors in [True, False]:
        for device_map in ["auto", None]:
            try:
                model = AutoModelForSeq2SeqLM.from_pretrained(
                    model_id, use_safetensors=safetensors, device_map=device_map
                )
            except Exception:
                print(f"{model_id} failed to load with safetensors={safetensors} and device_map={device_map}")
                continue  # skip the check if the model could not be loaded
            # All four embedding references must point to the same storage.
            assert len(
                {
                    model.get_output_embeddings().weight.data_ptr(),
                    model.get_input_embeddings().weight.data_ptr(),
                    model.base_model.decoder.embed_tokens.weight.data_ptr(),
                    model.base_model.encoder.embed_tokens.weight.data_ptr(),
                }
            ) == 1, f"Embeddings are not tied in {model_id}"

LysandreJik (Member)

Thanks a lot @SunMarc!

LysandreJik merged commit 5e11d72 into huggingface:main on Sep 28, 2023
BramVanroy (Collaborator)

Yay, that works. Thanks a lot everyone!

blbadger pushed a commit to blbadger/transformers that referenced this pull request on Nov 8, 2023.
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request on Nov 18, 2023.