
Uninitialized token embeddings MBART when using device_map #26266

Closed
BramVanroy opened this issue Sep 19, 2023 · 2 comments · Fixed by #26422
@BramVanroy (Collaborator)

System Info

Current master

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When loading a finetuned model's safetensors with device_map="auto", I get a warning that the tied embeddings are not initialized:

Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint and are newly initialized: ['model.decoder.embed_tokens.weight', 'model.encoder.embed_tokens.weight']

I finetuned an MBART model with the Trainer, with use_safetensors set to True. The model's vocabulary (and therefore its embedding size) was extended, which may matter.
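For context, the extension looked roughly like the sketch below. This is illustrative rather than the exact training code: the base checkpoint and the added tokens are placeholders.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative sketch of the vocabulary-extension step; the base checkpoint
# and the token strings are placeholders, not the ones actually used.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-cc25")

# Adding tokens grows the tokenizer's vocabulary, so the (tied) embedding
# matrix has to be resized to match before finetuning.
tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])
model.resize_token_embeddings(len(tokenizer))
```

With a model finetuned and saved like that, the failure reproduces as follows: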

```python
from transformers import AutoModelForSeq2SeqLM

# Works: weights are loaded correctly
model = AutoModelForSeq2SeqLM.from_pretrained("BramVanroy/mbart_test")

# Does not work: the tied embeddings are not loaded correctly and the warning above is triggered
model = AutoModelForSeq2SeqLM.from_pretrained("BramVanroy/mbart_test", device_map="auto")
```

It is not just the warning: the weights really are not loaded correctly, and the model produces random output.

Expected behavior

Correctly loaded safetensors

@LysandreJik self-assigned this Sep 26, 2023

@LysandreJik (Member)

I've traced this to the definition of the find_tied_parameters function from accelerate.

In transformers we have a control flow to identify tied parameters that does not work for meta tensors; if we identify a meta tensor, we instead rely on the find_tied_parameters function from accelerate. There seems to be a discrepancy in the number of layers returned by these two methods, depending on whether we're using a device_map or not:

```python
if device_map is None and not is_fsdp_enabled():
    ptrs = collections.defaultdict(list)
    for name, tensor in model.state_dict().items():
        id_tensor = id_tensor_storage(tensor)
        ptrs[id_tensor].append(name)

    # These are all the pointers of shared tensors.
    tied_params = [names for _, names in ptrs.items() if len(names) > 1]
else:
    # id function doesn't work for meta tensor so we need this function
    tied_params = find_tied_parameters(model)
```
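To make the discrepancy concrete, here is a minimal sketch of the two detection paths on a toy model with tied embeddings (assuming torch and accelerate are installed; the Toy class and its layer names are made up, and data_ptr() stands in for the more robust id_tensor_storage used in the real code):

```python
import collections

import torch
from accelerate.utils import find_tied_parameters
from torch import nn


class Toy(nn.Module):
    # Toy model whose two embeddings share a single weight tensor.
    def __init__(self):
        super().__init__()
        self.encoder_embed = nn.Embedding(10, 4)
        self.decoder_embed = nn.Embedding(10, 4)
        self.decoder_embed.weight = self.encoder_embed.weight  # tie the weights


model = Toy()

# Path 1: group state_dict entries by storage pointer, as in the `if` branch
# above. This only works for tensors that have real storage.
ptrs = collections.defaultdict(list)
for name, tensor in model.state_dict().items():
    ptrs[tensor.data_ptr()].append(name)
print([names for names in ptrs.values() if len(names) > 1])
# -> [['encoder_embed.weight', 'decoder_embed.weight']]

# Path 2: accelerate's structural detection, as in the `else` branch. It does
# not rely on storage pointers, so it also works on the meta device, where
# there is no real storage to compare.
with torch.device("meta"):
    meta_model = Toy()
print(find_tied_parameters(meta_model))
```

If the structural pass misses a tie that the pointer pass would have found, the tied weight is treated as absent from the checkpoint and freshly initialized, which would match the warning above.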

@SunMarc, would you like to investigate what might be going on here?

@SunMarc (Member) commented Sep 26, 2023

Yes @LysandreJik, I will check what is going on.
