
Uninitialized token embeddings MBART when using device_map #26266

Closed
BramVanroy opened this issue Sep 19, 2023 · 2 comments · Fixed by #26422
@BramVanroy (Collaborator)

System Info

Current master

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When loading a finetuned model's safetensors with device_map="auto", I get a warning that the tied embeddings are not initialized:

Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint and are newly initialized: ['model.decoder.embed_tokens.weight', 'model.encoder.embed_tokens.weight']

I finetuned an MBART model with the Trainer, with use_safetensors set to True. The model's vocabulary (and therefore its embedding size) was extended, which may matter.
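For context, the extension looked roughly like the sketch below. This is illustrative rather than the exact training code: the base checkpoint and the added tokens are placeholders.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative sketch of the vocabulary-extension step; the base checkpoint
# and the token strings are placeholders, not the ones actually used.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-cc25")

# Adding tokens grows the tokenizer's vocabulary, so the (tied) embedding
# matrix has to be resized to match before finetuning.
tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])
model.resize_token_embeddings(len(tokenizer))
```

With a model finetuned and saved like that, the failure reproduces as follows: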

```python
from transformers import AutoModelForSeq2SeqLM

# Works: weights are loaded correctly
model = AutoModelForSeq2SeqLM.from_pretrained("BramVanroy/mbart_test")

# Does not work: the tied embeddings are not loaded correctly and the warning above is triggered
model = AutoModelForSeq2SeqLM.from_pretrained("BramVanroy/mbart_test", device_map="auto")
```

It is not just the warning: the weights really are not loaded correctly, and the model produces random output.

Expected behavior

Correctly loaded safetensors

@LysandreJik self-assigned this Sep 26, 2023

@LysandreJik (Member)

I've traced this to the definition of the find_tied_parameters function from accelerate.

In transformers we have a control flow to identify tied parameters that does not work for meta tensors; if we identify a meta tensor, we instead rely on the find_tied_parameters function from accelerate. There seems to be a discrepancy in the number of layers returned by these two methods, depending on whether we're using a device_map or not:

```python
if device_map is None and not is_fsdp_enabled():
    ptrs = collections.defaultdict(list)
    for name, tensor in model.state_dict().items():
        id_tensor = id_tensor_storage(tensor)
        ptrs[id_tensor].append(name)

    # These are all the pointers of shared tensors.
    tied_params = [names for _, names in ptrs.items() if len(names) > 1]
else:
    # id function doesn't work for meta tensor so we need this function
    tied_params = find_tied_parameters(model)
```
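To make the discrepancy concrete, here is a minimal sketch of the two detection paths on a toy model with tied embeddings (assuming torch and accelerate are installed; the Toy class and its layer names are made up, and data_ptr() stands in for the more robust id_tensor_storage used in the real code):

```python
import collections

import torch
from accelerate.utils import find_tied_parameters
from torch import nn


class Toy(nn.Module):
    # Toy model whose two embeddings share a single weight tensor.
    def __init__(self):
        super().__init__()
        self.encoder_embed = nn.Embedding(10, 4)
        self.decoder_embed = nn.Embedding(10, 4)
        self.decoder_embed.weight = self.encoder_embed.weight  # tie the weights


model = Toy()

# Path 1: group state_dict entries by storage pointer, as in the `if` branch
# above. This only works for tensors that have real storage.
ptrs = collections.defaultdict(list)
for name, tensor in model.state_dict().items():
    ptrs[tensor.data_ptr()].append(name)
print([names for names in ptrs.values() if len(names) > 1])
# -> [['encoder_embed.weight', 'decoder_embed.weight']]

# Path 2: accelerate's structural detection, as in the `else` branch. It does
# not rely on storage pointers, so it also works on the meta device, where
# there is no real storage to compare.
with torch.device("meta"):
    meta_model = Toy()
print(find_tied_parameters(meta_model))
```

If the structural pass misses a tie that the pointer pass would have found, the tied weight is treated as absent from the checkpoint and freshly initialized, which would match the warning above.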

@SunMarc, would you like to investigate what might be going on here?

@SunMarc (Member) commented Sep 26, 2023

Yes @LysandreJik, I will check what is going on.
