-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Marian model conversion #30173
Fix Marian model conversion #30173
Conversation
src/transformers/modeling_utils.py
Outdated
@@ -3840,7 +3840,6 @@ def _fix_key(key): | |||
model_buffers = {".".join([prefix, key]) for key in model_buffers} | |||
unexpected_keys = sorted(unexpected_keys - model_buffers) | |||
|
|||
model.tie_weights() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob we can remove this, given a few lines above the weights before loading state dict are tied already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was introduced in #25107, however, indeed there are no changeds to the model in the code above.
Can you make sure the bug fixed in the PR are still passing?
@@ -34,7 +34,6 @@ | |||
|
|||
DEFAULT_REPO = "Tatoeba-Challenge" | |||
DEFAULT_MODEL_DIR = os.path.join(DEFAULT_REPO, "models") | |||
LANG_CODE_URL = "https://datahub.io/core/language-codes/r/language-codes-3b2.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one gives 404, so I just loaded dataset to the hub
@@ -622,6 +622,12 @@ def load_marian_model(self) -> MarianMTModel: | |||
bias_tensor = nn.Parameter(torch.FloatTensor(self.final_bias)) | |||
model.model.decoder.embed_tokens.weight = decoder_wemb_tensor | |||
|
|||
# handle tied embeddings, otherwise "from_pretrained" loads them incorrectly | |||
if self.cfg["tied-embeddings"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is the actual fix, which gives equal weights for tied parameters
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We were recently pinged about that.
Great digging!
src/transformers/modeling_utils.py
Outdated
@@ -3840,7 +3840,6 @@ def _fix_key(key): | |||
model_buffers = {".".join([prefix, key]) for key in model_buffers} | |||
unexpected_keys = sorted(unexpected_keys - model_buffers) | |||
|
|||
model.tie_weights() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was introduced in #25107, however, indeed there are no changeds to the model in the code above.
Can you make sure the bug fixed in the PR are still passing?
wemb_tensor = nn.Parameter(torch.FloatTensor(self.wemb)) | ||
bias_tensor = nn.Parameter(torch.FloatTensor(self.final_bias)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where these are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah, makes sense. They are not used anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird I cannot reply to the above comment. Anyway, I tested FSDP with fsdp_cpu_ram_efficient_loading: true
, and looks like tying weights do not have big effect on CPU memory. But to be clear, summoning @pacman100 to confirm that as he was the PR contributor
@ArthurZucker i reverted the change for tying weights, to be consistent with 'main'. It was not the actual solution for Marian models, so I think it does not hurt whatever was the reason for it to be added. Requesting re-review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super glad with these changes ! Thanks for fixing!
What does this PR do?
Fixes #26216. After a bit of code exploration I found that problem was in the "tie_weights" method. The issue is that "tie_weights" does not clone, but simply change pointer output_embeddings -> input_embeddings. And since we tie weights at least twice during loading weights (once before loading state dict and the other is after), the weights were being loaded incorrectly.
Why the issue is only in MarianMT and not other models that tie weights? -> Because in other models either the weights already hold the same/tied values , or the order of loading parameters is different from MarianMT. When loading state dict into the MarianMT, "output_embeddings" are loaded last and therefore "input_embeddings" weights data is overriden to will hold same data as "output_embeddings". That way we lose access to actual "input_embeddings" weights data
This is somehow related to an old PR I found.
I converted the weights with
python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models fin-eng --save_dir converted
and checked the correctness of results with the below script. Everything works good.Marian tests including slow are all passsing on my end.