Correct loading of models with shared tensors when using accelerator.load_state() #2875
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! Just one question
```diff
-        models[i].load_state_dict(state_dict, **load_model_func_kwargs)
+            model.load_state_dict(state_dict, **load_model_func_kwargs)
     logger.info("All model weights loaded successfully")
```
Any particular reason for this change? I'd expect only the previous line to be modified.
In the if statement, he's loading the safetensors model directly, whereas before we were only getting the state dict.
load_model does both: it loads the file and uses it to populate the state dict. Previously, each branch of the if-condition only loaded the file, and after the if-condition the model would load the state dict. Since load_model does both, I indented the statement on line 204 to become part of the else-clause. This becomes clearer when you look at the complete surroundings of the changes instead of only the affected lines.
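For context, here is a sketch of the restructured logic (the exact surrounding code in accelerate's checkpointing module may differ, and `input_model_file`/`map_location` are assumed names):

```python
import torch
from safetensors.torch import load_model

if input_model_file.endswith(".safetensors"):
    # load_model reads the file and populates the model in one step,
    # re-tying any shared tensors that safetensors omitted on save
    load_model(model, input_model_file, **load_model_func_kwargs)
else:
    # the non-safetensors branch still loads a raw state dict first
    state_dict = torch.load(input_model_file, map_location=map_location)
    model.load_state_dict(state_dict, **load_model_func_kwargs)
```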
The other change in this line (aside from the indent), namely using `model` instead of `models[i]`, is mostly cosmetic. My linter was complaining that the `enumerate` call defines `model` but it's never used.
Thanks for the change and for spotting the issue! Could you add a test with a model with tied weights? You can use the following test for reference: `test_save_load_model`.
Yes, I'll look into it.
You can verify that the shared weights are set up correctly by checking the output; safetensors warns you about that.
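A rough sketch of what such a test could look like (names are hypothetical; the actual test added to tests/test_accelerator.py may differ):

```python
import tempfile

import torch
from accelerate import Accelerator

class ModelWithTiedWeights(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(4, 4)
        self.linear2 = torch.nn.Linear(4, 4)
        self.linear2.weight = self.linear1.weight  # tie the weights

def test_save_load_model_with_tied_weights():
    accelerator = Accelerator()
    model = accelerator.prepare(ModelWithTiedWeights())
    with tempfile.TemporaryDirectory() as tmpdir:
        accelerator.save_state(tmpdir)
        # without the fix, this raised a missing-key error for the
        # shared tensor that safetensors deduplicated on save
        accelerator.load_state(tmpdir)
        assert model.linear1.weight is model.linear2.weight  # still tied
```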
Thanks for iterating and adding the tests @jkuntzer! Could you do a final check and see if the test that you added fails when you remove the changes you made?
tests/test_accelerator.py (Outdated)
```python
# need to add this for compliance with other methods
self.weight = self.linear1.weight
self.bias = self.linear1.bias
```
Do we really need that? Where does it fail?
It used to fail at an earlier stage, but you're right: this part can now be safely removed.
I was only expecting `linear2.weight` and `linear2.bias` to be missing. Maybe this is due to `self.weight = self.linear1.weight` and `self.bias = self.linear1.bias`.
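For illustration, a small sketch (not the PR's actual code) of why more keys than expected can go missing: each alias registers another parameter backed by the same storage, and safetensors keeps only one key per shared group when saving.

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(2, 2)
        # aliases register extra parameters sharing the layer's storage
        self.weight = self.linear1.weight
        self.bias = self.linear1.bias

# "weight"/"linear1.weight" and "bias"/"linear1.bias" point to the same
# storage, so safetensors deduplicates each pair down to a single key
print({k: v.data_ptr() for k, v in M().state_dict().items()})
```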
Nice! Could you just fix the quality issue?
Thanks!
What does this PR do?
I kept running into problems with PyTorch's load_state_dict complaining about missing keys. These keys belonged to shared tensors, which the safetensors library intentionally omits when saving. To load such a model correctly, one has to use safetensors' load_model function instead of the default load_state_dict function (described here). This was previously not done when using the load_state function of the Accelerator.
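A minimal reproduction of the failure mode (a sketch; the module and file names are illustrative):

```python
import torch
from safetensors.torch import load_file, load_model, save_model

class TiedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(4, 4)
        self.linear2 = torch.nn.Linear(4, 4)
        self.linear2.weight = self.linear1.weight  # shared tensor

save_model(TiedModel(), "model.safetensors")  # shared tensors deduplicated on save

# Plain load_state_dict on the saved file fails, because one of the
# tied keys was omitted from the file:
try:
    TiedModel().load_state_dict(load_file("model.safetensors"))
except RuntimeError as e:
    print(e)  # Missing key(s) in state_dict

# load_model handles the omitted shared tensors correctly:
load_model(TiedModel(), "model.safetensors")
```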
Fixes # (issue)
I think #2155 might be relevant, as it also reports problems when loading with accelerator.load_state.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.