Trainer._load_from_checkpoint - support loading multiple Peft adapters #30505
Conversation
Nice @claralp, would you mind also adding some tests?
@kashif will try. I guess I can use a similar model setup as in test_trainer.py#L933. Another question: how is checkpoint loading with DeepSpeed supposed to work with this (trainer.py#L1847)?
Thanks for adding support for loading multiple adapters with load_from_checkpoint!
Very clean implementation @claralp!
> Guess I can use a similar model setup as here test_trainer.py#L933
Yeah, this makes sense, and I think you should be able to adapt the logic used for full training in `def test_can_resume_training(self)` (transformers/tests/trainer/test_trainer.py, line 1787 at 1e05671).
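For reference, a rough sketch of what such a multi-adapter resume test could look like (this is not the test added in the PR; the tiny model id, LoRA settings, and dummy dataset below are placeholder assumptions):

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

MODEL_ID = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # placeholder tiny model
LORA = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

def build_model():
    # A PEFT model holding two named LoRA adapters, mirroring the multi-adapter setup
    base = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model = get_peft_model(base, LORA, adapter_name="adapter_1")
    model.add_adapter("adapter_2", LORA)
    return model

# Tiny dummy dataset just to make the Trainer step; real tests use proper fixtures
data = Dataset.from_dict({"input_ids": [[1, 2, 3, 4]] * 16, "labels": [[1, 2, 3, 4]] * 16})

def make_args(max_steps, output_dir="/tmp/multi_adapter_ckpt"):
    return TrainingArguments(
        output_dir=output_dir,
        max_steps=max_steps,
        save_steps=2,
        per_device_train_batch_size=2,
        report_to=[],
    )

# Train a couple of steps so a checkpoint containing both adapters gets written
Trainer(model=build_model(), args=make_args(max_steps=2), train_dataset=data).train()

# Resume in a fresh model: with this PR, _load_from_checkpoint should restore
# adapter_1 and adapter_2 instead of only the active adapter
resumed = Trainer(model=build_model(), args=make_args(max_steps=4), train_dataset=data)
resumed.train(resume_from_checkpoint=True)
```

The adapter weights before and after resuming could then be compared to assert that both adapters were actually restored.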
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
(force-pushed from 15429ad to fe7044e)
@kashif @lewtun please check if this works now for you. Note: discovered another bug while implementing this.
(force-pushed from fe7044e to 1d6d481)
OK @claralp, let's link the PEFT issue here too then?
Thanks again! Seconded what @kashif said! If you could write a small reproducer for the bug and file it on PEFT, that would be really great 🙏
@kashif, @younesbelkada the issue cannot be reproduced anymore. The background:
@kashif @younesbelkada @lewtun is there any open point for me to fix now, or is this ready?
On my end this looks great, just waiting for a final review cc @ArthurZucker @LysandreJik
This looks good to me! cc @muellerzr if you're fine with the changes in Trainer, feel free to merge.
LG2M, thanks for also adding a test!
What does this PR do?
Since it is possible to have multiple Peft adapters in the same model, it should also be possible to resume training of such models from a checkpoint with transformers.Trainer.train(resume_from_checkpoint=True|"path").
No documentation changes because this fixes something that already exists; tested with DPO/KTO.
Fixes #30478
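To illustrate the idea behind the change (this is a sketch, not the exact Trainer code): a checkpoint written from a multi-adapter PEFT model contains one subfolder per named adapter, and each of them needs to be reloaded on resume rather than only the active one. Roughly:

```python
import os
from peft import PeftModel

def load_all_adapters(model: PeftModel, checkpoint_dir: str, active_adapter: str = "default") -> None:
    """Reload every adapter found in a checkpoint directory into `model`.

    Sketch only: assumes non-default adapters were saved by PeftModel.save_pretrained
    into subfolders named after the adapter, each containing an adapter_config.json.
    """
    for entry in os.listdir(checkpoint_dir):
        subdir = os.path.join(checkpoint_dir, entry)
        if os.path.isdir(subdir) and os.path.isfile(os.path.join(subdir, "adapter_config.json")):
            # load_adapter registers the weights under the subfolder's name
            model.load_adapter(subdir, adapter_name=entry)
    # restore the adapter that was active before the interruption
    model.set_adapter(active_adapter)
```

The actual change lives in Trainer._load_from_checkpoint; the helper above only sketches the multi-adapter loading loop.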
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@lewtun @kashif @younesbelkada