[PEFT] make the trainer support resume checkpoint from a named adapter #28531 (#28547)
Conversation
Makes sense, thank you! Can you also run the styling checks? `make fixup`
Hi @younesbelkada, I have made the styling checks. If there is still something missing, please inform me :D
Thanks a lot for fixing!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
My pleasure :D
Yes @chenbin11200 - it will get reviewed as soon as possible and we'll merge the fix in main!
Thanks for adding this!
Could you add a test to make sure we can resume from a checkpoint?
[PEFT] make the trainer support resume checkpoint from a named adapter #28531
@chenbin11200 the test would go here ideally: https://github.com/huggingface/transformers/blob/main/tests/peft_integration/test_peft_integration.py. Let me know if you need help designing the test!
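If it helps, a hedged sketch of what such a resume test might look like; the helper, adapter name, step counts, and paths below are assumptions for illustration, not the actual test added in this PR:

```python
# Sketch only: resume training from a checkpoint written for a named adapter.
# `get_regression_adapter_trainer` is the helper discussed later in this thread;
# its signature and the values used here are assumptions.
import os
import tempfile


def test_resume_from_checkpoint_with_named_adapter():
    adapter_name = "my_lora_model_name"
    with tempfile.TemporaryDirectory() as tmp_dir:
        trainer = get_regression_adapter_trainer(
            output_dir=tmp_dir,
            adapter_name=adapter_name,
            save_steps=5,
            max_steps=10,
        )
        trainer.train()

        checkpoint = os.path.join(tmp_dir, "checkpoint-5")
        # With a named adapter, peft writes the adapter files into a subfolder.
        assert os.path.isfile(os.path.join(checkpoint, adapter_name, "adapter_config.json"))

        # Resuming should find the adapter in that subfolder instead of raising a ValueError.
        trainer.train(resume_from_checkpoint=checkpoint)
```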
Hi @younesbelkada @amyeroberts,
Hi @chenbin11200
Thank you @younesbelkada, I will try that.
Hi @younesbelkada,
No problem at all!
@younesbelkada I see... After merging the main branch, the broken issues are gone. Cool, I didn't realize the checks are run against the branch combined with origin main.
Thanks for iterating and adding tests!
Just a few small comments
compute_metrics = kwargs.pop("compute_metrics", None)
optimizers = kwargs.pop("optimizers", (None, None))
output_dir = kwargs.pop("output_dir", "./regression")
preprocess_logits_for_metrics = kwargs.pop("preprocess_logits_for_metrics", None)
It's best to keep logic for tests as simple and explicit as possible. Here, rather than popping from kwargs, these should be kwargs with default values in get_regression_adapter_trainer
Basically you are right. I imitated the coding style of other test cases in test_trainer.py, which mix some args into kwargs and pop them later. What do you think? Should I keep the coding style consistent with those, or separate them into method args? That's not a big effort.
I'd rather they were explicitly set, or, tbh because this method is only used in one place, just hardcoded
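A minimal sketch of that suggestion, assuming the helper discussed above is named `get_regression_adapter_trainer`; the signature and body below are illustrative, not the actual change:

```python
# Sketch: pass explicit keyword arguments with defaults instead of popping them
# from **kwargs inside the helper (exact signature is an assumption).
def get_regression_adapter_trainer(
    output_dir="./regression",
    compute_metrics=None,
    optimizers=(None, None),
    preprocess_logits_for_metrics=None,
    **kwargs,
):
    ...  # build and return the Trainer from these explicit values
```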
@amyeroberts Hi, I have resolved most of the comments and left a single point to discuss with you. :)
Hi @chenbin11200
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @younesbelkada, sorry for the delay, these were crazy days... This issue seems to have been fixed in #30505, so this PR is safe to close. Thank you for your support and enthusiasm, looking forward to our next cooperation :D
Thank you very much @chenbin11200!
What does this PR do?
Fixes #28531
In peft>=0.5.0, one can initialize the PeftModel with an adapter name, for example:
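(A minimal sketch of such an initialization; the exact snippet from the original description is not reproduced here, and the base model id and adapter path are placeholders.)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder identifiers; any base model and LoRA adapter location would do.
base_model = AutoModelForCausalLM.from_pretrained("my-base-model")
model = PeftModel.from_pretrained(
    base_model,
    "path/to/my_lora_adapter",
    adapter_name="my_lora_model_name",  # the named adapter
)
```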
In this case, the `adapter_config.json` and `adapter_model.bin` files will be saved in `/my_output_dir/checkpoint-300/my_lora_model_name` instead of directly in `/my_output_dir/checkpoint-300`. That raises a ValueError when trying to resume training from the checkpoint. This PR fixes the issue by joining the adapter name onto the checkpoint path and loading the adapter from the right subfolder (only if necessary: if one doesn't provide an adapter_name, the weight and config files are not saved into a subfolder, and this case is also handled).
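A hedged sketch of the lookup described above (not the actual diff in the Trainer; the helper name is made up for illustration):

```python
import os
from typing import Optional


def _find_adapter_folder(checkpoint_dir: str, active_adapter: Optional[str]) -> str:
    """Return the folder that actually holds adapter_config.json for this checkpoint."""
    if active_adapter:
        candidate = os.path.join(checkpoint_dir, active_adapter)
        if os.path.isfile(os.path.join(candidate, "adapter_config.json")):
            return candidate
    # No adapter_name was given (or no subfolder exists): files sit in the checkpoint root.
    return checkpoint_dir


# e.g. _find_adapter_folder("/my_output_dir/checkpoint-300", "my_lora_model_name")
# -> "/my_output_dir/checkpoint-300/my_lora_model_name"
```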
Before submitting
The bug is reported in https://github.com/huggingface/transformers/issues/28531 but not yet discussed there.
No unit test, only tested locally for this small change.
Who can review?
@muellerzr and @pacman100