Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trainer._load_from_checkpoint - support loading multiple Peft adapters #30505

Merged

Conversation

claralp
Copy link
Contributor

@claralp claralp commented Apr 26, 2024

What does this PR do?

Since it it possible to have multiple Peft adapters in the same model, it should also be possible to resume a training of such models from checkpoint with transformers.Trainer.train(resume_from_checkpoint=True|"path").
No documentation changes because this is a fix of sth already existing, tested with DPO/KTO

Fixes #30478

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@lewtun @kashif @younesbelkada

@kashif
Copy link
Contributor

kashif commented Apr 26, 2024

nice @claralp would you mind also adding some tests?

@claralp
Copy link
Contributor Author

claralp commented Apr 26, 2024

@kashif will try. Guess I can use a similar model setup as here test_trainer.py#L933

Another question: How is the checkpoint loading with DeepSpeed supposed to work with this trainer.py#L1847?

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the support for loading multiple adapters with load_from_checkpoint!

Copy link
Member

@lewtun lewtun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean implementation @claralp !

Guess I can use a similar model setup as here test_trainer.py#L933

Yeah this makes sense and I think you should be able to adapt the logic used for full training here:

def test_can_resume_training(self):

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@claralp claralp force-pushed the trainer_checkpoint_fix_multiple_adapters branch from 15429ad to fe7044e Compare May 2, 2024 13:14
@claralp
Copy link
Contributor Author

claralp commented May 2, 2024

@kashif @lewtun please check if this works now for you.

Note: Discovered another bug while implementing this.
While the PeftAdapterMixin.set_adapter function supports multiple active adapters, its internal logic for setting multiple adapters active is broken.
But this needs to be fixed in the Peft library first

@claralp claralp force-pushed the trainer_checkpoint_fix_multiple_adapters branch from fe7044e to 1d6d481 Compare May 2, 2024 13:27
@kashif
Copy link
Contributor

kashif commented May 2, 2024

ok @claralp let's link the peft issue here too then?

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again ! seconded what @kashif said ! if you could have a small reproducer to repro the bug and file it on PEFT it would be really great 🙏

@claralp
Copy link
Contributor Author

claralp commented May 3, 2024

@kashif, @younesbelkada the issue cannot be reproduced anymore.

The background:
Since multiple adapters can be active at the same time I would have expected the same in src/peft/utils/other.py#L272.
I had an error previously when calling this with a list of adapters in src/transformers/integrations/peft.py#L311.
This was before commits/16bd0ef66f642ef4b69f4f21cfc368d548318672 when I accessed the active_adapters property instead of active_adapter.
But when I now run a test case with two active adapters, it works.
So I would treat this as solved

@younesbelkada younesbelkada requested a review from ArthurZucker May 3, 2024 09:25
@claralp
Copy link
Contributor Author

claralp commented May 6, 2024

@kashif @younesbelkada @lewtun is there any open point for me to fix now or is this ready?

@younesbelkada
Copy link
Contributor

On my end looks great, just waiting for a final review cc @ArthurZucker @LysandreJik

@younesbelkada younesbelkada requested a review from LysandreJik May 6, 2024 09:10
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! cc @muellerzr if you're fine with the changes in Trainer, feel free to merge.

Copy link
Contributor

@muellerzr muellerzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG2M, thanks for also adding a test!

@muellerzr muellerzr merged commit e076953 into huggingface:main May 6, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trainer with resume_from_checkpoint does not work with multiple Peft Adapters
7 participants