
[core] add prepare_model_for_training #85

Merged
2 commits merged into huggingface:main on Feb 15, 2023

Conversation

younesbelkada (Contributor):

What does this PR do?

This PR makes life easier for users. Before this PR, before training a model a user had to manually cast some parameters to fp32 for stability (for int8 models), call gradient_checkpointing_enable on the model, and use a hack to add requires_grad to the output of the embedding layer.

This PR introduces a new method, prepare_model_for_training, that wraps everything in a single place. It also adds some tests.
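For reference, a rough sketch of what such a helper bundles together for a transformers model (illustrative only; the exact name, signature, and defaults in the merged PR may differ):

import torch

def prepare_model_for_training(model):
    # Freeze the whole base model and upcast 1-D parameters (layer norms, biases)
    # to fp32, which is needed for stable training of int8 models.
    for param in model.parameters():
        param.requires_grad = False
        if param.ndim == 1:
            param.data = param.data.to(torch.float32)

    # Reduce activation memory during fine-tuning.
    model.gradient_checkpointing_enable()

    # Make the embedding output require grad so gradients can flow back to the
    # adapters even though the embedding weights themselves are frozen.
    def make_inputs_require_grad(module, inputs, output):
        output.requires_grad_(True)

    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
    return model

A user would call something like this on the loaded (int8) base model before wrapping it with PEFT.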

IMO it should be up to users to add:

import torch
from torch import nn

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        # run the lm_head in fp16 and upcast its output to fp32 for a stable loss
        return super().forward(x.to(torch.float16)).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

For two reasons:
1. This needs to be called before creating the PeftModel (if we wanted to add it on PeftModel itself we might need to hack the hooks of the lm_head).
2. The dtype of the input is very specific to some models: for example, for t5 we need to cast the input to float16, whereas this does not seem to be needed for opt models.

cc @pacman100

pacman100 (Contributor) left a comment:

Hello @younesbelkada, thank you for working on this 🤗. I think this shouldn't be part of the PeftModel class, as these changes are required before creating the PeftModel itself. How about a util function in others.py that is called in the get_peft_model function before the PeftModel object (or one of its subclasses) is created?

If that is done, we can also include the lm_head change there itself, and I think x.to(torch.float16) can be done for all models without any downside. Let me know your thoughts.
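A minimal sketch of the suggested flow (illustrative only, not the merged code; prepare_model_for_training refers to the util sketched above, and the real get_peft_model dispatches to task-specific subclasses, which is omitted here):

from peft import PeftModel

def get_peft_model(model, peft_config):
    # Prepare the base model (freeze, upcast norms, gradient checkpointing, ...)
    # before the PeftModel wrapper injects the trainable adapter weights.
    model = prepare_model_for_training(model)
    return PeftModel(model, peft_config)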


pacman100 (Contributor) commented on this diff:

for param in self.base_model.parameters():
    # freeze base model's layers
    param.requires_grad = False

This can't be called after creating the PeftModel: at that point the adapter weights have already been added and are trainable, but this loop would freeze them, since the LoRA weights are injected into the base model in place.
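A minimal sketch of the ordering issue from the user's side (the checkpoint name and config values are placeholders; in practice the freezing happens inside PeftModel itself, before the adapters are added):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Freeze the base model first ...
for param in base_model.parameters():
    param.requires_grad = False

# ... then inject the LoRA adapters: only the adapter weights stay trainable.
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Running the freezing loop on peft_model at this point would also freeze the
# LoRA weights, because they were injected in place into the same module tree.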

younesbelkada (Contributor, Author):

This makes a lot of sense! Thanks for the review.
I applied the suggestions (and also changed the name of the test file). Let me know if this is better!

pacman100 (Contributor) left a comment:

Thank you for iterating, this looks great! Thank you 😄. LGTM!

pacman100 merged commit ed5a7bf into huggingface:main on Feb 15, 2023.