
[core] add prepare_model_for_training #85

Merged
2 commits merged into huggingface:main on Feb 15, 2023

Conversation

younesbelkada (Contributor):

What does this PR do?

This PR makes life easier for users. Before this PR, before training a model a user had to manually cast some parameters to fp32 for stability (for int8 models), call gradient_checkpointing_enable on the model, and use a hack to add requires_grad to the output of the embedding layer.

This PR introduces a new method, prepare_model_for_training, that wraps everything in a single place. It also adds some tests.
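For reference, a rough sketch of what such a helper bundles together for a transformers model (illustrative only; the exact name, signature, and defaults in the merged PR may differ):

import torch

def prepare_model_for_training(model):
    # Freeze the whole base model and upcast 1-D parameters (layer norms, biases)
    # to fp32, which is needed for stable training of int8 models.
    for param in model.parameters():
        param.requires_grad = False
        if param.ndim == 1:
            param.data = param.data.to(torch.float32)

    # Reduce activation memory during fine-tuning.
    model.gradient_checkpointing_enable()

    # Make the embedding output require grad so gradients can flow back to the
    # adapters even though the embedding weights themselves are frozen.
    def make_inputs_require_grad(module, inputs, output):
        output.requires_grad_(True)

    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
    return model

A user would call something like this on the loaded (int8) base model before wrapping it with PEFT.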

IMO it should be up to users to add:

import torch
from torch import nn

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        # run the lm_head in fp16 and upcast its output to fp32 for a stable loss
        return super().forward(x.to(torch.float16)).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

For two reasons:
1. This needs to be called before creating the PeftModel (if we wanted to add it on PeftModel itself we might need to hack the hooks of the lm_head).
2. The dtype of the input is very specific to some models: for example, for t5 we need to cast the input to float16, whereas this does not seem to be needed for opt models.

cc @pacman100

pacman100 (Contributor) left a comment:

Hello @younesbelkada, thank you for working on this 🤗. I think this shouldn't be part of the PeftModel class, as these changes are required before creating the PeftModel itself. How about a util function in others.py that is called in the get_peft_model function before the PeftModel object (or one of its subclasses) is created?

If that is done, we can also include the lm_head change there itself, and I think x.to(torch.float16) can be done for all models without any downside. Let me know your thoughts.
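A minimal sketch of the suggested flow (illustrative only, not the merged code; prepare_model_for_training refers to the util sketched above, and the real get_peft_model dispatches to task-specific subclasses, which is omitted here):

from peft import PeftModel

def get_peft_model(model, peft_config):
    # Prepare the base model (freeze, upcast norms, gradient checkpointing, ...)
    # before the PeftModel wrapper injects the trainable adapter weights.
    model = prepare_model_for_training(model)
    return PeftModel(model, peft_config)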


pacman100 (Contributor) commented on this diff:

for param in self.base_model.parameters():
    # freeze base model's layers
    param.requires_grad = False

This can't be called after creating the PeftModel: at that point the adapter weights have already been added and are trainable, but this loop would freeze them, since the LoRA weights are injected into the base model in place.
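A minimal sketch of the ordering issue from the user's side (the checkpoint name and config values are placeholders; in practice the freezing happens inside PeftModel itself, before the adapters are added):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Freeze the base model first ...
for param in base_model.parameters():
    param.requires_grad = False

# ... then inject the LoRA adapters: only the adapter weights stay trainable.
peft_model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))

# Running the freezing loop on peft_model at this point would also freeze the
# LoRA weights, because they were injected in place into the same module tree.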

younesbelkada (Contributor, Author):

This makes a lot of sense! Thanks for the review.
I applied the suggestions (and also changed the name of the test file). Let me know if this is better!

pacman100 (Contributor) left a comment:

Thank you for iterating, this looks great! Thank you 😄. LGTM!

pacman100 merged commit ed5a7bf into huggingface:main on Feb 15, 2023.