[core] add prepare_model_for_training #85 (Merged)
What does this PR do?

This PR makes life easier for users. Before this PR, before training a model a user needed to manually cast some parameters to fp32 for stability (for int8 models), call `gradient_checkpointing_enable` on the model, and apply a hack to add `requires_grad` to the output of the embedding layer. This PR introduces a new method, `prepare_model_for_training`, that wraps everything in a single place. Some tests were also added.

IMO it should be up to users to call this method manually.
For 2 reasons:
1. This needs to be called before creating the `PeftModel` (if we wanted to add it on `PeftModel` itself, we might need to hack the hooks of the `lm_head`).
2. The `dtype` of the input is very specific to some models; for example, for T5 we need to cast the input to `float16`, whereas this seems not to be needed for OPT models.

cc @pacman100
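For illustration, the steps this PR bundles (freezing/casting parameters, enabling gradient checkpointing, and the embedding-output `requires_grad` hack) can be sketched roughly as below. This is a simplified, hypothetical version, not the actual PEFT implementation; it assumes a transformers-style model that may expose `gradient_checkpointing_enable` and `get_input_embeddings`.

```python
import torch
import torch.nn as nn

def prepare_model_for_training_sketch(model: nn.Module) -> nn.Module:
    """Hypothetical sketch of what prepare_model_for_training bundles together.

    Not the actual PEFT code; a simplified illustration of the three steps
    described in the PR.
    """
    for param in model.parameters():
        # freeze the base model weights; adapter weights are added later
        param.requires_grad = False
        # cast remaining fp16 parameters (e.g. layer norms) to fp32 for
        # numerical stability when training on top of an int8 base model
        if param.dtype == torch.float16:
            param.data = param.data.to(torch.float32)

    # enable gradient checkpointing if the model supports it
    # (transformers models expose this method)
    if hasattr(model, "gradient_checkpointing_enable"):
        model.gradient_checkpointing_enable()

    # the "hack": force the embedding layer's output to require grad so
    # gradients can flow back through the frozen base model to the adapters
    if hasattr(model, "get_input_embeddings"):
        embeddings = model.get_input_embeddings()

        def make_inputs_require_grad(module, inputs, output):
            output.requires_grad_(True)

        embeddings.register_forward_hook(make_inputs_require_grad)

    return model
```

Per reason 1 above, the implied usage order would be to call this on the loaded base model first and only then wrap it into a `PeftModel` (e.g. via `get_peft_model`).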