Fix `_init_weights` for `ResNetPreTrainedModel` (#31851)
Conversation
Thanks for digging into this and fixing!
I think there's a wider issue beyond mismatched shapes, due to the fast_init logic. I started looking into it here: #30451, but haven't had time to finish, as there are quite a few models (notably audio models) which will not have all of their parameters properly initialized.
tests/test_modeling_common.py (Outdated)

    @@ -3179,9 +3179,9 @@ def test_mismatched_shapes_have_properly_initialized_weights(self):
                        continue

                    # TODO: ydshieh
    -               if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification"]:
    +               if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification", "CLIPForImageClassification", "RegNetForImageClassification"]:
Hmmm - skipping a test because it's failing isn't the best reason... As this is already done, I'm OK with it, provided there's an issue to track it so this gets resolved.
@amyeroberts Here there are 2 kinds of failures:

- One is about parameters that are not initialized at all and end up with very large or even `nan` values: this is important and should be fixed.
- The other is about cases where we use `config.initializer_factor` to control the initialization, but that is really more for testing purposes (historically):

      initializer_factor (`float`, *optional*, defaults to 1.0):
          A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
          testing).

  This kind is less important.

The model classes I added here all seem to belong to the 2nd case.
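To make the 2nd case concrete, here is a minimal sketch (not the actual Transformers code) of the kind of `_init_weights` the docstring above describes, where a config-level `initializer_factor` scales the std of the normal init. The function name and the 0.02 base std are illustrative assumptions:

```python
import torch.nn as nn

# Hypothetical sketch: an init routine whose std is scaled by a
# config-level initializer_factor (the testing knob quoted above).
def init_linear(linear: nn.Linear, initializer_factor: float = 1.0) -> None:
    # With factor=1.0 this is an ordinary normal init; with a tiny factor
    # the weights collapse toward zero, which is why a test can detect
    # whether the factor was actually applied.
    linear.weight.data.normal_(mean=0.0, std=initializer_factor * 0.02)
    if linear.bias is not None:
        linear.bias.data.zero_()

layer = nn.Linear(16, 16)
init_linear(layer, initializer_factor=1.0)
print(layer.weight.std().item())  # roughly 0.02
```

A parameter initialized this way has a small mean by construction, so a mean-based check like the one in the test below stays well inside loose bounds.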
Regarding #30451: it might contain other issues/cases that I haven't observed here, though.
#30451 would fall under the first kind of error: for some models and some parameters we get unpredictable values (e.g. nans) because the empty arrays are never properly initialized, as they're skipped during initialization. It's due to how our initialization logic works, where a layer or parameter can be marked as initialized even if it's only partially done, for example only the bias being initialized. The main issue is that this isn't properly caught in our tests at the moment.
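The partial-initialization failure mode is easy to reproduce in isolation. This toy example (not Transformers code) leaves a `Linear` layer's weight as raw `torch.empty` memory while zeroing only the bias, mimicking a layer that gets marked as initialized when only its bias was touched:

```python
import torch
import torch.nn as nn

# Illustrative sketch of a "partially initialized" layer: the bias is
# explicitly zeroed, but the weight is replaced with uninitialized memory.
linear = nn.Linear(8, 8)
linear.weight = nn.Parameter(torch.empty(8, 8))  # never initialized
linear.bias.data.zero_()                         # only the bias is initialized

print(linear.bias.mean().item())    # 0.0 - the bias looks fine
print(linear.weight.mean().item())  # unpredictable: may be huge or nan
```

Since `torch.empty` returns whatever bytes happen to be in the allocation, the weight mean is unpredictable, which is exactly why a bounds check on parameter means can catch this class of bug.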
"Wav2Vec2ForSequenceClassification", | ||
"CLIPForImageClassification", | ||
"RegNetForImageClassification", | ||
"ResNetForImageClassification", |
@amyeroberts I realized we can't just skip by looking at the model class name (otherwise we wouldn't detect ResNetForImageClassification before the fix in this PR).
    special_param_names = [
        r"wav2vec2\.masked_spec_embed",
        r"wav2vec2\.feature_extractor\.conv_layers\..+\.conv\.weight",
        r"wav2vec2\.feature_projection\.projection\.weight",
        r"wav2vec2\.feature_projection\.projection\.bias",
        r"wav2vec2\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.",
        r"classifier\.weight",
        r"regnet\.embedder\.embedder\.convolution\.weight",
        r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
        r"regnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
        r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
        r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
        r"classifier\..+\.weight",
        r"classifier\..+\.bias",
        r"resnet\.embedder\.embedder\.convolution\.weight",
        r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
        r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
        r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
        r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
        r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
    ]
So let's not skip, but allow the mean values to be in [-1.0, 1.0] for some parameters of certain model classes.
    self.assertGreaterEqual(
        param_mean,
        -1.0,
        msg=f"Parameter {name} of model {model_class} seems not properly initialized",
    )
    self.assertLessEqual(
        param_mean,
        1.0,
        msg=f"Parameter {name} of model {model_class} seems not properly initialized",
    )
This is the new block.
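The two snippets above combine into a simple pattern: match each parameter name against the regex allow-list, and relax the bound to [-1.0, 1.0] only for matching parameters. Here is a self-contained sketch of that logic; the `allowed_bound` helper and the 0.05 default bound are hypothetical, and only a few regexes from the list above are reproduced:

```python
import re

# Subset of the special_param_names allow-list shown above.
special_param_names = [
    r"resnet\.embedder\.embedder\.convolution\.weight",
    r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
    r"classifier\..+\.weight",
]

def allowed_bound(name: str) -> float:
    # Special parameters only need |mean| <= 1.0; everything else keeps
    # a (hypothetical) tight bound. re.fullmatch forces the whole name
    # to match, so "classifier.weight" would not match r"classifier\..+\.weight".
    if any(re.fullmatch(pattern, name) for pattern in special_param_names):
        return 1.0
    return 0.05

print(allowed_bound("resnet.embedder.embedder.convolution.weight"))  # 1.0
print(allowed_bound("resnet.pooler.weight"))                         # 0.05
```

Matching by parameter name rather than by model class is what lets the test still catch a genuinely uninitialized parameter (whose mean can be arbitrarily large) in an otherwise allow-listed model.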
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)"

  This reverts commit b45dd5d.

* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check
* fix
* [test_all] check

---------

Co-authored-by: ydshieh <[email protected]>
Thanks @ydshieh and @amyeroberts! Good job finding other impacted models and fixing!
What does this PR do?
Fix #31841. The cause is clearly explained by @williford.
I still have to check why this is not captured by CI.
OK, it's because the test added in # (`test_mismatched_shapes_have_properly_initialized_weights`) only checks `AutoModelForSequenceClassification` but not `AutoModelForImageClassification`. I will have to update it.
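As I read the thread, the underlying fix is to make `ResNetPreTrainedModel._init_weights` cover the `nn.Linear` classifier head, not just the conv and norm layers. The following is a sketch of that shape only, assuming kaiming init for convs and a 0.02-std normal init for the head; the actual Transformers implementation may differ in details:

```python
import torch.nn as nn

# Sketch of an _init_weights covering both the conv backbone and a Linear
# classifier head. Without the nn.Linear branch, the classifier head keeps
# whatever uninitialized memory it was allocated with (the bug in #31841).
def init_weights(module: nn.Module) -> None:
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)
    elif isinstance(module, nn.Linear):
        # The branch the classifier head needs.
        nn.init.normal_(module.weight, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

head = nn.Linear(32, 10)
init_weights(head)
```

In the real model this function would be applied recursively via `model.apply(init_weights)`, so every submodule, including the head, passes through one of the branches.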