[gpt] Gpt2 fix half precision causal mask #23256
Conversation
[gpt] Gpt2 fix 8bit inference
[gpt] Gpt2 fix half precision causal mask
The documentation is not available anymore as the PR was closed or merged.
Thanks for fixing! I just have a question about `_keys_to_ignore_on_load_missing` for the decision transformer.
@@ -746,7 +747,8 @@ class DecisionTransformerPreTrainedModel(PreTrainedModel):
     base_model_prefix = "decision_transformer"
     main_input_name = "states"
     supports_gradient_checkpointing = False
-    _keys_to_ignore_on_load_missing = [r"position_ids"]
+    _keys_to_ignore_on_load_missing = [r"position_ids", r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias"]
Should r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias"
be in _keys_to_ignore_on_load_missing
?
Ah, you are right, it should probably be in `_keys_to_ignore_on_load_unexpected` only; will modify that!
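For reference, a minimal sketch of the split being discussed (the class name below is hypothetical, the patterns are the ones from the diff above): unexpected keys are checkpoint entries the model no longer registers, while missing keys are entries the model expects but the checkpoint lacks.

```python
# Hypothetical illustration only, not the actual DecisionTransformer class.
# `_keys_to_ignore_on_load_unexpected` silences warnings about checkpoint keys
# the model no longer creates (the old attn.masked_bias / attn.bias buffers),
# while `_keys_to_ignore_on_load_missing` is for keys the model expects but the
# checkpoint does not contain.
from transformers import PreTrainedModel


class MyPreTrainedModel(PreTrainedModel):  # hypothetical subclass
    base_model_prefix = "decision_transformer"
    _keys_to_ignore_on_load_missing = [r"position_ids"]
    _keys_to_ignore_on_load_unexpected = [
        r"h\.\d+\.attn\.masked_bias",
        r"h\.\d+\.attn\.bias",
    ]
```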
LGTM! Thanks for fixing :)
* fix gpt2 inference
* fixup
* no need to be in `_keys_to_ignore_on_load_missing`
What does this PR do?
Applies a similar fix to the one in #23136, but for GPT2.
To reproduce:
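(The original reproduction snippet is not preserved on this page; below is a minimal sketch assuming the failure shows up when loading GPT2 in half precision, e.g. with `torch_dtype=torch.float16` or `load_in_8bit=True`, and calling `generate`.)

```python
# Minimal reproduction sketch, assuming the bug is triggered by running GPT2
# with a half-precision causal mask; the exact script from the PR is not
# reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,  # or load_in_8bit=True with bitsandbytes installed
    device_map="auto",
)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```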
The explanation is the same as in the tagged PR.
Some users reported that they were also able to reproduce this on the PyTorch main branch without `load_in_8bit`. I didn't manage to reproduce it that way; I will take a deeper look.
cc @amyeroberts