Fix llama sin_cached/cos_cached backward compatibility #29299
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -100,6 +100,21 @@ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

        # TODO: Remove in 4.40.
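For readers following along, here is a minimal sketch of the kind of deprecation shim this hunk is adding: the old attribute names stay reachable as properties that warn before returning the underlying cache, so the TODO marks when those properties would be dropped. The mixin name, warning text, and use of `warnings` (rather than transformers' own logger) are illustrative assumptions, not the PR's literal code.

```python
import warnings


class _RotaryCacheDeprecationMixin:
    """Hypothetical mixin; assumes `self._sin_cached` / `self._cos_cached` exist on the module."""

    @property
    def sin_cached(self):
        # Warn on access, then fall back to the private cache so old call sites keep working.
        warnings.warn(
            "`sin_cached` is deprecated and will be removed; use the tensors returned by `forward` instead.",
            FutureWarning,
        )
        return self._sin_cached

    @property
    def cos_cached(self):
        warnings.warn(
            "`cos_cached` is deprecated and will be removed; use the tensors returned by `forward` instead.",
            FutureWarning,
        )
        return self._cos_cached
```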
Why 4.40 here? This kind of version-dependent removal would be for deprecating a feature, but AFAICT the PR doesn't include an implemented fix which replaces this.
@amyeroberts I just followed 7d312ad: the sin_cached attribute will be removed in 4.40.
cc @gante
Huh, OK. Won't this mean things still break, though?
I don't think we can remove them, no 💔
@fxmarty The extent of the fix may depend on the following question: are the downstream libraries broken because a) the tensors are missing, or b) the tensors are missing AND their values are needed? The PR as it stands would fix a), but it probably wouldn't fix b). Full story of how this came to be:
Note to ourselves: non-permanent buffers can't be treated as common variables for deprecation purposes 😬
#29198 will add them at init time.
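For illustration, a hedged sketch of what "adding them at init time" could look like: pre-compute the cos/sin caches from `inv_freq` in `__init__` and register them as non-persistent buffers, so both the attributes and their values exist (case b) above). The class name, dtype handling, and defaults are assumptions, not the actual code from #29198.

```python
import torch
from torch import nn


class RotaryEmbeddingInitCacheSketch(nn.Module):
    """Hypothetical module mirroring the pre-4.38 behaviour of caching cos/sin at construction."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        # Compute the caches up front so the attributes exist *and* hold the expected values.
        t = torch.arange(max_position_embeddings, device=device, dtype=torch.float32)
        freqs = torch.outer(t, inv_freq)          # (max_position_embeddings, dim // 2)
        emb = torch.cat((freqs, freqs), dim=-1)   # (max_position_embeddings, dim)
        self.register_buffer("_cos_cached", emb.cos(), persistent=False)
        self.register_buffer("_sin_cached", emb.sin(), persistent=False)
```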
Let's not duplicate the work
Was not aware #29198 was a fix for that, nice! Note that with
Feel free to comment over there
The _sin_cached & _cos_cached are never set in the init (compare to https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/llama/modeling_llama.py#L134-L136), which yields errors in external packages as backward compatibility is broken (e.g. in https://github.com/AutoGPTQ/AutoGPTQ/blob/6b55300dd83326504ee6e02b730fa4451adfa479/auto_gptq/modeling/_utils.py#L95-L96). IMO this should be in a patch release.
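To make the breakage concrete, here is a hypothetical helper in the style of the linked AutoGPTQ utility: it reads the cached tensors directly off the rotary-embedding module, so when the attributes stop being set the access raises an AttributeError. The function name and the exact attribute names read downstream are illustrative assumptions.

```python
def read_rope_caches(rotary_emb):
    """Hypothetical downstream-style access: pull the cached cos/sin tensors off the module."""
    cos = getattr(rotary_emb, "cos_cached", None)
    sin = getattr(rotary_emb, "sin_cached", None)
    if cos is None or sin is None:
        # With the attributes gone, downstream code has no cached tensors to reuse.
        raise AttributeError(
            "This rotary embedding no longer exposes `cos_cached`/`sin_cached`; "
            "recompute them from `inv_freq` or take them from `forward` instead."
        )
    return cos, sin
```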