Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948
Comments
Mmm, changing the transpose would change the past key values, which is not backward compatible. The unsqueeze dimension could be a nice-to-have, but I would rather have it in the init of the rope embedding than in the forward, as that is another change that is not backward compatible 😓 cc @gante for RoPE 🤗
@ShashankMosaicML I do not oppose adding an unsqueeze dimension parameter. As for backward compatibility: this is a good question. As of now, as @ArthurZucker said, we don't want to break it. However, we are currently working on a new cache structure (with a flag for backwards compatibility), so making the new cache structure optimal for FA2 might be a good idea 🤔
If we can remove the two transposes (it's not a bottleneck, but still), that would be nice. Let's keep that in mind when refactoring the cache. cc @tomaarsen as well.
@ShashankMosaicML feel free to open a PR and tag us 🤗
Great, will do this soon!
Here is the link to the pull request.
Feature request
Hi Huggingface Team,
We would like to request that the unsqueeze(1) calls in these two lines be parameterized. To be precise, we would like the two hard-coded unsqueeze(1) lines in apply_rotary_pos_emb to be converted to something that takes a configurable unsqueeze dimension, as sketched below.
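The code snippets originally embedded in the issue were not preserved in this copy. The following is a minimal sketch of the requested change, assuming the two lines in question are the cos/sin unsqueeze calls inside apply_rotary_pos_emb and using unsqueeze_dim as an illustrative parameter name:

```python
import torch

def rotate_half(x):
    # Rotates half the hidden dims of the input, as in modeling_llama.py.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # The current code hard-codes unsqueeze(1); the request is to expose the dimension,
    # defaulting to 1 so existing [batch, heads, seq_len, dim] callers are unchanged.
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```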
Motivation
We are trying to import and use the apply_rotary_pos_emb function and the LlamaRotaryEmbedding class. However, our query and key tensors have the shape [batch_size, sequence_len, heads, dim]. To make them compatible with the apply_rotary_pos_emb function, we have to transpose the tensors to [batch_size, heads, sequence_len, dim], call apply_rotary_pos_emb on them, and then transpose the tensors back to [batch_size, sequence_len, heads, dim]. These unnecessary transposes could be avoided if the apply_rotary_pos_emb function had a parameter controlling the dimension along which the unsqueeze calls are applied. Please note that the Llama Huggingface code also does similar back-and-forth transposes, and hence could benefit from this very small code change as well.
Your contribution
We are willing to submit a PR for this.