
Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

Closed
ShashankMosaicML opened this issue Oct 19, 2023 · 6 comments · Fixed by #27117

Comments

@ShashankMosaicML
Contributor

Feature request

Hi Huggingface Team,
We would like to request that the 'unsqueeze(1)' calls in these two lines be parameterized. To be precise, we would like the following lines

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    cos = cos[position_ids].unsqueeze(1) 
    sin = sin[position_ids].unsqueeze(1)
    ...

be converted to something like the following

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    ...
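For reference, the part elided above (based on the modeling_llama.py implementation around that time; the exact lines may differ across versions) just broadcasts cos and sin against q and k, which is why the unsqueeze dimension has to match the heads axis of the query/key tensors:

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # cos/sin arrive as [seq_len, dim]; indexing with position_ids ([batch, seq])
    # gives [batch, seq, dim], and unsqueezing inserts the broadcast axis that
    # must line up with the heads dimension of q and k.
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    # rotate_half is the helper defined alongside this function in modeling_llama.py.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed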

Motivation

We are trying to import and use the apply_rotary_pos_emb function and the LlamaRotaryEmbedding class. However, our query and key tensors have the shape [batch_size, sequence_len, heads, dim]. To make them compatible with apply_rotary_pos_emb, we have to transpose them to [batch_size, heads, sequence_len, dim], call apply_rotary_pos_emb, and then transpose them back to [batch_size, sequence_len, heads, dim]. These unnecessary transposes could be avoided if apply_rotary_pos_emb had a parameter controlling the dimension along which the unsqueezes are applied.
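As a concrete sketch of what this saves (the tensor names and the unsqueeze_dim=2 value are our own assumptions for the [batch_size, sequence_len, heads, dim] layout, not code taken from transformers):

from transformers.models.llama.modeling_llama import apply_rotary_pos_emb

# Today: q/k of shape [batch, seq, heads, dim] have to be transposed to
# [batch, heads, seq, dim], rotated, and then transposed back.
q = q_bshd.transpose(1, 2)
k = k_bshd.transpose(1, 2)
q, k = apply_rotary_pos_emb(q, k, cos, sin, position_ids)
q_bshd, k_bshd = q.transpose(1, 2), k.transpose(1, 2)

# With the proposed parameter, the same tensors could be passed directly:
# cos/sin would be unsqueezed at dim 2 so they broadcast over the heads axis.
q_bshd, k_bshd = apply_rotary_pos_emb(
    q_bshd, k_bshd, cos, sin, position_ids, unsqueeze_dim=2
)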

Please note that the Llama Huggingface code also does similar back and forth transposes, and hence could benefit from this very small code change as well.

Your contribution

We are willing to submit a PR for this.

@ArthurZucker
Collaborator

Hmm, changing the transpose would change the past key values, which is not backward compatible. The unsqueeze dimension could be a nice-to-have, but I would rather have it in the init of the RoPE embedding than in the forward, as that is another backward-incompatible change 😓
We refrain from adding custom-usage code, as transformers is not meant to be a toolbox with 10K config arguments, but if removing the two transposes gives a very good performance gain we could consider this IMO.

cc @gante for ROPE 🤗

@gante
Member

gante commented Oct 25, 2023

@ShashankMosaicML I do not oppose adding unsqueeze_dim=1 as an argument if it makes life easier for others 🤗

As for backward compatibility -- this is a good question. As of now, as @ArthurZucker said, we don't want to break it. However, we are currently working on a new cache structure (with a flag for backwards compatibility), so making the new cache structure optimal for FA2 might be a good idea 🤔

@ArthurZucker
Collaborator

If we can remove the two transposes (it's not a bottleneck, but still), that would be nice. Let's keep that in mind when refactoring the cache, cc @tomaarsen as well.

@gante
Member

gante commented Oct 27, 2023

@ShashankMosaicML feel free to open a PR and tag us 🤗

@ShashankMosaicML
Contributor Author

Great, will do this soon!

@ShashankMosaicML
Contributor Author

Here is the link to the pull request: #27117.
