Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

ShashankMosaicML · 2023-10-19T22:00:42Z

Feature request

Hi Huggingface Team,
We would like to request to make the 'unsqueeze(1)' in these two lines parameterized. To be precise, we would like to request that the following lines

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    cos = cos[position_ids].unsqueeze(1) 
    sin = sin[position_ids].unsqueeze(1)
    ...

be converted to something like the following

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    ...

Motivation

We are trying to import and use the apply_rotary_pos_emb function and the LlamaRotaryEmbedding class. However, our query and key tensors have the shape [batch_size, sequence_len, heads, dim]. To make them compatible with the apply_rotary_pos_emb function, we have to transpose the tensors to [batch_size, heads, sequence_len, dim], call apply_rotary_pos_emb on them and then transpose the tensors back to [batch_size, sequence_len, heads, dim]. These unnecessary transposes could be avoided if the apply_rotary_pos_emb function had a parameter which controlled the dimension on which the unsqueeze's here were applied.

Please note that the Llama Huggingface code also does similar back and forth transposes, and hence could benefit from this very small code change as well.

Your contribution

We are willing to submit a PR for this.

The text was updated successfully, but these errors were encountered:

ArthurZucker · 2023-10-20T10:21:52Z

MMM changing the transpose would change the past key values which is not backward compatible. The unsqueeze dimension could be a nice to have but would rather have it in the init of the rope embedding than in the forward as this is another not backward compatible change 😓
we refrain from adding custom usage code as transformers is not meant to be a tool box with 10K config arguments, but if removing the 2 transpose gives a very good performance gain we could consider this IMO.

cc @gante for ROPE 🤗

gante · 2023-10-25T11:47:30Z

@ShashankMosaicML I do not oppose adding unsqueeze_dim=1 as an argument if it makes life easier for others 🤗

As for backward compatibility -- this is a good question. As of now, as @ArthurZucker said, we don't want to break it. However, we are currently working on a new cache structure (with a flag for backwards compatibility), so making the new cache structure optimal for FA2 might be a good idea 🤔

ArthurZucker · 2023-10-26T06:30:30Z

If we can remove 2 transpose (it's not a bottlneck but still) would be nice. Let's keep that in mind when refactoring the cache cc @tomaarsen as well.

gante · 2023-10-27T11:22:04Z

@ShashankMosaicML feel free to open a PR and tag us 🤗

ShashankMosaicML · 2023-10-27T22:25:05Z

Great, will do this soon!

ShashankMosaicML · 2023-10-28T01:09:38Z

Here is the link to the pull request.

ShashankMosaicML mentioned this issue Oct 28, 2023

added unsqueeze_dim to apply_rotary_pos_emb #27117

Merged

5 tasks

amyeroberts closed this as completed in #27117 Nov 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

ShashankMosaicML commented Oct 19, 2023

ArthurZucker commented Oct 20, 2023

gante commented Oct 25, 2023

ArthurZucker commented Oct 26, 2023

gante commented Oct 27, 2023

ShashankMosaicML commented Oct 27, 2023

ShashankMosaicML commented Oct 28, 2023

Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

Making the unsqueeze dimension parameterized in the apply_rotary_pos_emb function in modeling_llama.py #26948

Comments

ShashankMosaicML commented Oct 19, 2023

Feature request

Motivation

Your contribution

ArthurZucker commented Oct 20, 2023

gante commented Oct 25, 2023

ArthurZucker commented Oct 26, 2023

gante commented Oct 27, 2023

ShashankMosaicML commented Oct 27, 2023

ShashankMosaicML commented Oct 28, 2023