[Gemma] Fix eager attention #29187
Conversation
```diff
@@ -276,7 +276,7 @@ def forward(

         attn_output = attn_output.transpose(1, 2).contiguous()

-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+        attn_output = attn_output.view(bsz, q_len, -1)
```
This is the only modelling code change required - the remainder of the changes in this PR are logit + integration tests
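For reference, a minimal, self-contained sketch (not the modelling code itself; the dimensions are taken from Gemma-7B's config, where num_heads * head_dim does not equal hidden_size) of why the old reshape fails while the new view works:

```python
# Sketch of the shape issue fixed above. Dimensions follow Gemma-7B's config,
# where num_heads * head_dim (16 * 256 = 4096) != hidden_size (3072).
import torch

bsz, q_len = 2, 8
num_heads, head_dim, hidden_size = 16, 256, 3072

# per-head attention output: (bsz, num_heads, q_len, head_dim)
attn_output = torch.randn(bsz, num_heads, q_len, head_dim)
attn_output = attn_output.transpose(1, 2).contiguous()  # (bsz, q_len, num_heads, head_dim)

# old line: errors out, since 2 * 8 * 16 * 256 elements cannot fill (2, 8, 3072)
try:
    attn_output.reshape(bsz, q_len, hidden_size)
except RuntimeError as err:
    print(f"reshape to hidden_size fails: {err}")

# fixed line: flatten the head dimensions to num_heads * head_dim = 4096,
# which is what o_proj (a Linear mapping 4096 -> 3072 in this config) expects
attn_output = attn_output.view(bsz, q_len, -1)
print(attn_output.shape)  # torch.Size([2, 8, 4096])
```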
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the catch. A real pity our 280 prior tests did not catch this! 🤗
* fix modelling code
* add tests
* fix tests
* add some logit tests
* style
* fix fix
I'll look at integrating 2 tests in
It's also that it tests dummy models, but here head_dim != hidden_size / num_heads! So a case not studied.
@sanchit-gandhi Not true, see transformers/tests/test_modeling_common.py, line 3431 at 2cc8cf6: eager vs sdpa passed 😉
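To illustrate the point about dummy models, here is a sketch with made-up sizes (not the actual tester config): when hidden_size == num_heads * head_dim, as in the tiny configs the common tests build, the old reshape and the new view produce identical tensors, so an eager-vs-sdpa check cannot tell them apart.

```python
# Illustrative sizes only: when hidden_size == num_heads * head_dim,
# reshape(..., hidden_size) and view(..., -1) coincide, which is why the
# common test still passed on dummy models.
import torch

bsz, q_len, num_heads, head_dim = 2, 8, 4, 8
hidden_size = num_heads * head_dim  # 32: the equality that hid the bug

attn_output = torch.randn(bsz, num_heads, q_len, head_dim).transpose(1, 2).contiguous()
old = attn_output.reshape(bsz, q_len, hidden_size)
new = attn_output.view(bsz, q_len, -1)
assert torch.equal(old, new)  # identical outputs, so the bug stayed hidden
```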
What does this PR do?
Fixes the Gemma "eager" attention implementation, which is the default for torch versions <= 2.1. This issue was reported on the Hub discussions and by @osanseviero from the model cards/blog post.
The PR also includes a set of slow tests; these confirm that all attention implementations work and are equivalent across back-ends.
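As a rough sketch of what such an equivalence check looks like (the model id, prompt, and tolerances below are illustrative assumptions, not the PR's actual slow tests):

```python
# Hedged sketch of an eager-vs-sdpa equivalence check; "google/gemma-2b" is an
# assumed checkpoint for illustration (gated on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, my name is", return_tensors="pt")

model_eager = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="eager", torch_dtype=torch.float32
)
model_sdpa = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="sdpa", torch_dtype=torch.float32
)

with torch.no_grad():
    logits_eager = model_eager(**inputs).logits
    logits_sdpa = model_sdpa(**inputs).logits

# the two back-ends should agree up to numerical tolerance
torch.testing.assert_close(logits_eager, logits_sdpa, atol=1e-3, rtol=1e-3)
```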