
Add Flex Attention for Mistral along with refactoring #34845

Closed
wants to merge 9 commits

Conversation

OmarManzoor (Contributor):

What does this PR do?

Towards #34809

  • Adds Flex Attention for Mistral
  • Refactors the attention mechanisms to use functions instead of classes (a rough sketch of the idea is below)
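
As a rough sketch of the second bullet (not this PR's actual code: the registry name, the exact signature, and the torch >= 2.5 requirement for flex attention are assumptions), each backend becomes a plain function with a shared signature and the attention layer dispatches by name instead of subclassing a per-backend class:

    import torch.nn.functional as F
    from torch.nn.attention.flex_attention import flex_attention  # requires torch >= 2.5

    def sdpa_attention_forward(query, key, value, attention_mask=None, scaling=None):
        # query/key/value: (batch, num_heads, seq_len, head_dim)
        return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask, scale=scaling)

    def flex_attention_forward(query, key, value, attention_mask=None, scaling=None):
        def score_mod(score, batch, head, q_idx, kv_idx):
            # Fold an additive (e.g. causal) mask into the scores, if one is given.
            if attention_mask is not None:
                return score + attention_mask[batch][0][q_idx][kv_idx]
            return score

        return flex_attention(query, key, value, score_mod=score_mod, scale=scaling)

    # One registry instead of per-backend MistralSdpaAttention / MistralFlexAttention subclasses.
    ATTENTION_FUNCTIONS = {"sdpa": sdpa_attention_forward, "flex_attention": flex_attention_forward}

    # Inside MistralAttention.forward, dispatch would then look roughly like:
    #   attn_fn = ATTENTION_FUNCTIONS[self.config._attn_implementation]
    #   attn_output = attn_fn(query_states, key_states, value_states, attention_mask, scaling=self.scaling)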

Who can review?

@ArthurZucker

if "SdpaAttention" in class_name or "SdpaSelfAttention" in class_name:
if ("SdpaAttention" in class_name or "SdpaSelfAttention" in class_name) or (
hasattr(submodule, "_uses_attention_functions") and submodule._uses_attention_functions
):
OmarManzoor (Contributor, Author):

I am not exactly sure how to handle this correctly.
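
For reference, a hedged sketch of where a check like this usually sits: walking the submodules and deciding per module whether it dispatches through SDPA. `_uses_attention_functions` is the flag proposed in this PR, not an attribute the library has today.

    def module_uses_sdpa(submodule) -> bool:
        class_name = submodule.__class__.__name__
        if "SdpaAttention" in class_name or "SdpaSelfAttention" in class_name:
            return True
        # Function-based attention modules no longer encode the backend in the
        # class name, so they would have to advertise it explicitly.
        return getattr(submodule, "_uses_attention_functions", False)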

@ArthurZucker (Collaborator) left a comment:

Thanks for working on this!

Comment on lines +369 to +370:

    if self._attn_implementation != "flash_attention_2":
        cache_kwargs["cache_position"] = cache_position
@ArthurZucker (Collaborator):

I don't think this escape is required, no?

OmarManzoor (Contributor, Author):

Currently, this is how it is handled in FlashAttention2:

        if past_key_value is not None:
            cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
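
For comparison, a sketch (same local variables as the snippet above) of what dropping the escape would mean: cache_position is always passed through to the cache update, with no flash-attention-specific branch.

        if past_key_value is not None:
            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)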

Three further review comments on src/transformers/models/mistral/modeling_mistral.py were marked as resolved (outdated).
@OmarManzoor (Contributor, Author):

@ArthurZucker What should we do about these failing tests? I think they are related to the SDPA tests, where output_attentions might be True.
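
For context, a hedged sketch (illustrative names, not the library's code) of why output_attentions=True trips these tests: fused SDPA/flex kernels return only the attention output, never the weights, so returning weights needs the eager path that materialises the softmax matrix.

    import torch

    def attention_forward(query, key, value, output_attentions=False):
        if output_attentions:
            # Eager path: weights are built explicitly so they can be returned.
            scale = query.shape[-1] ** -0.5
            attn_weights = torch.softmax((query @ key.transpose(-2, -1)) * scale, dim=-1)
            return attn_weights @ value, attn_weights
        # Fused path: faster, but the weights are never materialised.
        return torch.nn.functional.scaled_dot_product_attention(query, key, value), None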

@ydshieh (Collaborator) commented Nov 27, 2024:

Hi. Let's take a look at the failing test(s) step by step.

First, do you know why Idefics2 would have a MistralAttention? It's very strange (see the test_torch job log).

    FAILED tests/models/idefics2/test_modeling_idefics2.py::Idefics2ForConditionalGenerationModelTest::test_retain_grad_hidden_states_attentions - AttributeError: 'MistralAttention' object has no attribute 'scaling'

@OmarManzoor (Contributor, Author):

> Hi. Let's take a look at the failing test(s) step by step.
>
> First, do you know why Idefics2 would have a MistralAttention? It's very strange (see the test_torch job log).
>
>     FAILED tests/models/idefics2/test_modeling_idefics2.py::Idefics2ForConditionalGenerationModelTest::test_retain_grad_hidden_states_attentions - AttributeError: 'MistralAttention' object has no attribute 'scaling'

Idefics2 uses Mistral as its text model:

    self.text_model = AutoModel.from_config(config.text_config)

and the configuration falls back to Mistral by default:

    if isinstance(text_config, dict):
        text_config["model_type"] = text_config["model_type"] if "model_type" in text_config else "mistral"
        text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
    elif text_config is None:
        logger.info("text_config is None, using default text config")
        text_config = CONFIG_MAPPING["mistral"](
            max_position_embeddings=4096 * 8,
            rms_norm_eps=1e-5,
            # None in the original configuration_mistral, we set it to the unk_token_id
            pad_token_id=0,
            tie_word_embeddings=False,
        )
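
For context, a hedged sketch (illustrative, not the PR's code) of the attribute the traceback points at: the refactored attention functions read a precomputed scaling from the module, so any attention class reached through AutoModel, including the Mistral text model inside Idefics2, has to set it in __init__.

    import torch

    class AttentionSketch(torch.nn.Module):
        def __init__(self, hidden_size: int, num_attention_heads: int):
            super().__init__()
            self.head_dim = hidden_size // num_attention_heads
            # The failing test suggests this attribute was missing on the
            # instance that Idefics2 builds from the Mistral config.
            self.scaling = self.head_dim ** -0.5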

@ArthurZucker (Collaborator):

BTW let's make sure we rebase now that #34896 was merged!

@ArthurZucker (Collaborator):

can you make sure the CIs are green? 🤗

@OmarManzoor (Contributor, Author):

> can you make sure the CIs are green? 🤗

Should I reset the default back to eager instead of flex, since the eager-matches-SDPA test fails for float32 when using flex? Or do we need to change the thresholds so that flex remains the default while the tests still pass?
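
For illustration, a hedged sketch of the trade-off being asked about: the backend-equivalence tests compare outputs within dtype-dependent tolerances, so keeping flex as the default means either matching eager within the current float32 thresholds or loosening them. The numbers below are made up, not the test suite's actual values.

    import torch

    eager_out = torch.full((2, 4, 8), 1.0, dtype=torch.float32)
    flex_out = eager_out + 2e-5  # stand-in for a small numerical divergence between backends

    torch.testing.assert_close(flex_out, eager_out, atol=1e-4, rtol=1e-5)    # passes with a looser threshold
    # torch.testing.assert_close(flex_out, eager_out, atol=1e-6, rtol=1e-7)  # would fail with a tight one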

@ArthurZucker (Collaborator) left a comment:

Hey! We actually shipped this in #35235! 🤗 Super sorry for the late notice.

@OmarManzoor (Contributor, Author):

> Hey! We actually shipped this in #35235! 🤗 Super sorry for the late notice.

Thanks for letting me know.

@OmarManzoor deleted the mistral_flex branch on December 23, 2024 at 15:09.