Fix for Neuron #30259
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The attention layer also uses the `...` notation, but if that's not a problem, good. Thanks for fixing!
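For context, the `...` (Ellipsis) indexing that the diff replaces is just shorthand for spelling out every leading dimension explicitly. A tiny illustrative sketch (not part of the PR):

```python
import torch

# A 4D mask shaped like the causal mask in _update_causal_mask:
# (batch, num_heads, query_length, key_length)
causal_mask = torch.randn(2, 1, 5, 5)
mask_length = 3

# Ellipsis indexing and fully explicit slicing select exactly the same elements.
assert torch.equal(causal_mask[..., :mask_length], causal_mask[:, :, :, :mask_length])
```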
Thanks!
@michaelbenayoun There were a bunch of fp16 issues we had to solve last week because of the new PyTorch release. Could you try rebasing on main? This should resolve the currently failing tests.
Some tests are failing because of the PR itself. Working on that today.
@@ -1021,8 +1021,11 @@ def _update_causal_mask(
             causal_mask = causal_mask.clone()  # copy to contiguous memory for in-place edit
             if attention_mask.dim() == 2:
                 mask_length = attention_mask.shape[-1]
-                padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
-                causal_mask[..., :mask_length] = causal_mask[..., :mask_length].masked_fill(padding_mask, min_dtype)
+                padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
You risk overflows here, no? If you have `torch_min + torch_min`, it overflows to `-inf`.
+1 - we should try to avoid this. I think it's technically OK because of the masked_fill below
Even if you overflow, it's OK, because on the next line you create a boolean tensor: `padding_mask = padding_mask == 0`. So in the end you get the same result, as long as the elements in the tensor that need to be zero end up having this value.

This code, which overflows in the context of the function:

```python
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
padding_mask *= 2
padding_mask = padding_mask == 0
```

produces the same output as this code, which does not overflow:

```python
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
padding_mask = padding_mask == 0
```
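To make the argument concrete, here is a small fp16 check (an illustrative sketch, not part of the PR) showing that the overflow to `-inf` does not change the `== 0` mask:

```python
import torch

min_dtype = torch.finfo(torch.float16).min            # -65504.0
a = torch.tensor([0.0, min_dtype], dtype=torch.float16)
b = torch.tensor([0.0, min_dtype], dtype=torch.float16)

summed = a + b        # second element overflows: -65504 + -65504 -> -inf
print(summed)         # tensor([0., -inf], dtype=torch.float16)
print(summed == 0)    # tensor([ True, False])
# The boolean mask only cares whether an element is exactly zero, so the
# overflow to -inf gives the same mask as the exact (non-overflowed) sum would.
```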
Thanks for working on and fixing this!
src/transformers/utils/fx.py
"`past_key_values` were specified as input names, but model.config.use_cache = False, this might lead to " | ||
"unexpected behavior." | ||
) | ||
if "past_key_values" not in input_names and hasattr(model.config, "use_cache") and model.config.use_cache: |
For my own understanding - what's the reason for the different approach for getting the attr here - does `model.config.use_cache` default to `None`?
When tracing with FX, you transform the input tensors into `Proxy` objects that record the flow of operations. So here we check for multiple cases (a rough sketch of both follows below):

- `past_key_values` is in the requested inputs, meaning that the user wants to trace the model with the use of the cache. In this case, we warn the user if `model.config.use_cache` is `False`, because no cache-related operations will be recorded.
- `past_key_values` is not in the requested inputs, but `model.config.use_cache` is `True`. In this setting we would create a `DynamicCache` (with `past_key_values=None`), but this operation would be "hardcoded" in the graph because no proxy input is provided (since `past_key_values=None`), resulting in failures. So if the user does not request `past_key_values` as an input, we disable the use of cache altogether.
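A rough sketch of the two cases from the user's side (the tiny checkpoint name is an assumption, and the exact tracing behavior depends on the transformers version):

```python
from transformers import AutoModelForCausalLM
from transformers.utils.fx import symbolic_trace

model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")

# Case 1: the user explicitly asks for `past_key_values` as a traced input.
# Cache-related operations are recorded in the graph; a warning is emitted if
# model.config.use_cache is False.
traced_with_cache = symbolic_trace(
    model, input_names=["input_ids", "attention_mask", "past_key_values"]
)

# Case 2: `past_key_values` is not requested. With this PR, use_cache is disabled
# during tracing, so no DynamicCache gets hardcoded into the graph.
traced_without_cache = symbolic_trace(
    model, input_names=["input_ids", "attention_mask"]
)
```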
Ah, sorry, what I meant was that there seem to be different assumptions about the default for `model.config.use_cache` between these two branches, i.e. why don't we do `getattr(model.config, "use_cache", False)` here too?
I can do that, changing it!
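For reference, a minimal illustration of why the two attribute-access styles being discussed are interchangeable (hypothetical config classes, not the actual diff):

```python
class ConfigWithCache:
    use_cache = True

class ConfigWithoutCache:
    pass  # no use_cache attribute at all

for config in (ConfigWithCache(), ConfigWithoutCache()):
    explicit = hasattr(config, "use_cache") and config.use_cache
    with_default = getattr(config, "use_cache", False)
    assert explicit == with_default  # both treat a missing attribute as False
```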
@amyeroberts about this comment, you can check my answer to @ArthurZucker. To give a little bit more context, I'm doing this because the way it is currently implemented produces a compiler error on AWS Trainium / Inferentia devices, preventing us from using these models with the latest Transformers version.
Thanks for iterating!
What does this PR do?
This PR fixes things so that Transformers models can run on Trainium instances.
It fixes:
- `symbolic_trace`: before, no metadata was traced when a user defined custom leaf modules; it should be working now (see the sketch after this list).
- The way `transformers.cache_utils.Cache` classes are handled: they can now be symbolically traced.
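A sketch of the custom-leaf-module use case, assuming `symbolic_trace` accepts a `tracer_cls` argument in this version (otherwise `HFTracer` can be driven directly); the custom block and the tiny checkpoint name are placeholders for illustration:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.utils.fx import HFTracer, symbolic_trace

class MyCustomBlock(nn.Module):
    """A user-defined module that should stay opaque (a leaf) during tracing."""
    def forward(self, hidden_states):
        return hidden_states * 2

class TracerWithCustomLeaves(HFTracer):
    def is_leaf_module(self, module, module_qualified_name):
        # Treat the custom block as a leaf so its internals are not traced through;
        # with this PR its metadata should still be recorded.
        if isinstance(module, MyCustomBlock):
            return True
        return super().is_leaf_module(module, module_qualified_name)

# In a real use case the model would contain MyCustomBlock instances.
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
traced = symbolic_trace(
    model,
    input_names=["input_ids", "attention_mask"],
    tracer_cls=TracerWithCustomLeaves,
)
```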