Fix precision errors from casting rotary parameters to FP16 with AMP #27700

kevinhu · 2023-11-25T03:13:40Z

What does this PR do?

When training with AMP, computing the product of t and self.inv_freq with einsum introduces precision errors, because the result is cast to FP16. Using torch.outer instead avoids this, as originally noted here: https://github.com/Dao-AILab/flash-attention/blob/2c3baba4a63c4007c8a132c5380edc9430f88a22/flash_attn/layers/rotary.py#L396C1-L398C45
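
To see why an FP16 result matters for rotary angles, here is a minimal sketch that does not use PyTorch and is not the code from this PR. It emulates FP16 rounding with Python's struct module (format "e") and applies it to a hypothetical rotary setup (dim=8, base=10000, both chosen only for illustration). Python's 64-bit floats stand in for the FP32 values.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def rotary_angles(positions, inv_freq, round_fp16=False):
    """Outer product of positions and inverse frequencies.

    With round_fp16=True every entry is rounded to FP16, which emulates
    a result that has been cast to half precision.
    """
    rows = []
    for t in positions:
        row = [t * f for f in inv_freq]
        if round_fp16:
            row = [to_fp16(v) for v in row]
        rows.append(row)
    return rows


# Hypothetical rotary parameters, chosen for illustration only.
dim, base = 8, 10000.0
inv_freq = [1.0 / base ** (i / dim) for i in range(0, dim, 2)]
positions = [2047, 2048, 2049, 2050, 2051]

full = rotary_angles(positions, inv_freq)
half = rotary_angles(positions, inv_freq, round_fp16=True)

# The first inverse frequency is 1.0, so column 0 is the position itself.
for t, f, h in zip(positions, full, half):
    print(t, f[0], h[0])
```

With these assumed values the first column of the full-precision table reproduces each position exactly. FP16 cannot represent 2049, so after rounding that position takes the same angle as one of its neighbours. This is the kind of error that keeping the product in full precision avoids.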

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

ArthurZucker

Hey! Thanks for opening this PR. It seems to me that the issue lies with AMP, no?
My only concern would have been performance: outer might be a little slower, but the difference seems negligible, so LGTM.
Let's make sure that the failing test is fixed!

ArthurZucker

I ran the following script for benchmarking:

import torch
from torch.utils import benchmark

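results = []
# Sweep over outer-product shapes [b, n], skipping those with 1e9 or more elements.
for b in [10, 10000, 2000000]:
    for n in [10, 100, 10000, 1000000]:
        if b * n >= 1000000000:
            continue

        description = f'[{b}, {n}]'

        # Inputs live on Apple's MPS backend; change the device to match your hardware.
        x = torch.rand(b, device='mps')
        y = torch.rand(n, device='mps')

        # Time torch.outer on this shape.
        results.append(benchmark.Timer(
            stmt='torch.outer(x,y)',
            globals={'x': x, 'y': y},
            description=description,
        ).blocked_autorange())

        # Time the equivalent einsum formulation on the same inputs.
        results.append(benchmark.Timer(
            stmt='torch.einsum("i,j->ij",x,y)',
            globals={'x': x, 'y': y},
            description=description,
        ).blocked_autorange())

# Collect all measurements into one comparison table and print it.
compare = benchmark.Compare(results)
compare.trim_significant_figures()
compare.colorize()
compare.print()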

Got the following (the benchmark output was attached as an image and is not preserved here):

So looks good to me 😉

ArthurZucker

The failing test is unrelated to the PR; I'll fix it on main.

HuggingFaceDocBuilderDev · 2023-11-28T08:57:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

ArthurZucker · 2023-11-29T15:31:15Z

FYI @gante and @Rocketknight1, in case we see anything failing. I ran the slow tests locally and everything passed.

kevinhu added 6 commits November 24, 2023 19:09

Update modeling_llama.py

643098a

Update modeling_open_llama.py

cd763d2

Update modeling_gpt_neox.py

659b8fc

Update modeling_mistral.py

9dfac9e

Update modeling_persimmon.py

d3aef86

Update modeling_phi.py

3f81d37

kevinhu changed the title ~~Fix precision errors from casting rotary parameters to FP16 with AMP in Llama~~ Fix precision errors from casting rotary parameters to FP16 with AMP Nov 25, 2023

kevinhu added 2 commits November 24, 2023 20:37

Update modeling_falcon.py

db4c79f

Update modeling_gpt_neox_japanese.py

d00f086

ArthurZucker reviewed Nov 27, 2023

Merge branch 'huggingface:main' into fix-einsum-amp

ecaf512

ArthurZucker reviewed Nov 28, 2023

ArthurZucker approved these changes Nov 28, 2023

Merge branch 'huggingface:main' into fix-einsum-amp

3f17191

ArthurZucker merged commit 083e369 into huggingface:main Nov 29, 2023
19 checks passed

tomaarsen mentioned this pull request Dec 8, 2023

Generate: SinkCache can handle iterative prompts #27907

Merged
