[Time series Informer] fix dtype of cumsum #25431

kashif · 2023-08-10T09:03:13Z

What does this PR do?

Fixes an issue when training Informer with FP16, where the cumsum returns float32 instead of the input dtype.

See report here: https://discuss.huggingface.co/t/how-to-train-on-multiple-gpus-the-informer-model-for-time-series-forecasting/48984/3
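A minimal sketch of the dtype-preserving pattern this PR applies, assuming PyTorch. The running sum over the sequence axis (dim=-2) is computed in float32 and then cast back to the input dtype, so the result matches value_states whatever precision the model runs in. The helper name cumulative_context and the tensor shape are hypothetical and chosen only for illustration.

```python
import torch


def cumulative_context(value_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cumulative sum over dim=-2, accumulated in float32,
    then cast back so the output dtype equals the input dtype."""
    return value_states.cumsum(dim=-2, dtype=torch.float32).to(value_states.dtype)


# Hypothetical shape: (batch * heads, seq_len, head_dim), running in half precision.
value_states = torch.ones(2, 4096, 8, dtype=torch.float16)
context = cumulative_context(value_states)

print(context.dtype)             # torch.float16
print(context[0, -1, 0].item())  # 4096.0
```

Because of the final cast, downstream code that combines context with other half-precision tensors keeps a single consistent dtype.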

HuggingFaceDocBuilderDev · 2023-08-10T09:22:18Z

The documentation is not available anymore as the PR was closed or merged.

ArthurZucker

Nice catch! Thanks for the fix. Is there a reason to compute in float32 and then cast to the correct type, instead of computing directly in the correct dtype? (If so, just add a small comment!)

ArthurZucker · 2023-08-17T16:23:49Z

src/transformers/models/informer/modeling_informer.py

@@ -647,7 +647,7 @@ def forward(
        # calculate context for updating the attn_output, based on:
        # https://github.com/zhouhaoyi/Informer2020/blob/ac59c7447135473fb2aafeafe94395f884d5c7a5/models/attn.py#L74
        if self.is_decoder:
-            context = value_states.cumsum(dim=-2)
+            context = value_states.cumsum(dim=-2, dtype=torch.float32).to(value_states.dtype)


Suggested change:

-            context = value_states.cumsum(dim=-2, dtype=torch.float32).to(value_states.dtype)
+            context = value_states.cumsum(dim=-2, dtype=value_states.dtype)

kashif

I wanted the cumsum to be computed in float32 to avoid overflow and similar issues, and then cast back to the original dtype.
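A rough, library-free illustration of why accumulating in a wider type and casting once can matter, assuming IEEE 754 half precision as the low-precision format. Python's struct "e" format rounds a float to half precision; Python floats (double precision) stand in for float32 here. The sketch shows the precision side of the argument: when every running total is rounded to fp16, adding 1.0 stalls at 2048, while a wide accumulator rounded once per output keeps counting. All function names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cumsum_fp16_accumulate(values):
    """Hypothetical: each running total is rounded to fp16 before the next add."""
    out, total = [], 0.0
    for v in values:
        total = to_fp16(total + v)
        out.append(total)
    return out


def cumsum_wide_then_cast(values):
    """Hypothetical: running total kept in double precision, each output rounded once."""
    out, total = [], 0.0
    for v in values:
        total += v
        out.append(to_fp16(total))
    return out


ones = [1.0] * 4096
print(cumsum_fp16_accumulate(ones)[-1])  # 2048.0 (2048 + 1 rounds back to 2048)
print(cumsum_wide_then_cast(ones)[-1])   # 4096.0
```

Casting back only rounds each output once; a prefix sum that itself exceeds the fp16 range still cannot be represented after the cast.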

ArthurZucker

Thanks

* fix dtype of cumsum
* add comment

fix dtype of cumsum

df15a88

Merge branch 'huggingface:main' into informer-cumsum

91b4915

kashif requested a review from ArthurZucker August 17, 2023 16:14

ArthurZucker reviewed Aug 17, 2023

add comment

0383a3e

ArthurZucker approved these changes Aug 18, 2023

kashif merged commit 8d2f953 into huggingface:main Aug 18, 2023

kashif deleted the informer-cumsum branch August 18, 2023 12:27

blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023

[Time series Informer] fix dtype of cumsum (huggingface#25431)

37f0ba3

* fix dtype of cumsum
* add comment
