[QST] Avoiding fp16 errors during GEMM from CUTE example #1188

SamKG · 2023-11-14T21:14:24Z

What is your question?
In the CuTe GEMM example, the accumulator is fp16. I noticed a rather annoying downstream effect: because of rounding, the accumulation silently drops small addends once the running sum is large (e.g. 2048 + 1 = 2048 in fp16).

I have two (related) questions:

1. Is there a way to error or warn if the accumulator is silently dropping bits?
2. Is there a way (aside from setting to fp32 or bf16) to avoid this issue?

hwu36 · 2023-11-15T17:42:50Z

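If this is an issue for you, it means fp16 does not have enough precision for your use case. You need to use fp32 accumulation.

In fp16 math, 2048 + 1 = 2048: fp16 has an 11-bit significand, so representable values in [2048, 4096) are spaced 2 apart, and 2049 rounds back to 2048. Wikipedia explains it in detail.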
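The effect can be reproduced on the host without a GPU. The following is a minimal sketch, not the CuTe example's code: it emulates fp16 and fp32 accumulators by rounding after every add with Python's `struct` module (format codes `e` and `f`). The helper names `to_fp16`, `to_fp32`, and `accumulate` are hypothetical and exist only for this illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(values, round_fn) -> float:
    """Sum values, rounding the running total after every add.

    This emulates an accumulator stored in the precision of round_fn.
    """
    acc = 0.0
    for v in values:
        acc = round_fn(acc + v)
    return acc

# A single add: 2049 is not representable in fp16, so it rounds to 2048.
print(to_fp16(2048.0 + 1.0))  # 2048.0

# Summing 4096 ones: the fp16 accumulator stalls once it reaches 2048.
ones = [1.0] * 4096
fp16_sum = accumulate(ones, to_fp16)
fp32_sum = accumulate(ones, to_fp32)
print(fp16_sum)  # 2048.0
print(fp32_sum)  # 4096.0
```

The fp16 total stops growing at 2048 because every later `+ 1` rounds back to the same value, while the fp32 accumulator keeps all 4096 contributions. This is the behavior that switching the accumulator type to fp32 avoids.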

mnicely · 2023-12-06T15:20:49Z

I believe this has been resolved. Feel free to reopen if you disagree.

SamKG added the "? - Needs Triage" and "question" labels Nov 14, 2023

mnicely removed the "? - Needs Triage" label Nov 20, 2023

mnicely added the "CuTe" label Dec 6, 2023

mnicely closed this as completed Dec 6, 2023
