
Revert "Optimize fp8 linalg_ext.attention by rework Q@K scaling" #18112

Merged: 1 commit into main from revert-18031-optimize_fp8_attention, Aug 6, 2024

Conversation

raikonenfnu (Collaborator):

Reverts #18031

The change is mathematically correct; however, since the scaling (an elementwise mul linalg.generic) now happens before the reduction addf, the op unexpectedly becomes a vector.contract after the GenericVectorization pass.

    %41 = vector.contract {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"], kind = #vector.kind<add>} %39, %cst_0, %34 : vector<32x128xf32>, vector<32x128xf32> into vector<32xf32>
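
For context, the kind of linalg op that GenericVectorization matches as a contraction is a generic with a mixed parallel/reduction iterator set whose body is a multiply feeding an accumulate. Below is a minimal sketch of such a fused scale-and-reduce op; the operand names, element types, and shapes are illustrative, not taken from the actual attention dispatch:

    %sum = linalg.generic {
        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                         affine_map<(d0, d1) -> (d0, d1)>,
                         affine_map<(d0, d1) -> (d0)>],
        iterator_types = ["parallel", "reduction"]}
        ins(%qk, %scale : tensor<32x128xf32>, tensor<32x128xf32>)
        outs(%acc : tensor<32xf32>) {
      ^bb0(%q: f32, %s: f32, %a: f32):
        // The elementwise scaling (mulf) fused into the same body as the
        // reduction add is what makes this read as a multiply-accumulate,
        // i.e. a contraction, to the vectorizer.
        %mul = arith.mulf %q, %s : f32
        %add = arith.addf %mul, %a : f32
        linalg.yield %add : f32
    } -> tensor<32xf32>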

raikonenfnu requested review from Groverkss and rsuderman and removed the request for MaheshRavishankar and hanhanW on August 6, 2024, 05:17
Groverkss (Contributor):

I don't think this should be a revert... This is a problem with our codegen, not with the decomposition or the attention operation. Why is the vector.contract a problem, by the way?

Groverkss (Contributor) left a comment:

Comment above

Groverkss (Contributor) left a comment:

After talking to Stan offline, this is fine for now.

raikonenfnu force-pushed the revert-18031-optimize_fp8_attention branch from 06b277d to 8b1d241 on August 6, 2024, 17:22
raikonenfnu merged commit 71f1e20 into main on Aug 6, 2024
47 checks passed
raikonenfnu deleted the revert-18031-optimize_fp8_attention branch on August 6, 2024, 20:29