
Revert "Optimize fp8 linalg_ext.attention by rework Q@K scaling" #18112

Merged: 1 commit into main from revert-18031-optimize_fp8_attention, Aug 6, 2024

Conversation

raikonenfnu (Collaborator):

Reverts #18031

The change is mathematically correct; however, since the scaling (an elementwise mul linalg.generic) now happens before the reduction addf, the op unexpectedly becomes a vector.contract after the GenericVectorization pass.

    %41 = vector.contract {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"], kind = #vector.kind<add>} %39, %cst_0, %34 : vector<32x128xf32>, vector<32x128xf32> into vector<32xf32>
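
For context, the kind of linalg op that GenericVectorization matches as a contraction is a generic with a mixed parallel/reduction iterator set whose body is a multiply feeding an accumulate. Below is a minimal sketch of such a fused scale-and-reduce op; the operand names, element types, and shapes are illustrative, not taken from the actual attention dispatch:

    %sum = linalg.generic {
        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                         affine_map<(d0, d1) -> (d0, d1)>,
                         affine_map<(d0, d1) -> (d0)>],
        iterator_types = ["parallel", "reduction"]}
        ins(%qk, %scale : tensor<32x128xf32>, tensor<32x128xf32>)
        outs(%acc : tensor<32xf32>) {
      ^bb0(%q: f32, %s: f32, %a: f32):
        // The elementwise scaling (mulf) fused into the same body as the
        // reduction add is what makes this read as a multiply-accumulate,
        // i.e. a contraction, to the vectorizer.
        %mul = arith.mulf %q, %s : f32
        %add = arith.addf %mul, %a : f32
        linalg.yield %add : f32
    } -> tensor<32xf32>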

raikonenfnu requested review from Groverkss and rsuderman and removed the request for MaheshRavishankar and hanhanW on August 6, 2024, 05:17
Groverkss (Contributor):

I don't think this should be a revert... This is a problem with our codegen, not with the decomposition or the attention operation. Why is the vector.contract a problem, by the way?

Groverkss (Contributor) left a comment:

Comment above

Groverkss (Contributor) left a comment:

After talking to Stan offline, this is fine for now.

raikonenfnu force-pushed the revert-18031-optimize_fp8_attention branch from 06b277d to 8b1d241 on August 6, 2024, 17:22
raikonenfnu merged commit 71f1e20 into main on Aug 6, 2024
47 checks passed
raikonenfnu deleted the revert-18031-optimize_fp8_attention branch on August 6, 2024, 20:29