
[attention] Extend attention to fuse transpose #669

Closed
antiagainst opened this issue May 10, 2024 · 8 comments
Assignees
Labels
sdxl-int8: Issues related to SDXL quantized model support

Comments

@antiagainst

No description provided.

@antiagainst antiagainst converted this from a draft issue May 10, 2024
@antiagainst antiagainst changed the title from "KERNEL: Extend attention to fuse transpose" to "[attention] Extend attention to fuse transpose" on May 11, 2024
@antiagainst
Author

antiagainst commented May 22, 2024

Update 5/22: patch iree-org/iree#17408 is out; needs review.
Update 5/23: working on decomposition and tiling; patch should be out today or so.
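
For context on what the decomposition step targets (a plain torch reference, not IREE's actual lowering), attention decomposes into the three ops below, which tiling then operates on:

```python
import torch

def decomposed_attention(q, k, v, scale):
    s = scale * (q @ k.transpose(-1, -2))  # 1) Q @ K^T matmul
    p = torch.softmax(s, dim=-1)           # 2) row-wise softmax
    return p @ v                           # 3) P @ V matmul
```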

@Groverkss
Contributor

Plan to finish it this week (before Jun 7):

4 Jun: Land online attention (iree-org/iree#17536); see the online-softmax sketch after this list
5 Jun: Create transform script using online_attention for MFMA
6 Jun: Add indexing_maps to attention op
7 Jun: Fusions
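
A minimal sketch of the online-softmax rewrite behind online attention (a plain torch reference; the function name, shapes, and blocking factor are illustrative, not taken from the PR):

```python
import torch

def online_attention(q, k, v, scale, block=128):
    # One-pass attention over tiles of the K2 (sequence) dimension:
    # keep a running row max, softmax denominator, and P @ V accumulator,
    # rescaling the old partials whenever a new tile raises the max.
    m = torch.full(q.shape[:-1], float("-inf"))    # running row max
    l = torch.zeros(q.shape[:-1])                  # running softmax denominator
    acc = torch.zeros(*q.shape[:-1], v.shape[-1])  # running P @ V accumulator
    for start in range(0, k.shape[-2], block):
        kb = k[..., start:start + block, :]
        vb = v[..., start:start + block, :]
        s = scale * (q @ kb.transpose(-1, -2))
        m_new = torch.maximum(m, s.max(dim=-1).values)
        p = torch.exp(s - m_new[..., None])
        corr = torch.exp(m - m_new)                # rescale old partials
        l = corr * l + p.sum(dim=-1)
        acc = corr[..., None] * acc + p @ vb
        m = m_new
    return acc / l[..., None]
```

The point of the rewrite is that the K2 dimension becomes tileable in a single pass, which is the property a transform script for MFMA can then build on.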

@antiagainst antiagainst added the sdxl-int8 label (Issues related to SDXL quantized model support) on Jul 12, 2024
@raikonenfnu
Member

Hey guys, quick update:

  1. Indexing attention ([LinalgExt] Adding IndexingMaps to linalg_ext.attentionOp, iree-org/iree#17864) has landed.
  2. CastTypeToFitMMA support for the TD pipeline ([LLVMGPU] Support CastTypeToFitMMA on TransformDialect script, iree-org/iree#17884) is up.
  3. transfer_write distribution for non-contiguous indexing maps ([LLVMGPU][VectorDist] Enable support to distribute vector.transfer_write with non-contiguous dims, iree-org/iree#17895) is up.

Once 2, 3, and iree-org/iree@d2ca774 land on main, we should be able to handle/compile the fused attention-transpose; see the einsum sketch below for the indexing-map intuition.
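
For intuition on why indexing maps enable the transpose fusion (an einsum analogy with hypothetical shapes, not the actual op syntax): folding a transpose into an operand amounts to permuting that operand's indexing map rather than materializing the transposed tensor:

```python
import torch

p = torch.randn(2, 128, 256)   # attention probabilities (B, M, K2); hypothetical shapes
v_t = torch.randn(2, 64, 256)  # V stored transposed: (B, N, K2)

unfused = p @ v_t.transpose(-1, -2)           # materialize the transpose, then matmul
fused = torch.einsum("bmk,bnk->bmn", p, v_t)  # transpose folded into the index map
assert torch.allclose(unfused, fused, atol=1e-4)
```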

@antiagainst
Author

Awesome. All 3 pull requests are in. Can you send out the last piece?

@raikonenfnu
Member

> Awesome. All 3 pull requests are in. Can you send out the last piece?

Hey Lei, I think @MaheshRavishankar is en route to pushing that one in! :)

@MaheshRavishankar

I can send it in early next week.

@raikonenfnu
Member

I also pushed up/updated the spec MLIR to find k2 correctly (link). I tested compiling the fusion-preprocessing test MLIR (here) and was able to get a vmfb out.

The gist above differs slightly from the test in that we make the scale a constant here; vector distribution fails if the scale is not constant.

compile command:

```shell
~/nod/iree-build-notrace/tools/iree-compile constant_transpose_fusion.mlir \
  --iree-hal-target-backends=rocm \
  --iree-rocm-target-chip=gfx942 \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-outer-dim-concat=true \
  --iree-opt-const-eval=false \
  --iree-opt-data-tiling=false \
  --iree-rocm-waves-per-eu=2 \
  --iree-vm-target-truncate-unsupported-floats \
  --iree-codegen-llvmgpu-use-vector-distribution \
  --iree-codegen-gpu-native-math-precision=true \
  --iree-flow-enable-aggressive-fusion \
  -o attention.vmfb \
  --iree-codegen-transform-dialect-library=attention_and_matmul_spec.mlir
```
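
For completeness, a hypothetical invocation of the resulting vmfb with iree-run-module; the function name, input spec, and HAL device name are assumptions (the ROCm device was exposed as `rocm` in builds of that era):

```shell
# Hypothetical: --function and --input depend on the test MLIR's entry point.
~/nod/iree-build-notrace/tools/iree-run-module \
  --module=attention.vmfb \
  --device=rocm \
  --function=main \
  --input=@inputs.npy
```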

@raikonenfnu
Member

FYI I also tested the attention-transpose-fusion vmfb numerics with normally distributed random inputs (mean 0.0, std 1.0) against torch; the numerics look good there :)
(screenshot: numerics_test)

Starting IR, the compile command, and the data generator can all be found in https://gist.github.com/raikonenfnu/973b4d91e4378702ce4b4496d732cb57

I needed to change the shapes slightly from the original fusion-preprocessing test, since the fastest dim of Q, K, and V needs to be the same to run on PyTorch; a sketch of the reference check is below.
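
A minimal sketch of such a reference check (shapes, tolerances, and the readback step are illustrative, not taken from the gist):

```python
import torch

torch.manual_seed(0)
B, M, K1, K2, N = 2, 1024, 64, 1024, 64  # fastest dims of Q, K, V kept equal (K1 == N)
q = torch.randn(B, M, K1)
k = torch.randn(B, K2, K1)
v = torch.randn(B, K2, N)
scale = K1 ** -0.5

# Torch reference for the attention math; where the fused transpose sits
# (inputs vs. output) follows the test IR in the gist.
ref = torch.softmax(scale * (q @ k.transpose(-1, -2)), dim=-1) @ v

# iree_out = ...  # hypothetical: result read back from the compiled vmfb
# assert torch.allclose(iree_out, ref, atol=1e-2, rtol=1e-2)
```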

@github-project-automation github-project-automation bot moved this from In progress to Done in Turbine: SDXL on CDNA Jul 27, 2024