[attention] Extend attention to fuse transpose #669
Update 5/22: patch iree-org/iree#17408 is out and needs review.
Plan to finish it this week (before Jun 7). Jun 4: land online attention (iree-org/iree#17536).
Hey guys, quick update: once 2., 3., and iree-org/iree@d2ca774 have landed on main, we should be able to handle/compile the fused attention-transpose.
Awesome. All 3 pull requests are in. Can you send out the last piece?
Hey Lei, I think @MaheshRavishankar is en route to pushing that one in! :)
I can send it in early next week. |
I also pushed up/updated the spec MLIR to find k2 correctly (link). I tested compiling the fusion-preprocessing test MLIR (here) and was able to get a vmfb out. The gist above differs slightly from the test in that we make the scale a constant here; compilation fails in vector distribution if the scale is not a constant. Compile command:
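(The exact command is in the linked gist; as a rough sketch only, assuming a ROCm target, a transform-dialect spec file, and placeholder file names, an invocation could look something like this.)

```python
import subprocess

# Rough sketch only: the backend, chip, spec file, and file names below are
# assumptions, not the exact command from the gist.
subprocess.run(
    [
        "iree-compile",
        "attention_transpose_fusion.mlir",  # hypothetical input IR
        "--iree-hal-target-backends=rocm",  # assumed GPU backend
        "--iree-rocm-target-chip=gfx90a",   # assumed target chip
        # assumed transform-dialect spec carrying the attention schedule
        "--iree-codegen-transform-dialect-library=attention_spec.mlir",
        "-o",
        "attention_transpose_fusion.vmfb",
    ],
    check=True,
)
```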
FYI, I also tested the attention-transpose-fusion vmfb numerics against torch on normal random inputs (mean 0.0, std 1.0), and the numerics look good there :) The starting IR, compile command, and data generator can all be found in https://gist.github.com/raikonenfnu/973b4d91e4378702ce4b4496d732cb57. I needed to update the shapes from the original fusion-preprocessing test slightly, since the fastest dim for Q, K, and V needs to be the same to run on PyTorch.
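For context, here is a minimal sketch of this kind of numerics check, with the reference computed in PyTorch. The shapes, scale, output permutation, and tolerances below are assumptions; the actual generator and IR are in the gist.

```python
import torch

def attention_then_transpose(q, k, v, scale):
    # Plain softmax attention followed by a transpose of the output,
    # mirroring the fused attention+transpose pattern being compiled.
    logits = torch.matmul(q, k.transpose(-2, -1)) * scale
    attn = torch.softmax(logits, dim=-1)
    out = torch.matmul(attn, v)
    return out.transpose(1, 2)  # hypothetical output permutation

# Normal(0, 1) inputs, as in the comment; shapes are illustrative only.
B, H, S, D = 1, 8, 128, 64
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)
ref = attention_then_transpose(q, k, v, scale=1.0 / D ** 0.5)

# Compare against the vmfb output (e.g. dumped by iree-run-module to .npy):
# iree_out = torch.from_numpy(numpy.load("iree_output.npy"))
# assert torch.allclose(ref, iree_out, rtol=1e-3, atol=1e-3)
```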