
[DispatchCreation] Collapse iree_linalg_ext.attention #19012

Merged · 4 commits into iree-org:main on Nov 12, 2024

Conversation

@IanWood1 (Contributor) commented Nov 4, 2024

This change adds support for attention in `CollapseDimensionsPass` so that the attention op is collapsed as much as possible. This is motivated by reducing the number of attention variants that the SDXL attention spec has to handle.

Changes to `LinalgExt/Transforms/ReshapeFusion.cpp` are mostly taken directly from https://github.com/llvm/llvm-project/blob/002a0a27bc4702d6f34434c1838cb1698a0b0098/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp (attributed at the top of the file). I attempted to modify the original logic as little as possible, keeping it general in case it needs to be reused for other `LinalgExt` ops.
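
To illustrate the collapsing criterion, here is a minimal sketch under a simplified model: each operand's indexing map is treated as a projected permutation, represented as the ordered list of iteration-dimension ids it uses. The names `DimList`, `canCollapseAdjacentDims`, and `getCollapsedGroups` are illustrative and not from this patch, and the real pass applies additional constraints (e.g., matching iterator types) that are omitted here. The core idea is that adjacent iteration dimensions can only be folded together when every indexing map uses them contiguously and in order (or uses neither), so collapsing amounts to a plain reshape on every operand.

```cpp
// Minimal sketch of grouping collapsible iteration dimensions.
// Not the IREE implementation; names and the simplified map model are
// illustrative only.
#include <cstdint>
#include <vector>

// Each operand's indexing map, modeled as the ordered list of iteration
// dimension ids it indexes (a projected permutation).
using DimList = std::vector<int64_t>;

// True if dims `d` and `d + 1` may be merged: every map must either use both
// dims adjacently and in order, or use neither of them.
static bool canCollapseAdjacentDims(int64_t d,
                                    const std::vector<DimList> &maps) {
  for (const DimList &map : maps) {
    int64_t posD = -1, posD1 = -1;
    for (int64_t i = 0, e = static_cast<int64_t>(map.size()); i < e; ++i) {
      if (map[i] == d)
        posD = i;
      if (map[i] == d + 1)
        posD1 = i;
    }
    bool hasD = posD >= 0, hasD1 = posD1 >= 0;
    if (hasD != hasD1)
      return false; // one dim used without the other
    if (hasD && posD1 != posD + 1)
      return false; // both used, but not adjacent/in order
  }
  return true;
}

// Greedily partition the iteration space into maximal collapsible runs.
std::vector<DimList> getCollapsedGroups(int64_t numDims,
                                        const std::vector<DimList> &maps) {
  std::vector<DimList> groups;
  for (int64_t d = 0; d < numDims; ++d) {
    if (!groups.empty() && canCollapseAdjacentDims(d - 1, maps))
      groups.back().push_back(d); // extend the current run
    else
      groups.push_back({d});      // start a new run
  }
  return groups;
}
```

For example, if every indexing map of an attention-like op accesses its two leading batch dimensions as an adjacent in-order pair, those two dimensions fall into one group and can be collapsed into a single batch dimension.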

@IanWood1 changed the title from [DispatchCreation] Collapse LinalgExt::AttentionOp to [DispatchCreation] Collapse iree_linalg_ext.attention on Nov 4, 2024
@IanWood1 marked this pull request as ready for review November 4, 2024 20:51
Add support for attention in `CollapseDimensionsPass` so that the
attention op gets collapsed as much as possible. This is motivated by
reducing the different variants of attention that the sdxl attention
spec has to handle.

Signed-off-by: Ian Wood <[email protected]>
Since this pass now handles more than just `linalg.generic` ops, fix up the comments and drop references to `linalg.generic` ops.

Signed-off-by: Ian Wood <[email protected]>
@MaheshRavishankar (Contributor) left a comment

Looks mostly good. Let's chat offline so I can get better context on this. Left a few minor comments.


/// Map from iteration domain index in the original op to the iteration domain
/// index in the collapsed op.
SmallVector<std::pair<int64_t, unsigned>> origOpToCollapsedOpIterationDim;
Why `int64_t` and `unsigned`?

@hanhanW requested a review from Groverkss November 7, 2024 21:43
@MaheshRavishankar (Contributor) left a comment

Spoke to Ian offline to get more context. Looks good.

Signed-off-by: Ian Wood <[email protected]>
Signed-off-by: Ian Wood <[email protected]>
@IanWood1 merged commit 2bfc639 into iree-org:main Nov 12, 2024
36 checks passed
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
IanWood1 added a commit that referenced this pull request Jan 8, 2025
Reland after fixing SDXL int8 regressions via #19012.

Running CI revealed further performance regressions that have pending patches: #19325 and #19326.

This reverts commit 8d3faf8.

---------

Signed-off-by: Ian Wood <[email protected]>