[DispatchCreation] Collapse iree_linalg_ext.attention
#19012
Conversation
Add support for attention in `CollapseDimensionsPass` so that the attention op gets collapsed as much as possible. This is motivated by reducing the number of attention variants that the sdxl attention spec has to handle. Signed-off-by: Ian Wood <[email protected]>
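For readers without the diff open, the core idea is that contiguous iteration dimensions which behave identically across all operands can be folded into reassociation groups. The sketch below is illustrative only and not code from this PR; `buildReassociation` and its inputs are made-up names used to show the grouping, e.g. a 4-D iteration space whose dims 0 and 1 are foldable becomes `[[0, 1], [2], [3]]`.

```cpp
// Illustrative sketch (not the pass's actual code): group contiguous
// foldable dimension ranges into reassociation indices for collapsing.
#include <cstdint>
#include <utility>
#include <vector>

using ReassociationIndices = std::vector<int64_t>;

std::vector<ReassociationIndices>
buildReassociation(int64_t rank,
                   const std::vector<std::pair<int64_t, int64_t>> &foldableRanges) {
  std::vector<ReassociationIndices> groups;
  int64_t dim = 0;
  size_t range = 0;
  while (dim < rank) {
    if (range < foldableRanges.size() && foldableRanges[range].first == dim) {
      // Fold the whole contiguous range [first, second] into one collapsed dim.
      ReassociationIndices group;
      for (int64_t d = foldableRanges[range].first;
           d <= foldableRanges[range].second; ++d)
        group.push_back(d);
      groups.push_back(std::move(group));
      dim = foldableRanges[range].second + 1;
      ++range;
    } else {
      groups.push_back({dim}); // Dimension stays un-collapsed.
      ++dim;
    }
  }
  return groups;
}
```

Under this reading, an attention op with several leading batch-like dims that index every operand the same way would end up with a single collapsed batch dim, which is roughly what "collapsed as much as possible" means here.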
Since this pass now handles more than just `linalg.generic` ops, fix up the comments and drop references to `linalg.generic` ops. Signed-off-by: Ian Wood <[email protected]>
Looks mostly good. Let's chat offline so I can get better context on this. Left a few minor comments.
/// Map from iteration domain index in the original op to the iteration domain
/// index in the collapsed op.
SmallVector<std::pair<int64_t, unsigned>> origOpToCollapsedOpIterationDim;
Why `int64_t` and `unsigned`?
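For context on the field being discussed, here is one plausible reading of how such a map gets populated, assuming it follows the upstream ElementwiseOpFusion convention of storing (collapsed-op dimension, position within the folded group). The function name and use of plain `std::vector` are illustrative, not the PR's code.

```cpp
// Illustrative only: map each original iteration dim to
// (collapsed dim, position inside its folded group), assuming the
// reassociation groups are contiguous and cover all original dims in order.
#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<int64_t, unsigned>>
mapOrigToCollapsed(const std::vector<std::vector<int64_t>> &reassociation) {
  std::vector<std::pair<int64_t, unsigned>> mapping;
  for (size_t collapsedDim = 0; collapsedDim < reassociation.size(); ++collapsedDim) {
    for (unsigned pos = 0; pos < reassociation[collapsedDim].size(); ++pos)
      mapping.emplace_back(static_cast<int64_t>(collapsedDim), pos);
  }
  return mapping;
}

// For reassociation [[0, 1], [2], [3]] this yields:
//   dim 0 -> (0, 0), dim 1 -> (0, 1), dim 2 -> (1, 0), dim 3 -> (2, 0).
```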
compiler/src/iree/compiler/Dialect/LinalgExt/Transforms/ReshapeFusion.cpp
compiler/src/iree/compiler/DispatchCreation/test/collapse_dimensions.mlir
Spoke to Ian offline to get more context. Looks good.
Signed-off-by: Ian Wood <[email protected]>
This change adds support for attention in `CollapseDimensionsPass` so that the attention op is collapsed as much as possible. This is motivated by reducing the number of attention variants that the sdxl attention spec has to handle. Changes to LinalgExt/Transforms/ReshapeFusion.cpp are mostly taken directly from https://github.com/llvm/llvm-project/blob/002a0a27bc4702d6f34434c1838cb1698a0b0098/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp (attributed at the top of the file). I attempted to modify the original logic as little as possible, keeping it general in case it needs to be reused for other `LinalgExt` ops. --------- Signed-off-by: Ian Wood <[email protected]>
Reland after fixing sdxl int8 regressions via #19012. Running CI revealed further performance regressions that have pending patches: #19325 and #19326. This reverts commit 8d3faf8. --------- Signed-off-by: Ian Wood <[email protected]>