[Codegen] Add the `bitcast -> extui` to `shuffle` folding patterns to EmulateNarrowTypes pass. #15102

MaheshRavishankar · 2023-10-04T21:05:52Z

Folding `bitcast -> arith.extui` into a shuffle seems worth doing across all backends (all backends support shuffles better). Also add a pattern to push broadcasts past extui-like operations, which increases the number of cases where this fold kicks in.
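For readers unfamiliar with the fold, here is a minimal standalone sketch of why the two forms are equivalent. It is a plain C++ model, not the upstream MLIR pattern: the function names `bitcastThenExtui` and `shuffleMaskShift` are hypothetical, and it assumes the simplest case of `i8` bytes reinterpreted as pairs of `i4` lanes with the low nibble first (little-endian lane order), zero-extended to 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference form: reinterpret each byte as two unsigned 4-bit lanes
// (low nibble first, an assumption of this sketch), then zero-extend each
// lane to 32 bits. This models a bitcast followed by an unsigned extension.
std::vector<uint32_t> bitcastThenExtui(const std::vector<uint8_t> &bytes) {
  std::vector<uint32_t> result;
  result.reserve(bytes.size() * 2);
  for (uint8_t b : bytes) {
    result.push_back(b & 0x0F);        // lane 2k: low nibble
    result.push_back((b >> 4) & 0x0F); // lane 2k+1: high nibble
  }
  return result;
}

// Shuffle-based form: duplicate each source byte into the lanes that need
// it, then isolate the wanted bits with a per-lane mask and shift. No
// sub-byte element type ever appears.
std::vector<uint32_t> shuffleMaskShift(const std::vector<uint8_t> &bytes) {
  const std::size_t numLanes = bytes.size() * 2;

  // Step 1: shuffle with indices [0, 0, 1, 1, 2, 2, ...].
  std::vector<uint8_t> shuffled(numLanes);
  for (std::size_t i = 0; i < numLanes; ++i)
    shuffled[i] = bytes[i / 2];

  // Step 2: even lanes keep the low nibble, odd lanes the high nibble.
  std::vector<uint32_t> result(numLanes);
  for (std::size_t i = 0; i < numLanes; ++i) {
    const uint8_t mask = (i % 2 == 0) ? 0x0F : 0xF0;
    const unsigned shift = (i % 2 == 0) ? 0 : 4;
    result[i] = static_cast<uint32_t>((shuffled[i] & mask) >> shift);
  }
  return result;
}
```

Both functions return the same lanes for any input; the second form only uses byte-wide shuffles, masks, and shifts, which is the property the fold relies on.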

MaheshRavishankar · 2023-10-04T21:06:40Z

Depends on llvm/llvm-project#68257

Towards #15091

qedawkins

This pattern was added to SPIR-V in #15029. If it works, it might be worth removing the now-duplicate call there.

dcaballe · 2023-10-06T22:53:48Z

Would you mind elaborating on what the code looks like before and after the patch? A bitcast is a no-op and an integer extension should be faster than a shuffle. Do you have the generated asm so that we can understand what is going on?

Similar patterns are also run during conversion to SPIR-V (added in `vector.transfer_read` in the `EmulateNarrowTypes` pass.

MaheshRavishankar · 2023-10-06T23:53:32Z

> Would you mind elaborating on what the code looks like before and after the patch? A bitcast is a no-op and an integer extension should be faster than a shuffle. Do you have the generated asm so that we can understand what is going on?

No, vector integer extension operations go through a bad lowering path in LLVM. There was an issue filed on it, I think, which led to the shuffle instruction path being built upstream by Nicolas. EDIT: It might be that only the sub-byte integer extensions don't work well.

dcaballe · 2023-10-07T00:35:39Z

Do you have a small end-to-end repro that we can use to report this to LLVM? I'm not sure we want to keep these low-level peephole patterns here in the long run.

MaheshRavishankar · 2023-10-07T00:45:07Z

See #14914

qedawkins

LGTM. I would wait for confirmation from @dcaballe first as well though.

qedawkins · 2023-10-07T01:07:57Z

compiler/src/iree/compiler/Codegen/SPIRV/ConvertToSPIRVPass.cpp

+  /// load bearing.  Also these patterns are already run during
+  /// `EmulateNarrowTypes` pass but don't trigger there due to missing support for
+  /// emulation of `vector.transfer_read` in the emulation path. Remove the
+  /// patterns from here after that is done.


darn, thanks for trying.

MaheshRavishankar · 2023-10-07T01:13:48Z

> Do you have a small end-to-end repro that we can use to report this to LLVM? I'm not sure we want to keep these low-level peephole patterns here in the long run.

Well, maybe, but I don't see the harm. Shuffles are basic instructions and are probably fairly well supported on all LLVM backends and hardware. The reason I moved it here is that doing shuffles this way works well even on SPIR-V.
This pass is run on all backends, during lowering to LLVM, NVVM, and SPIR-V. I suspect direct lowerings of this kind will grow over time, since performance is more important now and we can't always avoid falling into bad spots in LLVM.

MaheshRavishankar · 2023-10-07T04:28:03Z

Landing this for now, but happy to make changes after the fact

dcaballe · 2023-10-09T18:17:19Z

I'd have liked to have more time to review this before landing. This is a pretty low-level and generic peephole. At the least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered), and 3) these patterns can be extended to support more cases as needed (instead of duplicating the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?

MaheshRavishankar · 2023-10-09T18:35:36Z

> I'd have liked to have more time to review this before landing.

Understood. I didn't think there were major issues here, and these are patterns that are known to work well for all backends. I am not actually able to put a finger on the issue that you see here, so maybe I didn't fully understand the concerns.

> This is a pretty low-level and generic peephole. At the least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered), and 3) these patterns can be extended to support more cases as needed (instead of duplicating the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?

I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?

dcaballe · 2023-10-16T22:43:38Z

> I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?

This assumes that all the bitcast -> extui targeted by this peephole are generated by this pass. My point is that these ops can be generated by any other pass in MLIR now or in the future.
Patterns can (and probably will) be extended to support more combinations of types and they will grow, making micro changes #2 even more likely to happen.

In other words, this is a general peephole optimization, so let's put it in a place where it can be generically applied and extended outside of EmulateNarrowTypes.

MaheshRavishankar · 2023-10-17T01:20:49Z

> > I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?
>
> This assumes that all the bitcast -> extui targeted by this peephole are generated by this pass. My point is that these ops can be generated by any other pass in MLIR now or in the future.
>
> Patterns can (and probably will) be extended to support more combinations of types and they will grow, making micro changes #2 even more likely to happen.
>
> In other words, this is a general peephole optimization, so let's put it in a place where it can be generically applied and extended outside of EmulateNarrowTypes.

Sure, maybe, but for now I am keeping related things together. If there is a use for this outside of this pass, we can move it to a separate pass. Adding a new pass just for these patterns seems unnecessary right now.

MaheshRavishankar requested a review from dcaballe as a code owner October 4, 2023 21:05

qedawkins reviewed Oct 4, 2023


MaheshRavishankar force-pushed the add_extui_foldingpatterns branch 2 times, most recently from cb9828b to a591841 Compare October 5, 2023 02:55

MaheshRavishankar requested a review from antiagainst as a code owner October 5, 2023 02:55

MaheshRavishankar force-pushed the add_extui_foldingpatterns branch from a591841 to a30fecf Compare October 5, 2023 04:24

dcaballe requested a review from qcolombet October 6, 2023 22:51

Use extsui/extui folding patterns with emulate narrow types pass.

3b0cdaf

Similar patterns are also run during conversion to SPIR-V (added in `vector.transfer_read` in the `EmulateNarrowTypes` pass.

MaheshRavishankar force-pushed the add_extui_foldingpatterns branch from a30fecf to 3b0cdaf Compare October 6, 2023 23:38

MaheshRavishankar requested a review from qedawkins October 7, 2023 00:55

qedawkins approved these changes Oct 7, 2023


Empty commit for poking GH

a1c3e29

MaheshRavishankar merged commit b5bbea2 into iree-org:main Oct 7, 2023

MaheshRavishankar deleted the add_extui_foldingpatterns branch April 13, 2024 20:00
