[Codegen] Add the `bitcast -> extui` to `shuffle` folding patterns to EmulateNarrowTypes pass. #15102
Depends on llvm/llvm-project#68257. Towards #15091.
This pattern was added to SPIR-V in #15029. If it works, it might be worth removing the now-duplicate call there.
Would you mind elaborating on what the code is before and after the patch? A bitcast is a no-op and an integer extension should be faster than a shuffle. Do you have the generated asm so that we can understand what is going on?
Similar patterns are also run during conversion to SPIR-V, but they do not trigger in the `EmulateNarrowTypes` pass because it does not yet support emulation of `vector.transfer_read`.
No, vector integer extension operations go through a bad lowering path in LLVM. There was an issue filed on it, I think, which led to the shuffle instruction path being built upstream by Nicolas. EDIT: It might be that the sub-byte integer extensions don't work well.
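To make the rewrite under discussion concrete, here is a minimal Python sketch of why a `bitcast -> extui` on packed sub-byte data can be expressed as a shuffle followed by a mask and a shift. It is an illustration, not the upstream pattern: the function names are hypothetical, and it assumes `i4` values packed two per byte with the low nibble first.

```python
def bitcast_then_extui(packed):
    """Reference semantics: view each byte as two i4 lanes, then zero-extend.

    Assumption: the low nibble is lane 0 and the high nibble is lane 1.
    """
    out = []
    for byte in packed:
        out.append(byte & 0x0F)         # lane 0: low nibble
        out.append((byte >> 4) & 0x0F)  # lane 1: high nibble
    return out


def shuffle_mask_shift(packed):
    """Same result using only a shuffle, an AND, and a right shift."""
    n = 2 * len(packed)
    # Shuffle: duplicate every source byte so each output lane has its own copy.
    shuffled = [packed[i // 2] for i in range(n)]
    # Per-lane constants select the nibble and move it down to bit 0.
    masks = [0x0F if i % 2 == 0 else 0xF0 for i in range(n)]
    shifts = [0 if i % 2 == 0 else 4 for i in range(n)]
    return [(v & m) >> s for v, m, s in zip(shuffled, masks, shifts)]


packed = [0x21, 0x43, 0xF0, 0xAB]
assert bitcast_then_extui(packed) == shuffle_mask_shift(packed)
```

The second form never materializes a vector of sub-byte elements, which is the property the shuffle-based lowering relies on.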
Do you have a small end-to-end repro that we can use to report this to LLVM? I am not sure we want to keep these low-level peephole patterns here in the long run.
See #14914
LGTM. I would wait for confirmation from @dcaballe first as well though.
/// load bearing. Also these patterns are already run during
/// `EmulateNarrotType` pass but dont trigger there due to missing support for
/// emulation of `vector.transfer_read` in the emulation path. Remove the
/// patterns from here after that is done.
darn, thanks for trying.
Well, maybe, but I don't see any harm. Shuffles are basic instructions and are probably fairly well supported on all LLVM backends and hardware. The reason I moved it here is that even on SPIR-V, doing shuffles this way works well.
Landing this for now, but happy to make changes after the fact.
I'd have liked to have more time to review this before landing. This is a pretty low-level and generic peephole. At least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered), and 3) these patterns can be extended to support more cases as needed (instead of duplicating the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?
Understood. I didn't think there were major issues here, and these are patterns that are known to work well for all backends. I am not actually able to put a finger on the issue that you see here, so maybe I didn't fully understand the concerns.

> This is a pretty low level and generic peephole. At least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered) and, 3) these patterns can be extended to support more cases as needed (instead of duplicate the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?

I am not sure I follow this either. These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?
In other words, this is a general peephole optimization, so let's put it in a place where it can be generically applied and extended outside of
Sure, maybe, but for now I am keeping related things together. If there is a use for this outside of this pass, we can move it to a separate pass. Adding a new pass just for these patterns seems unnecessary right now.
Folding the `bitcast -> arith.extui` to `shuffle` seems worth doing across all backends (all backends support shuffle better). Also add a pattern to push broadcasts past `extui`-like operations to increase the coverage of cases where this kicks in.
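The broadcast part of this description rests on zero-extension being elementwise, so it commutes with a broadcast. The Python sketch below only illustrates that equivalence; the helper names are hypothetical and lists stand in for vectors. Presumably the benefit is that, once the broadcast is moved after the extension, the `extui` sits directly on its producer and the `bitcast -> extui` fold has a chance to match, though that reading is an assumption.

```python
def extui(values, src_bits):
    """Zero-extend: keep only the low `src_bits` bits of each element."""
    mask = (1 << src_bits) - 1
    return [v & mask for v in values]


def broadcast(values, copies):
    """Broadcast a 1-D vector into `copies` identical rows."""
    return [list(values) for _ in range(copies)]


def extui_of_broadcast(values, copies, src_bits):
    # Before the rewrite: broadcast first, then extend every row.
    return [extui(row, src_bits) for row in broadcast(values, copies)]


def broadcast_of_extui(values, copies, src_bits):
    # After the rewrite: extend the small vector once, then broadcast.
    return broadcast(extui(values, src_bits), copies)


lanes = [0x1, 0xF, 0x7, 0x0]
assert extui_of_broadcast(lanes, 3, 4) == broadcast_of_extui(lanes, 3, 4)
```

The rewritten form also extends each source element once rather than once per broadcast copy.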