Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Codegen] Add the bitcast -> extui to shuffle folding patterns to EmulateNarrowTypes pass. #15102

Merged

Conversation

MaheshRavishankar
Copy link
Contributor

Folding the bitcast -> arith.extui to shuffle seems like worth doing across all backends (all backends support shuffle better). Also add a pattern to push broadcasts past extui-like operations to increase the coverage of cases where this kicks in.

@MaheshRavishankar
Copy link
Contributor Author

Depends on llvm/llvm-project#68257

Towards #15091

Copy link
Contributor

@qedawkins qedawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern was added to SPIR-V in #15029. If it works might be worth removing the now duplicate call there.

@MaheshRavishankar MaheshRavishankar force-pushed the add_extui_foldingpatterns branch 2 times, most recently from cb9828b to a591841 Compare October 5, 2023 02:55
@MaheshRavishankar MaheshRavishankar force-pushed the add_extui_foldingpatterns branch from a591841 to a30fecf Compare October 5, 2023 04:24
@dcaballe dcaballe requested a review from qcolombet October 6, 2023 22:51
@dcaballe
Copy link
Contributor

dcaballe commented Oct 6, 2023

Would you mind elaborating on what is the code before and after the patch? A bitcast is a no-op and an integer extension should be faster than a shuffle. Do you have the generated asm so that we can understand what is going on?

Similar patterns are also run during conversion to SPIR-V (added in
`vector.transfer_read` in the `EmulateNarrowTypes` pass.
@MaheshRavishankar MaheshRavishankar force-pushed the add_extui_foldingpatterns branch from a30fecf to 3b0cdaf Compare October 6, 2023 23:38
@MaheshRavishankar
Copy link
Contributor Author

MaheshRavishankar commented Oct 6, 2023

Would you mind elaborating on what is the code before and after the patch? A bitcast is a no-op and an integer extension should be faster than a shuffle. Do you have the generated asm so that we can understand what is going on?

No vector integer extension operations go through a bad lowering path in LLVM. THere was an issue filed on it I think which led to the shuffle instruction path being built upstream by Nicolas. EDIT: It might be that the sub-byte integer extensions dont work well.

@dcaballe
Copy link
Contributor

dcaballe commented Oct 7, 2023

Do you have a small end-to-end repro that we can use to report this to LLVM? Not sure we want to keep this low level peephole patterns here in the long run. This is pretty low level.

@MaheshRavishankar
Copy link
Contributor Author

See #14914

Copy link
Contributor

@qedawkins qedawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would wait for confirmation from @dcaballe first as well though.

/// load bearing. Also these patterns are already run during
/// `EmulateNarrotType` pass but dont trigger there due to missing support for
/// emulation of `vector.transfer_read` in the emulation path. Remove the
/// patterns from here after that is done.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

darn, thanks for trying.

@MaheshRavishankar
Copy link
Contributor Author

Do you have a small end-to-end repro that we can use to report this to LLVM? Not sure we want to keep this low level peephole patterns here in the long run. This is pretty low level.

Well maybe, but I don't see a harm. Shuffles are basic instructions and probably fairly well supported on all llvm backends/hardware. Reason I moved it here is because even on SPIRV doing shuffles this way works well.
This pass is run on all backends and is done during lowertollvm/NVvm/spirv. I suspect direct lowering of such kinds will grow with time since performance is more important now and we can't always avoid falling into bad spots in llvm.

@MaheshRavishankar
Copy link
Contributor Author

Landing this for now, but happy to make changes after the fact

@MaheshRavishankar MaheshRavishankar merged commit b5bbea2 into iree-org:main Oct 7, 2023
@dcaballe
Copy link
Contributor

dcaballe commented Oct 9, 2023

I'd have liked to have more time to review this before landing. This is a pretty low level and generic peephole. At least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered) and, 3) these patterns can be extended to support more cases as needed (instead of duplicate the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?

@MaheshRavishankar
Copy link
Contributor Author

I'd have liked to have more time to review this before landing.

Understood. I didnt think there were major issues here, and these are patterns that are know to work well for all backends. I am not actually able to put a finger on the issue that you see here really. So maybe I didnt fully understand the concerns here.
em

This is a pretty low level and generic peephole. At least, we should move this to a more generic and later place in the pipeline to make sure that: 1) this also triggers for any potential matches outside of the emulate narrow pass, 2) all the potential matches have been generated (i.e., some high-level vector ops have been lowered) and, 3) these patterns can be extended to support more cases as needed (instead of duplicate the rewrites again for cases outside of the emulate narrow pass). Do you think you could take care of this?

I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?

@dcaballe
Copy link
Contributor

I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?

  1. This assumes that all the bitcast -> extui targeted by this peephole are generated by this pass. My point is that these ops can be generated by any other pass in MLIR now or in the future.

  2. Patterns can (and probably will) be extended to support more combinations of types and they will grow, making micro changes #2 even more likely to happen.

In other words, this is a general peephole optimization so let's putting in a place where it can be genetically applied and extended outside of EmulateNarrowTypes.

@MaheshRavishankar
Copy link
Contributor Author

I am not sure I follow this either.... These should trigger only after the emulate pass inserts the bitcasts needed to match the vector types. Could you clarify a bit more what (2) and (3) are?

  1. This assumes that all the bitcast -> extui targeted by this peephole are generated by this pass. My point is that these ops can be generated by any other pass in MLIR now or in the future.

  2. Patterns can (and probably will) be extended to support more combinations of types and they will grow, making micro changes #2 even more likely to happen.

In other words, this is a general peephole optimization so let's putting in a place where it can be genetically applied and extended outside of EmulateNarrowTypes.

Sure maybe, but for now keeping related things together. If there is a use for this outside of this pass, we can move it to a separate pass. Adding a new pass just for these patterns seems unnecessary right now

@MaheshRavishankar MaheshRavishankar deleted the add_extui_foldingpatterns branch April 13, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants