Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Codegen][GPU] Let integer range optimization narrow GPU computations to i32 #19473

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

krzysz00
Copy link
Contributor

Note: This PR is stacked on top of #19372, and so looks bigger than it is. The relevant changes are in the last commit.

Add an option to -iree-util-optimize-int-arithmetic to have it perform computations in i32 where possible, which is enabled when optimizing arithmetic for GPU codegen. This allows LLVM co correctly conclude that various computations don't need to be done at full 64-bit precision, thus saving registers and instructions. (LLVM has some rewrites for this, but they're, for example, gated on only having one use of the potentially-truncated value, which means that shared math stays in an over-wide data type).

Comment on lines +72 to +73
"Flag indicating if computations that can be performed with 32 bits shuld be."
" Mainly used for GPU code generation to not waste registers">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: needs a terminating .

…ree-org#19361)

This reverts commit cb5be1d.

Compaled to the previous revision, this one works around a correctness
bug in dataflow analysis that's being fixed by removing the analysis
after SCF->CF.

---

First, this patch implements InferIntRangeInterface for
hal.interface.workgroup.{size,id,count} using a local upper_bound
attribute.

Then, it adds a -iree-codegen-gpu-propagate-dispatch-size-bounds pass
that adds these upper_bounds identifiers to the interface.workgroup
operations and to gpu.thread_id based on static information available
late in the codegen pipeline.

Then, it uses -optimize-int-arithmetic to optimize indexing after
-lower-affine, getting rid of a bunch of "if the input's negative" logic
that isn't actually needed in many of our kernels.

It also ensures that these upper_bound values propagate to LLVM.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants