[Codegen][GPU] Let integer range optimization narrow GPU computations to i32 #19473

krzysz00 · 2024-12-12T01:08:10Z

Note: This PR is stacked on top of #19372, and so looks bigger than it is. The relevant changes are in the last commit.

Add an option to -iree-util-optimize-int-arithmetic to have it perform computations in i32 where possible, which is enabled when optimizing arithmetic for GPU codegen. This allows LLVM to correctly conclude that various computations don't need to be done at full 64-bit precision, thus saving registers and instructions. (LLVM has some rewrites for this, but they are gated on, for example, the potentially-truncated value having only one use, which means that shared math stays in an over-wide data type.)
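As a rough illustration of the idea (not the pass's actual implementation), the sketch below narrows a 64-bit addition to 32 bits only when the known ranges of both operands and of the result fit in i32. The names `IntRange`, `fitsInI32`, `addRange`, and `narrowedAdd` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [lo, hi] describing a signed 64-bit value.
struct IntRange {
  int64_t lo;
  int64_t hi;
};

// True if every value in the range is representable as a signed 32-bit integer.
inline bool fitsInI32(IntRange r) {
  return r.lo >= std::numeric_limits<int32_t>::min() &&
         r.hi <= std::numeric_limits<int32_t>::max();
}

// Range of a + b. Only call this when both inputs fit in i32, so the
// 64-bit bounds computed here cannot overflow.
inline IntRange addRange(IntRange a, IntRange b) {
  return {a.lo + b.lo, a.hi + b.hi};
}

// Computes a + b, doing the arithmetic in 32 bits when the ranges prove it
// is safe and falling back to 64 bits otherwise. If usedI32 is non-null,
// it reports which path was taken.
inline int64_t narrowedAdd(int64_t a, IntRange ra, int64_t b, IntRange rb,
                           bool *usedI32 = nullptr) {
  assert(a >= ra.lo && a <= ra.hi && b >= rb.lo && b <= rb.hi);
  // Short-circuit evaluation keeps addRange from seeing oversized inputs.
  bool narrow = fitsInI32(ra) && fitsInI32(rb) && fitsInI32(addRange(ra, rb));
  if (usedI32) *usedI32 = narrow;
  if (narrow) {
    int32_t sum = static_cast<int32_t>(a) + static_cast<int32_t>(b);
    return static_cast<int64_t>(sum);  // extend back to the original width
  }
  return a + b;
}
```

The point of deciding this from ranges, rather than from use counts, is that a value shared by several users can still be computed once at the narrower width.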

benvanik · 2024-12-12T01:14:08Z

compiler/src/iree/compiler/Dialect/Util/Transforms/Passes.td

+      "Flag indicating if computations that can be performed with 32 bits shuld be."
+      " Mainly used for GPU code generation to not waste registers">


style nit: needs a terminating .

…ree-org#19361) This reverts commit cb5be1d. Compared to the previous revision, this one works around a correctness bug in dataflow analysis that's being fixed by removing the analysis after SCF->CF.

---

First, this patch implements InferIntRangeInterface for hal.interface.workgroup.{size,id,count} using a local upper_bound attribute.

Then, it adds a -iree-codegen-gpu-propagate-dispatch-size-bounds pass that adds these upper_bound attributes to the interface.workgroup operations and to gpu.thread_id based on static information available late in the codegen pipeline.

Then, it uses -optimize-int-arithmetic to optimize indexing after -lower-affine, getting rid of a bunch of "if the input's negative" logic that isn't actually needed in many of our kernels. It also ensures that these upper_bound values propagate to LLVM.
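To make the "if the input's negative" point concrete, here is a minimal sketch of a floor division by a positive constant, first with sign handling and then in the form that remains once the dividend is known to be non-negative. The exact code -lower-affine emits is not shown in this PR, so the shape of `floorDivGeneral` is an assumption, and both function names and the `limit` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// General floor division by a positive divisor. C++ integer division
// truncates toward zero, so negative dividends need a correction branch.
inline int64_t floorDivGeneral(int64_t x, int64_t c) {
  assert(c > 0);
  // -(x + 1) is non-negative for x < 0 and cannot overflow.
  return x < 0 ? -((-(x + 1)) / c) - 1 : x / c;
}

// When range information proves 0 <= x < limit (for example, an ID with a
// known bound), the negative branch is dead and plain division suffices.
inline int64_t floorDivNonNegative(int64_t x, int64_t c, int64_t limit) {
  assert(c > 0 && x >= 0 && x < limit);
  return x / c;
}
```

For any `x` inside the known range, both functions return the same value, which is why the comparison and select can be dropped.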

benvanik reviewed Dec 12, 2024

krzysz00 force-pushed the index-narrowing branch from d9af342 to 562f3ba Compare December 12, 2024 17:01

krzysz00 added 2 commits December 12, 2024 17:12

Let integer range optimizations narrow to i32

4dee634

krzysz00 force-pushed the index-narrowing branch from 562f3ba to 4dee634 Compare December 12, 2024 17:14

Update tests

be116ef
