[gpu] Use clustered gpu.subgroup_reduce for nested layout distribution #18515

andfau-amd · 2024-09-12T20:23:29Z

There is now support in MLIR for expressing a subgroup reduction operation that operates on several "clusters" in parallel, so it is no longer necessary to build a series of shuffles.

It has been verified that, at least if the upstream patterns are used, the resulting sequence of shuffles is the same as the old code.
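The semantics of a clustered reduction can be made concrete with a small lane-level simulation in Python. This is a minimal sketch, not IREE or MLIR code. It assumes that the upstream patterns expand the op into XOR ("butterfly") shuffles with offsets 1, 2, 4, ..., cluster_size / 2. Under that assumption, every lane ends up holding the reduction of its own cluster only, after log2(cluster_size) shuffle steps. The helper names `shuffle_xor` and `clustered_reduce` are hypothetical.

```python
"""Lane-level simulation of a clustered subgroup reduction.

Assumption: the reduction is expanded into XOR ("butterfly") shuffles with
offsets 1, 2, 4, ..., cluster_size // 2. This models only the data movement
between lanes; the function names are hypothetical, not an MLIR API.
"""
import operator
from typing import Callable, List


def shuffle_xor(values: List[int], offset: int) -> List[int]:
    # Each lane i reads the value currently held by lane i ^ offset.
    return [values[lane ^ offset] for lane in range(len(values))]


def clustered_reduce(values: List[int], cluster_size: int,
                     combine: Callable[[int, int], int] = operator.add) -> List[int]:
    subgroup_size = len(values)
    # The cluster size must be a power of two and must tile the subgroup evenly.
    assert cluster_size > 0 and (cluster_size & (cluster_size - 1)) == 0
    assert subgroup_size % cluster_size == 0
    result = list(values)
    offset = 1
    # Offsets below cluster_size only flip low lane-index bits, so data never
    # crosses a cluster boundary; each step doubles the lanes folded together.
    while offset < cluster_size:
        partner = shuffle_xor(result, offset)
        result = [combine(a, b) for a, b in zip(result, partner)]
        offset *= 2
    return result
```

For example, `clustered_reduce(list(range(8)), 4)` reduces lanes 0-3 and 4-7 independently, so every lane in the first cluster holds 6 and every lane in the second holds 22.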

This commit also adds a new pass, ExpandGPUOps, which uses the upstream patterns to expand these ops, and adds it to the LLVMGPU pass list.

Resolves #18142.

Groverkss

LGTM! There seems to be a failure in some tests because of this patch. Looks like you forgot to add a lowering for this op to gpu.shuffle ops?

failed to legalize operation 'gpu.subgroup_reduce' that was explicitly marked illegal

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp

andfau-amd · 2024-09-13T14:04:28Z

> There seems to be a failure in some tests because of this patch. Looks like you forgot to add a lowering for this op to gpu.shuffle ops?

When I saw that failure, my gut feeling was that the ROCm pipelines just don't have the subgroup->shuffle lowering patterns. They didn't need them before, because they could always use the lowering to a dedicated op. If so, the problem is that the dedicated-op lowering is now partial: it can't be used when there are clusters. I might be wrong about this and will need to look at it more closely soon. If you have any hints, I'd be grateful. I'm assuming I need to add the upstream shuffle lowering patterns to some pipeline.

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp

kuhar · 2024-09-13T16:30:56Z

> my gut feeling was that the ROCm pipelines just don't have the subgroup->shuffle lowering patterns, because previously they didn't need them

Right, I think this is the case. We should use patterns that expand these into primitive shuffles etc. and add them to the ROCm lowering pipeline.

qedawkins

Looks good to land once the LLVM integrate pulls in the needed commits.

.gitmodules

compiler/src/iree/compiler/Codegen/Common/GPU/ExpandGPUOps.cpp

kuhar

Should we have a basic LIT test to check that the subgroup size gets correctly passed on to the subgroup expansion patterns?

compiler/src/iree/compiler/Codegen/Common/GPU/ExpandGPUOps.cpp

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp

compiler/src/iree/compiler/Codegen/Utils/GPUUtils.h

andfau-amd · 2024-09-18T19:53:55Z

> Should we have a basic LIT test to check that the subgroup size gets correctly passed on to the subgroup expansion patterns?

What should it look like? I think some kind of integration test might work, but if it's just testing the pass in isolation I'm not sure it'll be useful; it's the presence and accuracy of particular attributes that I think would cause problems, not whether it manages to read them.

kuhar · 2024-09-18T19:58:41Z

> > Should we have a basic LIT test to check that the subgroup size gets correctly passed on to the subgroup expansion patterns?
>
> What should it look like? I think some kind of integration test might work, but if it's just testing the pass in isolation I'm not sure it'll be useful; it's the presence and accuracy of particular attributes that I think would cause problems, not whether it manages to read them.

I'd dump the IR just before the pass, strip away everything unrelated, and run it in isolation, once with subgroup size 32 and once with 64, expecting a different number of shuffles.

andfau-amd · 2024-09-18T20:02:29Z

The number of shuffles would be the same in both cases, because only subgroup reductions with a specified cluster size are being lowered. The subgroup size is basically just for the sake of error-checking in this case.
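The point about the shuffle count can be checked with a short Python sketch. It assumes the XOR-butterfly expansion, in which reducing a cluster of N lanes takes log2(N) shuffle steps. Under that assumption, the count does not depend on whether the subgroup has 32 or 64 lanes. The subgroup size only decides whether a given cluster size is valid. `count_cluster_shuffles` is a hypothetical helper for illustration, not an MLIR or IREE API.

```python
def count_cluster_shuffles(cluster_size: int, subgroup_size: int) -> int:
    """Number of XOR-shuffle steps needed to reduce one cluster.

    Hypothetical helper, assuming a butterfly expansion. The subgroup size is
    only used to reject cluster sizes that do not fit (error-checking); it
    never changes the result.
    """
    for name, size in (("cluster_size", cluster_size),
                       ("subgroup_size", subgroup_size)):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name} must be a positive power of two, got {size}")
    if cluster_size > subgroup_size:
        raise ValueError(
            f"cluster of {cluster_size} lanes does not fit in a subgroup of {subgroup_size}")
    # One shuffle per doubling step: log2(cluster_size), computed exactly.
    return cluster_size.bit_length() - 1
```

With a cluster size of 16, this gives 4 shuffles for both a 32-lane and a 64-lane subgroup. That is why running the isolated test at the two subgroup sizes would not show a difference.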

kuhar · 2024-09-18T20:12:58Z

Ah, OK then

kuhar

Looks good; let's wait for the MLIR commit to propagate to IREE.

compiler/src/iree/compiler/Codegen/Common/GPU/ExpandGPUOps.cpp

Signed-off-by: Andrea Faulds

andfau-amd · 2024-09-23T16:24:23Z

The new LLVM integrate has landed. I've rebased this on main and removed the LLVM cherry-picks commit, so let's see if CI is happy.

kuhar

LGTM

andfau-amd requested review from antiagainst, qedawkins and MaheshRavishankar as code owners September 12, 2024 20:23

andfau-amd requested a review from Groverkss September 12, 2024 20:23

andfau-amd mentioned this pull request Sep 12, 2024

[GPU] Clustered Subgroup Reduction #18142 (closed)

Groverkss approved these changes Sep 13, 2024

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp (outdated, resolved)

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch from efa3bcd to 21c8d62 September 13, 2024 13:59

kuhar reviewed Sep 13, 2024

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp (outdated, resolved)

compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp (outdated, resolved)

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch from 36973a5 to 2093968 September 18, 2024 19:00

andfau-amd requested review from ScottTodd and stellaraccident as code owners September 18, 2024 19:00

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch 2 times, most recently from e9c8ccd to 6f86e2c September 18, 2024 19:05

qedawkins approved these changes Sep 18, 2024

.gitmodules (outdated, resolved)

compiler/src/iree/compiler/Codegen/Common/GPU/ExpandGPUOps.cpp (outdated, resolved)

kuhar reviewed Sep 18, 2024

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch from 6f86e2c to 73c6de6 September 18, 2024 19:47

kuhar reviewed Sep 18, 2024

compiler/src/iree/compiler/Codegen/Common/GPU/ExpandGPUOps.cpp (outdated, resolved)

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch from 73c6de6 to 25c1a41 September 18, 2024 20:16

andfau-amd force-pushed the 18142-clustered-subgroup-reduce-integration branch from 25c1a41 to 4b984f1 September 23, 2024 16:23

andfau-amd requested review from kuhar and Groverkss September 23, 2024 16:25

kuhar approved these changes Sep 23, 2024

kuhar merged commit c0909a4 into iree-org:main Sep 23, 2024
35 checks passed
