-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gpu] Use clustered gpu.subgroup_reduce for nested layout distribution #18515
[gpu] Use clustered gpu.subgroup_reduce for nested layout distribution #18515
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! There seems to be a failure in some tests because of this patch. Looks like you forgot to add a lowering for this op to gpu.shuffle ops?
failed to legalize operation 'gpu.subgroup_reduce' that was explicitly marked illegal
compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp
Outdated
Show resolved
Hide resolved
efa3bcd
to
21c8d62
Compare
When I saw that failure, my gut feeling was that the ROCm pipelines just don't have the subgroup->shuffle lowering patterns, because previously they didn't need them, as they could always use the lowering to a dedicated op. If so, the problem is that the dedicated op lowering is now partial (can't be used when there's clusters). I might be wrong about this and will need to look closer at it soon. If you have any hints related to this I'd be grateful. I'm assuming I need to add the upstream shuffle lowering patterns somewhere in some pipeline. |
compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp
Outdated
Show resolved
Hide resolved
Right, I think this is the case. We should use patterns that expand these into primitive shuffles etc. and add to the rocm lowering pipeline. |
36973a5
to
2093968
Compare
e9c8ccd
to
6f86e2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to land once integrate pulls in the needed commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a basic LIT test to check that the subgroup size gets correctly passed on to the subgroup expansion patterns?
compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUNestedLayoutDistributionPatterns.cpp
Outdated
Show resolved
Hide resolved
6f86e2c
to
73c6de6
Compare
What should it look like? I think some kind of integration test might work, but if it's just testing the pass in isolation I'm not sure it'll be useful; it's the presence and accuracy of particular attributes that I think would cause problems, not whether it manages to read them. |
I'd dump the IR just before the pass, strip away everything unrelated and run it in isolation, once with subgroup size 32 and once with 64, expecting a different number of shuffles |
The number of shuffles would be the same in both cases, because only subgroup reductions with a specified cluster size are being lowered. The subgroup size is basically just for the sake of error-checking in this case. |
Ah, OK then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, let's wait for the mlir commit to propagate to iree
73c6de6
to
25c1a41
Compare
There is now support in MLIR for expressing a subgroup reduction operation that operates on several "clusters" in parallel, so it is no longer necessary to build a series of shuffles. It has been verified that, at least if the upstream patterns are used, the resulting sequence of shuffles is the same as the old code. This commit also adds a new pass, ExpandGPUOps, which uses the upstream patterns to expand these ops, and adds it to the LLVMGPU pass list. Resolves iree-org#18142. Signed-off-by: Andrea Faulds <[email protected]>
25c1a41
to
4b984f1
Compare
New LLVM integrate has landed, I've rebased this on main now and removed the LLVM cherry-picks commit, so let's see if CI is happy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There is now support in MLIR for expressing a subgroup reduction operation that operates on several "clusters" in parallel, so it is no longer necessary to build a series of shuffles.
It has been verified that, at least if the upstream patterns are used, the resulting sequence of shuffles is the same as the old code.
This commit also adds a new pass, ExpandGPUOps, which uses the upstream patterns to expand these ops, and adds it to the LLVMGPU pass list.
Resolves #18142.