Reapply "Propagate reshapes through generics with reduction… (#18968) #19113

Open
IanWood1 wants to merge 3 commits into main from reland_prop_through_reduction

Conversation

@IanWood1 (Contributor) commented Nov 12, 2024

Reland after fixing sdxl int8 regressions via #19012.

Running CI revealed further performance regressions that have pending patches: #19325 and #19326.

This reverts commit 8d3faf8.

@IanWood1 IanWood1 marked this pull request as ready for review November 12, 2024 19:29
@IanWood1 IanWood1 requested review from nithinsubbiah and removed request for ScottTodd November 12, 2024 19:29
hanhanW previously approved these changes Nov 12, 2024
@IanWood1 (Contributor, Author):

Waiting to rebase on top of #19088, then I will land this.

@MaheshRavishankar (Contributor):

Am I reading this right? The dispatch count seems to have regressed?

@hanhanW (Contributor) commented Nov 14, 2024

Am I reading this right? The dispatch count seems to have regressed?

246 -> 245 looks like an improvement?


@IanWood1 (Contributor, Author):

Am I reading this right? The dispatch count seems to have regressed?

246 -> 245 looks like an improvement?

But the new benchmarks seem to be regressing. I'm not sure why, since there was no effect on the old unet benchmarks.

@MaheshRavishankar (Contributor):

Try rebasing

IanWood1 force-pushed the reland_prop_through_reduction branch from 6383ab6 to 4765737 on November 14, 2024 03:39
@IanWood1 (Contributor, Author):

Try rebasing

I'll try that. I was getting some weird dispatch count results when testing locally against main.

@IanWood1 (Contributor, Author):

I think it is all coming from:

%extracted_slice_237 = tensor.extract_slice %152[0, 0, 0] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
%extracted_slice_238 = tensor.extract_slice %152[0, 0, 2560] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
%expanded_239 = tensor.expand_shape %extracted_slice_237 [[0], [1, 2], [3]] output_shape [2, 64, 64, 2560] : tensor<2x4096x2560xf16> into tensor<2x64x64x2560xf16>
%expanded_240 = tensor.expand_shape %extracted_slice_238 [[0], [1, 2], [3]] output_shape [2, 64, 64, 2560] : tensor<2x4096x2560xf16> into tensor<2x64x64x2560xf16>

These extract_slice ops could be cloned into the dispatch if the expand_shape ops weren't blocking them.
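
For illustration, a hedged sketch (with made-up SSA names) of the same IR after bubbling the expand_shape above the extract_slice ops, which leaves clonable extract_slice ops at the bottom:

%expanded = tensor.expand_shape %152 [[0], [1, 2], [3]] output_shape [2, 64, 64, 5120] : tensor<2x4096x5120xf16> into tensor<2x64x64x5120xf16>
%slice_a = tensor.extract_slice %expanded[0, 0, 0, 0] [2, 64, 64, 2560] [1, 1, 1, 1] : tensor<2x64x64x5120xf16> to tensor<2x64x64x2560xf16>
%slice_b = tensor.extract_slice %expanded[0, 0, 0, 2560] [2, 64, 64, 2560] [1, 1, 1, 1] : tensor<2x64x64x5120xf16> to tensor<2x64x64x2560xf16>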

IanWood1 force-pushed the reland_prop_through_reduction branch 2 times, most recently from 62bdaef to 98aa2a5 on November 18, 2024 20:32
@IanWood1 (Contributor, Author) commented Nov 19, 2024

Looks like there is some bad codegen when reductions aren't collapsed:

  1. Slow dispatch (expanded): ~164us
  2. Fast dispatch (collapsed): ~79us

@MaheshRavishankar (Contributor):

That's interesting, but shouldn't the collapse dimensions pass fold those back?

IanWood1 force-pushed the reland_prop_through_reduction branch from 8409122 to ef99f8d on November 19, 2024 03:36
@IanWood1 (Contributor, Author) commented Nov 19, 2024

That's interesting, but shouldn't the collapse dimensions pass fold those back?

Yeah, I had to change it a bit to make isEligibleForCollapse less restrictive, and to modify getCollapsibleLoops so that it considers the results of all indexing_maps (not just the first one). Before force-pushing, it seemed to regain the lost perf (I just fixed the failing lit tests). Hopefully this resolves all the perf problems 🤞

IanWood1 requested a review from hanhanW November 27, 2024 18:19
@IanWood1 (Contributor, Author):

@hanhanW would you be able to look at this again? Only the latest commit is new since the last review.

@hanhanW (Contributor) left a comment:

@hanhanW would you be able to look at this again? Only the latest commit is new since the last review.

The PR title and description need to be updated; it is not just reapplying a reverted commit. By the way, should we create a separate PR for the last commit?

Comment on lines 64 to 72
if (llvm::any_of(origType.getShape(), ShapedType::isDynamic) ||
    llvm::any_of(extractedType.getShape(), ShapedType::isDynamic) ||
    llvm::any_of(expandedType.getShape(), ShapedType::isDynamic)) {
  return failure();
}

I think you can use !xxxType.hasStaticShape().

Comment on lines 652 to 653



nit: remove one blank line

Comment on lines 208 to 221
    genericOp.getDpsInputOperands(), [&](OpOperand *operand) -> bool {
      auto genericOperand =
          operand->get().getDefiningOp<linalg::GenericOp>();
      if (!genericOperand) {
        return false;
      }

      if (genericOperand.getNumReductionLoops() == 0) {
        return false;
      }

      auto map = genericOp.getMatchingIndexingMap(operand);
      return !map.isPermutation() && map.isProjectedPermutation();
    })) {

[optional]: It'd be easier if you declare a lambda/function for this.

hanhanW dismissed their stale review November 27, 2024 18:37

remove my approval because of new changes.

@IanWood1 (Contributor, Author):

@hanhanW would you be able to look at this again? Only the latest commit is new since the last review.

The PR title and description need to be updated; it is not just reapplying a reverted commit. By the way, should we create a separate PR for the last commit?

That makes sense; this change had several perf regressions that needed to be fixed. Would you like me to create a new PR with the changes (with your comments applied), and then, after it lands, rebase and force-push so that this PR only contains the revert commit?

@hanhanW (Contributor) commented Nov 27, 2024

That makes sense; this change had several perf regressions that needed to be fixed. Would you like me to create a new PR with the changes (with your comments applied), and then, after it lands, rebase and force-push so that this PR only contains the revert commit?

Yes, that'd be great; it makes the codebase state and commit tracking better!

@IanWood1 (Contributor, Author) commented Nov 27, 2024

I created two new PRs, one for each of the separate fixes, and updated the description of this PR. I'll rebase this branch after those have landed so that it only serves as a revert.

Before force-pushing, the previous HEAD was ef99f8d.

IanWood1 added a commit that referenced this pull request Dec 3, 2024
This is 1/2 of the changes needed to reland #18857 (with an open PR, #19113).


Adds a pattern to bubble up expand_shape through extract_slice, i.e. rewrite
`expand(extract)` to `extract(expand)`. This only supports the case where the
expanded dimensions are not modified by the extract_slice and there are no
dynamic dimensions.

This is important because `tensor.expand_shape` ops _cannot be cloned_ while
`tensor.extract_slice` ops _can be cloned_. So, if the `expand_shape` gets
stuck below the `extract_slice`, it blocks the `extract_slice` from being
cloned, and the `extract_slice` has to be put into its own dispatch.

---------

Signed-off-by: Ian Wood <[email protected]>
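
A minimal before/after sketch of this rewrite, using hypothetical shapes and value names:

// before: the expand_shape sits below the extract_slice and blocks cloning
%slice = tensor.extract_slice %src[0, 0] [4, 8] [1, 1] : tensor<4x16xf32> to tensor<4x8xf32>
%exp = tensor.expand_shape %slice [[0, 1], [2]] output_shape [2, 2, 8] : tensor<4x8xf32> into tensor<2x2x8xf32>

// after: the extract_slice sits below the expand_shape and can be cloned into its consumer dispatch
%exp2 = tensor.expand_shape %src [[0, 1], [2]] output_shape [2, 2, 16] : tensor<4x16xf32> into tensor<2x2x16xf32>
%slice2 = tensor.extract_slice %exp2[0, 0, 0] [2, 2, 8] [1, 1, 1] : tensor<2x2x16xf32> to tensor<2x2x8xf32>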
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024 (…#19325); the commit message is the same as above, with an additional sign-off from Giacomo Serafini.
IanWood1 added a commit that referenced this pull request Dec 5, 2024
This is 2/2 of the changes needed to reland #18857 (open PR #19113).


There are 2 small subchanges in this PR:
- Refactor the check into a lambda `isPossiblySoftmax` and tweak the condition
to look for operand indexing maps that are projected permutations but not
permutations.
- Extend `getCollapsibleLoops` to look at all operations when computing
contiguous loops.

Also, I added a test case that needed both of these changes to pass (without
being collapsed, it was causing regressions on the GPU backend); see the sketch
after this commit message.

---------

Signed-off-by: Ian Wood <[email protected]>
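
For context, a hypothetical reduced example (shapes and names are illustrative, not taken from the PR) of the producer/consumer structure that an `isPossiblySoftmax`-style check would recognize: a reduction producer whose result feeds the consumer through an indexing map that is a projected permutation but not a permutation, as in the max/sum stages of softmax:

#id    = affine_map<(d0, d1) -> (d0, d1)>
#bcast = affine_map<(d0, d1) -> (d0)>

func.func @softmax_like(%in: tensor<2x4096xf16>, %acc: tensor<2xf16>,
                        %out: tensor<2x4096xf16>) -> tensor<2x4096xf16> {
  // Row-wise max (reduction producer); %acc is assumed to be pre-filled
  // with -inf.
  %max = linalg.generic
      {indexing_maps = [#id, #bcast],
       iterator_types = ["parallel", "reduction"]}
      ins(%in : tensor<2x4096xf16>) outs(%acc : tensor<2xf16>) {
  ^bb0(%a: f16, %b: f16):
    %m = arith.maximumf %a, %b : f16
    linalg.yield %m : f16
  } -> tensor<2xf16>
  // Consumer re-broadcasts the reduced result: the %max operand's map #bcast
  // is a projected permutation but not a permutation.
  %sub = linalg.generic
      {indexing_maps = [#id, #bcast, #id],
       iterator_types = ["parallel", "parallel"]}
      ins(%in, %max : tensor<2x4096xf16>, tensor<2xf16>)
      outs(%out : tensor<2x4096xf16>) {
  ^bb0(%a: f16, %m: f16, %o: f16):
    %d = arith.subf %a, %m : f16
    linalg.yield %d : f16
  } -> tensor<2x4096xf16>
  return %sub : tensor<2x4096xf16>
}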
IanWood1 force-pushed the reland_prop_through_reduction branch 2 times, most recently from 044da91 to 1a747d5 on December 5, 2024 21:27
IanWood1 force-pushed the reland_prop_through_reduction branch from 1a747d5 to e609b15 on December 7, 2024 00:15