[Codegen] Resolve constantOp with multiple layouts users. #11

Open
wants to merge 4 commits into
base: anchorToLayoutOp

Conversation

raikonenfnu
Owner

@raikonenfnu raikonenfnu commented Aug 21, 2024

The main motivation is to handle distribution of a constantOp whose users
have different layouts.

The original use case is to ensure we can distribute attention when the tile
sizes for M, K1, and N are the same. In that case, the init of the 1st
contract and the IV's init use the same constantOp.

A constantOp can only hold a single layout, but it may feed multiple
to_layout ops with different layouts, one for each user. This leaves
unresolved to_layout op(s): only one of the to_layout ops can be resolved
properly, and the rest would need a "non-trivial" resolution since the
layouts are different.

To solve this issue, we introduce a mechanism that detects these cases
and makes a copy of the arith.constant for the other users while we are
resolving the current constantOp.
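
The duplication step can be sketched with a toy model. This is not the code in this PR: `Constant`, `ToLayout`, and `split_constant_by_layout` are hypothetical names, layouts are plain strings, and the real pass works on MLIR ops rather than Python objects. The sketch only shows the idea of keeping the original constant for one layout and cloning it for every other layout its users request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Constant:
    """Hypothetical stand-in for an arith.constant; holds one layout at most."""
    value: float
    layout: Optional[str] = None


@dataclass(eq=False)
class ToLayout:
    """Hypothetical stand-in for a to_layout op consuming a constant."""
    source: Constant
    layout: str


def split_constant_by_layout(cst: Constant, users: List[ToLayout]) -> List[Constant]:
    """Give every distinct requested layout its own copy of the constant."""
    # Group users by the layout they request, preserving first-seen order.
    groups: Dict[str, List[ToLayout]] = {}
    for user in users:
        groups.setdefault(user.layout, []).append(user)

    constants = []
    for index, (layout, group) in enumerate(groups.items()):
        # The first layout keeps the original constant; the others get a clone.
        target = cst if index == 0 else Constant(cst.value)
        target.layout = layout
        for user in group:
            user.source = target
        constants.append(target)
    return constants


# Assumed scenario: the 1st contract's init and the IV's init share one zero
# constant but request different layouts.
zero = Constant(0.0)
contract_init = ToLayout(zero, "layout_contract")
iv_init = ToLayout(zero, "layout_iv")
split = split_constant_by_layout(zero, [contract_init, iv_init])
```

After the call, each to_layout user reads from a constant whose single layout matches its own, so every resolution in this model is trivial.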

…out.

The motivation behind this PR was to solve the issue where, despite us
emitting to_layout ops at certain locations, we ended up generating a
redundant to_simd earlier in the graph. As a result, we have no control
over where the layout conversion actually happens.

One example of this case is when we emit a to_layout conversion for FP8
attention. We emit a to_layout right before the 2nd contract and right
after the truncf to FP8. Since we do enforcement (backward propagation)
first and then propagation (forward propagation), we ended up generating
two layout conversions: the original one we emitted, and a second one
determined by layout analysis and placed in the middle of an elementwise
op that is part of the softmax.

This forces us to do shuffles/layout resolution in FP32, which hurts
performance. Additionally, this is not where we (compiler writers)
intended the layout conversion to happen.

To solve this issue, we re-ordered the layout analysis to do forward
propagation first and then enforcement (backward propagation). This gives
layouts set by forward propagation precedence, which makes emitting
to_layout more intuitive and makes it behave as expected.
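
The effect of the ordering can be illustrated with a toy model. This is not the actual layout analysis: `analyze` and `conversion_points` are hypothetical helpers, the graph is reduced to a linear chain of values, and layouts are plain strings. Whichever pass runs first claims the values that have no layout yet, and that decides where the conversion lands. The model does not reproduce the exact two-conversion outcome described above, only the precedence effect.

```python
from typing import Dict, List, Optional, Sequence


def analyze(num_values: int, anchors: Dict[int, str],
            order: Sequence[str]) -> List[Optional[str]]:
    """Assign layouts along a linear chain, running the passes in `order`."""
    layouts: List[Optional[str]] = [anchors.get(i) for i in range(num_values)]
    for direction in order:
        if direction == "forward":
            # Producer to consumer: fill values that have no layout yet.
            for i in range(1, num_values):
                if layouts[i] is None:
                    layouts[i] = layouts[i - 1]
        elif direction == "backward":
            # Consumer to producer: fill values that have no layout yet.
            for i in range(num_values - 2, -1, -1):
                if layouts[i] is None:
                    layouts[i] = layouts[i + 1]
        else:
            raise ValueError(f"unknown direction: {direction}")
    return layouts


def conversion_points(layouts: List[Optional[str]]) -> List[int]:
    """Indices i where a layout conversion is needed between value i and i+1."""
    return [i for i in range(len(layouts) - 1) if layouts[i] != layouts[i + 1]]


# Assumed chain: v0 = 1st contract result (FP32), v1 = softmax elementwise,
# v2 = truncf to FP8, v3 = operand of the 2nd contract (to_layout anchor).
anchors = {0: "A", 3: "B"}
backward_first = analyze(4, anchors, ["backward", "forward"])
forward_first = analyze(4, anchors, ["forward", "backward"])
```

In this model, running backward first pulls layout `B` up to `v1`, so the conversion sits between `v0` and `v1`, in the FP32 part of the chain. Running forward first keeps `A` through `v2`, so the conversion sits between `v2` and `v3`, right after the truncf.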

Additionally, we modified getAgreedLayout to handle elementwise
propagation where some operands are not expected to have layouts, such as
arith.select, whose condition operand probably doesn't have a layout.
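
A minimal sketch of that idea, assuming layouts are modeled as optional strings. `get_agreed_layout` below is a hypothetical Python stand-in, not the signature or body of the real getAgreedLayout: operands without a layout are skipped, and a layout is returned only if the remaining operands agree.

```python
from typing import List, Optional


def get_agreed_layout(operand_layouts: List[Optional[str]]) -> Optional[str]:
    """Return the layout shared by all operands that have one, else None."""
    # Operands with no layout (e.g. a select condition) do not take part.
    known = [layout for layout in operand_layouts if layout is not None]
    if not known:
        return None
    first = known[0]
    # Disagreement among the operands that do have layouts means no agreement.
    return first if all(layout == first for layout in known) else None


# Assumed arith.select-like case: (condition, true_value, false_value).
select_layout = get_agreed_layout([None, "A", "A"])
```

Here the condition operand carries no layout, yet the two value operands still agree on `A`, so propagation through the op can continue.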

Signed-off-by: Stanley Winata <[email protected]>