
[fusion] Fold unit dims of globals #756

Closed
MaheshRavishankar opened this issue Jun 27, 2024 · 0 comments
Labels
sdxl-int8 Issues related to SDXL quantized model support

Comments


MaheshRavishankar commented Jun 27, 2024

During compilation of the quantized SDXL model, there are artifacts of this form:

  %extracted_slice_219 = tensor.extract_slice %236[0, 0, 0] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
  %extracted_slice_220 = tensor.extract_slice %236[0, 0, 2560] [2, 4096, 2560] [1, 1, 1] : tensor<2x4096x5120xf16> to tensor<2x4096x2560xf16>
  %expanded_221 = tensor.expand_shape %extracted_slice_219 [[0], [1], [2, 3, 4]] output_shape [2, 4096, 1, 1, 2560] : tensor<2x4096x2560xf16> into tensor<2x4096x1x1x2560xf16>
  %expanded_222 = tensor.expand_shape %extracted_slice_220 [[0], [1], [2, 3, 4]] output_shape [2, 4096, 1, 1, 2560] : tensor<2x4096x2560xf16> into tensor<2x4096x1x1x2560xf16>
  %237 = tensor.empty() : tensor<2x4096x1x1x2560xi8>
  %238 = flow.dispatch.region -> (tensor<2x4096x1x1x2560xi8>) {
    %5295 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> ()>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]} ins(%expanded_221, %expanded_222, %__auto.down_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.premul_input, %__auto.down_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.q_input3Ascale : tensor<2x4096x1x1x2560xf16>, tensor<2x4096x1x1x2560xf16>, tensor<1x1x2560xf16>, tensor<f32>) outs(%237 : tensor<2x4096x1x1x2560xi8>) {

Here we would like to fuse the extract_slice with its consumer. That fusion is blocked by the tensor.expand_shape sitting between the slice and its use in the dispatch. These expand shapes exist because, after FoldUnitDims, the unit dimensions in globals like %__auto.down_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.premul_input and %__auto.down_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.q_input3Ascale don't get folded away. As a result the collapse_shapes do not fully fold away, and later propagation passes pick them up.
While this could be handled in the propagation passes, it is also worth simply folding away the unit dimensions in the global variables themselves, as sketched below.
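
For illustration, a minimal sketch of the intended transformation on a standalone global (the global name, initializer-free form, and consumer function here are hypothetical; the relevant IREE ops are util.global and util.global.load):

  // Before: the global keeps its unit dims, so every use needs a reshape;
  // this is what leaves the expand_shape/collapse_shape pairs behind.
  util.global private @premul_input : tensor<1x1x2560xf16>
  func.func @consumer() -> tensor<2560xf16> {
    %0 = util.global.load @premul_input : tensor<1x1x2560xf16>
    %1 = tensor.collapse_shape %0 [[0, 1, 2]] : tensor<1x1x2560xf16> into tensor<2560xf16>
    return %1 : tensor<2560xf16>
  }

  // After folding unit dims of the global: the folded shape is baked into
  // the global itself and the reshape at each use disappears.
  util.global private @premul_input : tensor<2560xf16>
  func.func @consumer() -> tensor<2560xf16> {
    %0 = util.global.load @premul_input : tensor<2560xf16>
    return %0 : tensor<2560xf16>
  }

With the globals in folded form, the matching expand_shapes around %extracted_slice_219 and %extracted_slice_220 above become foldable as well, and the extract_slice ops can fuse into the dispatch region.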

MaheshRavishankar converted this from a draft issue Jun 27, 2024
MaheshRavishankar added the sdxl-int8 label Jun 27, 2024
IanWood1 added a commit to iree-org/iree that referenced this issue Jul 3, 2024
Currently reverting 7884dc8 to test regressions (there were problems with llama). Issue here: nod-ai/SHARK-ModelDev#756

Couldn't reproduce the issue with llama yet. It might be best to land this anyway, since the unit dims should be folded in general; it just doesn't play well with this model in particular.

Signed-off-by: Ian Wood <[email protected]>
IanWood1 closed this as completed Jul 8, 2024
github-project-automation bot moved this from Todo to Done in Turbine: SDXL on CDNA Jul 8, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
Currently reverting iree-org@7884dc8 to test regressions (there were problems with llama). Issue here: nod-ai/SHARK-ModelDev#756

Couldn't reproduce the issue with llama yet. It might be best to land this anyway, since the unit dims should be folded in general; it just doesn't play well with this model in particular.

Signed-off-by: Ian Wood <[email protected]>
Signed-off-by: Lubo Litchev <[email protected]>