
[LLVMGPUVectorDistribute] Add general support for statically tiled codegen on dynamic shapes #19992

Closed
wants to merge 8 commits

Conversation

manupak
Contributor

@manupak manupak commented Feb 14, 2025

This PR adds support for performing statically tiled codegen on dynamic shapes in the vector distribute pipeline.
In essence, it allows lowering configs to be honored on dynamic shapes by using masking.
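
As a rough illustration (a hand-written sketch, not code from this PR; the value names and the 1x?x24 shape are borrowed from one of the lit tests below), tiling a dynamic dimension to a static tile size means generic vectorization has to guard the dynamic extent with a mask instead of assuming the read is in bounds:

// Sketch: masked read of a static vector from a dynamically sized dimension.
// Assumes %src : tensor<1x?x24xf32>, %pad : f32, and index constants %c0/%c1/%c2.
%d    = tensor.dim %src, %c1 : tensor<1x?x24xf32>
%mask = vector.create_mask %c1, %d, %c2 : vector<1x1x2xi1>
%v    = vector.transfer_read %src[%c0, %c0, %c0], %pad, %mask
          {in_bounds = [true, false, false]}
        : tensor<1x?x24xf32>, vector<1x1x2xf32>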

Some side-effect changes:

  • Currently, the block dynamic dimensions pass changes the dimensionality of the generics without projecting the lowering config that was provided higher up in the pipeline. Moreover, the need for that becomes smaller now, since with the changes here we can tile the dynamic dimension directly -- unless I'm missing something.

This builds on the following PRs -- hence putting it up as a draft:

future work:

@manupak manupak marked this pull request as draft February 14, 2025 16:04
@manupak manupak changed the title [LLVMGPUVectorDistribute] Add general support to statically tiled codegen on dynamic shapes [LLVMGPUVectorDistribute] Add general support for statically tiled codegen on dynamic shapes Feb 14, 2025
@manupak manupak force-pushed the distribute-mask-compute-v3 branch 3 times, most recently from e460bdb to afe0147 Compare February 24, 2025 16:50
Commit message excerpts (as shown in the timeline, partially truncated):

* Also, keeping it disabled by default until lowering config projection is fixed.
* enable masking in generic vectorization
* add two runs of resolve type to fold tensor.dim in rank reducing type.
* masked compute.
* masked cases.
* only enable masking in vectorization in vector distribute
* and add code not to run on ops where lowering config is set.

Signed-off-by: Manupa Karunaratne <[email protected]>
@manupak manupak force-pushed the distribute-mask-compute-v3 branch from afe0147 to b4c35c3 Compare February 26, 2025 10:59
@manupak manupak marked this pull request as ready for review February 26, 2025 11:00
@manupak manupak force-pushed the distribute-mask-compute-v3 branch from b4c35c3 to 526cfc1 Compare February 26, 2025 11:01
@manupak
Contributor Author

manupak commented Feb 26, 2025

@Groverkss this is ready for review

Comment on lines +318 to +324
// If a lowering config is set, changing the dimensionality of
// the op will break the mapping. Therefore, skip operations
// that have a lowering config set.
if (op->hasAttrOfType<IREE::Codegen::LoweringConfigAttrInterface>(
"lowering_config")) {
return success();
}
Contributor

I'm guessing this was used while debugging. We should remove this.

Contributor Author

No ...
The BlockDynamicDimension pass deletes the lowering config if it changes the linalg op.
Also, the lowering config no longer makes sense after the dimensionality change.

Comment on lines +632 to +636
// CHECK: %[[MASK:.+]] = vector.create_mask %c1, %{{.+}}, %{{.+}} : vector<1x1x2xi1>
// CHECK: vector.transfer_read
// CHECK-SAME: in_bounds = [true, false, false]
// CHECK-SAME: memref<1x?x24xf32
// CHECK-SAME: %[[MASK]]
// CHECK-SAME: in_bounds = [true, true, true]
// CHECK-SAME: memref<196x24x24xf32
Contributor

Interesting. So instead of in_bounds attr, we are relying on masking. Does it produce the same code?

Contributor Author

they both produce conditional code.
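
(Roughly speaking -- a hand-written sketch, not actual compiler output: on the innermost contiguous dimension a masked transfer_read can lower to a predicated load, while the unmasked in_bounds = false form gets guarded with a bounds check around the access. For example:)

// Illustrative only: the kind of predicated 1-D load a masked read may lower to.
%v = vector.maskedload %base[%i], %mask, %passthru
     : memref<?xf32>, vector<4xi1>, vector<4xf32> into vector<4xf32>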

Comment on lines +1323 to +1329
// {indexing_maps = [
// affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>,
// affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4, d3)>,
// affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d4)>
// ],
// iterator_types = ["parallel", "parallel", "parallel", "reduction", "parallel"]
// }
Contributor

Remove commented out code

Contributor Author

@manupak manupak Feb 26, 2025

Hmmm, I thought it's useful for understanding the lowering config dimensionality of the
QK and PV matmul generics.
Otherwise it appears as a set of magic numbers that is not represented in the linalg_ext.attention op.

Comment on lines 1356 to 1357
hal.executable.export public @attention_dynamic_masked ordinal(0) layout(#hal.pipeline.layout<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) {
^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index):
Contributor

nit: We don't need pipeline binding flags like "ReadOnly|Indirect" for tests. Check how other tests do it.
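
i.e. something along these lines (a sketch of the layout with the flags dropped, not copied from another test):

#hal.pipeline.layout<constants = 6, bindings = [
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>
]>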

Contributor Author

done.

}

module {
hal.executable public @decode_attn_dispatch_0 {
Contributor

This test should be in pipeline_vector_distribute_gfx942_reduction.mlir (or wherever the reduction test file is called)

Contributor Author

Why? All the other attention tests are here...

// -----

#translation = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute workgroup_size = [256, 1, 1] subgroup_size = 64>
#lowering_config = #iree_gpu.lowering_config<{reduction = [0, 0, 0, 0, 0, 512], workgroup = [1, 1, 1, 32, 0, 0]}>
Contributor

Can you use partial_reduction instead of reduction here?

Contributor

Also, we should be tiling the outer K2 dimension to the number of warps?

Contributor Author

@manupak manupak Feb 26, 2025

This is just an example.
Would you be able to spell out the config that you'd like tested here?
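
If the first suggestion is just a matter of swapping the tiling level name, I guess it would be something like the line below (same tile sizes as the current example; treat it as a guess -- the K2-to-number-of-warps part is exactly what I'd like spelled out):

#lowering_config = #iree_gpu.lowering_config<{partial_reduction = [0, 0, 0, 0, 0, 512], workgroup = [1, 1, 1, 32, 0, 0]}>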

Comment on lines 1397 to 1400
%27 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(1) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x?x128xf16>>{%24}
%28 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(2) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x128x?xf16>>{%25}
%29 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(3) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x1x?xf16>>{%26}
%30 = flow.dispatch.tensor.load %22, offsets = [0, 0, 0, 0], sizes = [4, 32, 1, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x1x128xf16>> -> tensor<4x32x1x128xf16>
Contributor

Can we use a simpler test? I don't think we need all these pipeline.binding flags.

Contributor Author

I can try; for some reason I thought it was needed for the test, like every other test in the file.

Contributor Author

Oh, you mean remove the flags but keep the hal?

Contributor Author

OK, I've removed the flags and made it look more similar to the other tests.

@Groverkss
Contributor

Moving the PR to #20144 so I can land new changes to it. Thanks for the work @manupak!

@Groverkss Groverkss closed this Mar 3, 2025