[LLVMGPUVectorDistribute] Add general support for statically tiled codegen on dynamic shapes #19992
Conversation
Force-pushed from e460bdb to afe0147
Commit messages:
* Also, keeping it disabled by default until lowering config projection is fixed.
* Enable masking in generic vectorization.
* Add two runs of resolve type to fold tensor.dim in rank-reducing type.
* ... masked compute.
* ... masked cases.
* Only enable masking in vectorization in vector distribute.
* ... PR.
* ... and add code not to run on ops where lowering config is set.
Signed-off-by: Manupa Karunaratne <[email protected]>
Force-pushed from afe0147 to b4c35c3
Force-pushed from b4c35c3 to 526cfc1
@Groverkss this is ready for review.
Signed-off-by: Manupa Karunaratne <[email protected]>
// If lowering config is set, changing the dimensionality of
// the op will break the mapping. Therefore, skip operations
// that have a lowering config set.
if (op->hasAttrOfType<IREE::Codegen::LoweringConfigAttrInterface>(
        "lowering_config")) {
  return success();
}
I'm guessing this was used while debugging. We should remove this.
No ...
The BlockDynamicDimension pass deletes the lowering config if it changes the linalg op.
Also, the lowering config does not make sense after the dimensionality change.
// CHECK: %[[MASK:.+]] = vector.create_mask %c1, %{{.+}}, %{{.+}} : vector<1x1x2xi1>
// CHECK: vector.transfer_read
// CHECK-SAME: in_bounds = [true, false, false]
// CHECK-SAME: memref<1x?x24xf32
// CHECK-SAME: %[[MASK]]
// CHECK-SAME: in_bounds = [true, true, true]
// CHECK-SAME: memref<196x24x24xf32
Interesting. So instead of the in_bounds attr, we are relying on masking. Does it produce the same code?
They both produce conditional code.
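For illustration, a minimal hedged sketch of the masked form (the function name, %dim, and the padding value are hypothetical, not from this PR). All lanes are declared in-bounds, and the mask substitutes the padding value for out-of-bounds lanes, so the lowering still emits conditional code, much like an unmasked read with in_bounds = false:

func.func @masked_read(%src: memref<1x?x24xf32>, %dim: index) -> vector<1x1x2xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %pad = arith.constant 0.0 : f32
  // Lane (i, j, k) is active iff i < 1, j < %dim, k < 2.
  %mask = vector.create_mask %c1, %dim, %c2 : vector<1x1x2xi1>
  // With the mask in place every lane can be marked in-bounds; masked-off
  // lanes read %pad instead of touching out-of-bounds memory.
  %v = vector.transfer_read %src[%c0, %c0, %c0], %pad, %mask {in_bounds = [true, true, true]} : memref<1x?x24xf32>, vector<1x1x2xf32>
  return %v : vector<1x1x2xf32>
}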
// {indexing_maps = [
//    affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>,
//    affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4, d3)>,
//    affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d4)>
//  ],
//  iterator_types = ["parallel", "parallel", "parallel", "reduction", "parallel"]
// }
Remove commented-out code.
Hmmm, I thought it's useful for understanding the lowering config dimensionality of
the QK and PV matmul generics.
Otherwise it appears as a set of magic numbers that is not represented in the linalg_ext.attention op.
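For context, a hypothetical linalg.generic these maps would describe (%lhs, %rhs, and %out are assumed SSA values; this is not code from the PR): a batched matmul out[d0, d1, d2, d4] += lhs[d0, d1, d2, d3] * rhs[d0, d1, d4, d3] that reduces over d3.

%r = linalg.generic {
    indexing_maps = [
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4, d3)>,
      affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d4)>],
    iterator_types = ["parallel", "parallel", "parallel", "reduction", "parallel"]}
    ins(%lhs, %rhs : tensor<?x?x?x?xf32>, tensor<?x?x?x?xf32>)
    outs(%out : tensor<?x?x?x?xf32>) {
  ^bb0(%a: f32, %b: f32, %c: f32):
    // Multiply-accumulate body: out += lhs * rhs along the d3 reduction dim.
    %m = arith.mulf %a, %b : f32
    %s = arith.addf %c, %m : f32
    linalg.yield %s : f32
  } -> tensor<?x?x?x?xf32>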
hal.executable.export public @attention_dynamic_masked ordinal(0) layout(#hal.pipeline.layout<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) {
^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index):
nit: We don't need pipeline binding flags like "ReadOnly|Indirect" for tests. Check how other tests do it.
Done.
}

module {
hal.executable public @decode_attn_dispatch_0 {
This test should be in pipeline_vector_distribute_gfx942_reduction.mlir (or wherever the reduction test file is called).
Why? All the other attention tests are here...
// -----

#translation = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute workgroup_size = [256, 1, 1] subgroup_size = 64>
#lowering_config = #iree_gpu.lowering_config<{reduction = [0, 0, 0, 0, 0, 512], workgroup = [1, 1, 1, 32, 0, 0]}>
Can you use partial_reduction instead of reduction here?
Also, we should be tiling the outer K2 dimension to the number of warps?
This is just an example.
Would you be able to spell out the config that you'd like tested here?
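For reference, a hedged sketch of one config the comment might be asking for (tile sizes are illustrative placeholders, not from this PR): swap the reduction level for partial_reduction so each K2 tile is accumulated in parallel and combined afterwards.

// Hypothetical variant of the config above; sizes are placeholders only.
#lowering_config = #iree_gpu.lowering_config<{partial_reduction = [0, 0, 0, 0, 0, 512], workgroup = [1, 1, 1, 32, 0, 0]}>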
%27 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(1) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x?x128xf16>>{%24}
%28 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(2) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x128x?xf16>>{%25}
%29 = hal.interface.binding.subspan layout(<constants = 6, bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(3) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<4x32x1x?xf16>>{%26}
%30 = flow.dispatch.tensor.load %22, offsets = [0, 0, 0, 0], sizes = [4, 32, 1, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x1x128xf16>> -> tensor<4x32x1x128xf16>
Can we use a simpler test? I don't think we need all these pipeline.binding flags.
I can try; for some reason I thought it was needed for the test, as in every other test in the file.
Oh, you mean remove the flags but keep the hal?
OK, I've removed the flags and made it look more similar to the other tests.
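For reference, a hypothetical flag-less form in the style of other tests (the #pipeline_layout alias and SSA names here are illustrative, not taken from the final patch):

#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>
]>
// The subspan can then drop the "ReadOnly|Indirect" flags entirely.
%27 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x32x?x128xf16>>{%24}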
Signed-off-by: Manupa Karunaratne <[email protected]>
This PR adds support for performing statically tiled codegen on dynamic shapes in the vector distribute pipeline.
Basically, it can honor lowering configs on dynamic shapes by using masking.
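Conceptually (a hypothetical sketch, not code from this PR), a statically sized vector tile can walk a dynamic extent %n by masking off the tail lanes of the last partial tile:

func.func @masked_sum(%src: memref<?xf32>, %n: index) -> f32 {
  %c0 = arith.constant 0 : index
  %c8 = arith.constant 8 : index
  %zero = arith.constant 0.0 : f32
  %init = arith.constant dense<0.0> : vector<8xf32>
  // Fixed step of 8 regardless of %n; the mask guards the partial last tile.
  %acc = scf.for %i = %c0 to %n step %c8 iter_args(%a = %init) -> (vector<8xf32>) {
    %rem = arith.subi %n, %i : index
    // Lane k is active iff k < %rem; when %rem >= 8 all lanes are active.
    %mask = vector.create_mask %rem : vector<8xi1>
    %v = vector.transfer_read %src[%i], %zero, %mask {in_bounds = [true]} : memref<?xf32>, vector<8xf32>
    %a2 = arith.addf %a, %v : vector<8xf32>
    scf.yield %a2 : vector<8xf32>
  }
  %r = vector.reduction <add>, %acc : vector<8xf32> into f32
  return %r : f32
}

Masked-off lanes read the zero padding value, so the accumulated sum is unaffected by the tail.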
Some side-effect changes:
This builds on the following PRs, hence putting it in draft:
Future work: