[GPUHeuristic] Modify schedule generator to consider distribution of transfer_read layout anchor #17636
Conversation
…r_read. Currently we generate invalid schedules whose transfer_read cannot be distributed because the sizes do not match up. For example, in our case [wgTileSize, elemPerThread, threadSize] = [192, 8, 128]. There is no good layout for this: the number of threads needed would be 192/8 == 24, and since the threadSize predetermined by the schedule is 128, 128 % 24 != 0. Hence we cannot distribute it. This patch teaches the schedule generator about these constraints. Signed-off-by: stanley-nod <[email protected]>
int64_t nTileSize =
    schedule.nSize * schedule.nTileCount * schedule.nWarpCount;
bool isDistributableN = (nTileSize / elemsPerThread) % wgThreads == 0 ||
                        wgThreads % (nTileSize / elemsPerThread) == 0;
You commented that this will work for matmul_transpose_b but not for matmul, because it depends on which dimension is innermost. Can we add identifiers for which dimension is innermost to GPUMatmulShapeType to inform this heuristic about when to check for this?
Yeah, we can do that, let me think of a nice way to shuttle this data around. :)
Still no guarantee (though unlikely) that this information won't change somewhere down the line, but definitely better than no heuristics haha.
Signed-off-by: stanley-nod <[email protected]>
It's worth noting that the heuristic logic is starting to get very involved/opinionated due to quirks of the lowering pipelines. It will be hard to maintain this moving forward unless we can find time to start cleaning up the tech debt of unhandled cases. Approving for now because I don't have a better suggestion and don't want to block, but we should fix codegen to not fail on certain valid lowering configs.
  return op.emitError("kDim or nDim not found in RHS indexing map.");
}
bool transposedLhs = lhsMDim.value() > lhsKDim.value();
bool transposedRhs = rhsKDim.value() > rhsNDim.value();
Slightly simpler could be to just compare cast<AffineDimExpr>(maps[0].getResults().back()).getPosition() with mIndex and kIndex. We can bail out if neither of them is innermost for now (I do not think we've hit that case, and the pipeline more or less assumes that we don't). Then we don't need any calls to getResultPosition, which scans the whole map.
nice idea, thanks :)
auto maps = op.getIndexingMapsArray();
OpBuilder b(op);
auto lhsMDim = maps[0].getResultPosition(b.getAffineDimExpr(mDim));
auto lhsKDim = maps[0].getResultPosition(b.getAffineDimExpr(kDim));
if (!lhsMDim.has_value() || !lhsKDim.has_value()) {
  return op.emitError("mDim or kDim not found in LHS indexing map.");
}
auto rhsKDim = maps[1].getResultPosition(b.getAffineDimExpr(kDim));
auto rhsNDim = maps[1].getResultPosition(b.getAffineDimExpr(nDim));
nit: Don't use auto when the type is not obvious from the RHS alone.
done, after simplification :)
auto rhsKDim = maps[1].getResultPosition(b.getAffineDimExpr(kDim));
auto rhsNDim = maps[1].getResultPosition(b.getAffineDimExpr(nDim));
if (!rhsKDim.has_value() || !rhsNDim.has_value()) {
  return op.emitError("kDim or nDim not found in RHS indexing map.");
Is this an op error (i.e. bad input IR) or something we should use an assertion for (a logic error)?
I see I see, it is a logic error, I'll change it to an assert :)
done, disappears after simplification :)
auto maps = op.getIndexingMapsArray();
OpBuilder b(op);
auto lhsMDim = maps[0].getResultPosition(b.getAffineDimExpr(mIndex));
auto lhsKDim = maps[0].getResultPosition(b.getAffineDimExpr(kIndex));
if (!lhsMDim.has_value() || !lhsKDim.has_value()) {
  return op.emitError("mDim or kDim not found in LHS indexing map.");
}
auto rhsKDim = maps[1].getResultPosition(b.getAffineDimExpr(kIndex));
auto rhsNDim = maps[1].getResultPosition(b.getAffineDimExpr(nIndex));
if (!rhsKDim.has_value() || !rhsNDim.has_value()) {
  return op.emitError("kDim or nDim not found in RHS indexing map.");
}
Same here.
done, disappears after simplification :)
Signed-off-by: stanley-nod <[email protected]>
Looks good, just two remaining nits.
deduceMMASchedule(problem, intrinsics, seeds, maxSharedMemoryBytes);

// Infer if lhs or rhs is transposed to help generate better schedule.
auto maps = op.getIndexingMapsArray();
here
thanks, done :)
@@ -928,6 +921,23 @@ LogicalResult setCooperativeMatrixConfig(
  subgroupSize = *minSize;
}

// Infer if lhs or rhs is transposed to help generate better schedule.
auto maps = op.getIndexingMapsArray();
also here
Signed-off-by: stanley-nod <[email protected]>
…transfer_read layout anchor (iree-org#17636) Modify the heuristic to take the layout of transfer reads into account, such that we will not generate invalid schedules whose transfer_read cannot be distributed because the sizes do not match up. For example, in one matmul the N-dim has [wgTileSize, elemPerThread, threadSize] = [192, 8, 128]. There is no good layout for this because the number of threads needed would be 192/8 == 24, and since the threadSize predetermined by the schedule is 128, we have 128 % 24 != 0. Hence we cannot distribute it. This patch introduces constraints in our heuristic to solve these cases. --------- Signed-off-by: stanley-nod <[email protected]> Signed-off-by: Lubo Litchev <[email protected]>