-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][Transforms] GreedyPatternRewriteDriver
: verify IR
#74270
[mlir][Transforms] GreedyPatternRewriteDriver
: verify IR
#74270
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-core Author: Matthias Springer (matthias-springer) ChangesThis commit adds an additional "expensive check" that verifies the IR after every pattern application. (Only if There are currently 89 tests that are failing with this check. Many of them are in the This commit does not fix any patterns, this is done in separate commits. Full diff: https://github.com/llvm/llvm-project/pull/74270.diff 1 Files Affected:
diff --git a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
index 8e2bfe557c555..93a259a82cd2b 100644
--- a/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
+++ b/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
@@ -15,6 +15,7 @@
#include "mlir/Config/mlir-config.h"
#include "mlir/IR/Action.h"
#include "mlir/IR/Matchers.h"
+#include "mlir/IR/Verifier.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "mlir/Rewrite/PatternApplicator.h"
#include "mlir/Transforms/FoldUtils.h"
@@ -468,11 +469,19 @@ bool GreedyPatternRewriteDriver::processWorklist() {
/*topLevel=*/config.scope ? config.scope->getParentOp() : op);
auto clearFingerprints =
llvm::make_scope_exit([&]() { debugFingerPrints.clear(); });
+ Operation *rootOp =
+ op->getParentWithTrait<mlir::OpTrait::IsIsolatedFromAbove>();
+ assert(rootOp && "expected non-null root op");
#endif // MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
LogicalResult matchResult =
matcher.matchAndRewrite(op, *this, canApply, onFailure, onSuccess);
+#if MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
+ if (failed(verify(rootOp)))
+ llvm::report_fatal_error("IR failed to verify after pattern application");
+#endif // MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
+
if (succeeded(matchResult)) {
LLVM_DEBUG(logResultWithLine("success", "pattern matched"));
#if MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
|
When two `complex.bitcast` ops are folded and the resulting bitcast is a non-complex -> non-complex bitcast, an `arith.bitcast` should be generated. Otherwise, the generated `complex.bitcast` op is invalid. Also remove a pattern that convertes non-complex -> non-complex `complex.bitcast` ops to `arith.bitcast`. Such `complex.bitcast` ops are invalid and should not appear in the input. Note: This bug can only be triggered by running with `-debug` (which will should intermediate IR that does not verify) or with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` (llvm#74270).
Also see comment in #74271: Should we require patterns to generate valid IR? If all patterns are defined in the same component (e.g., canonicalization patterns of the same dialect) and the output is always valid, maybe intermediate invalid IR is OK. But things could get tricky when patterns from other components are mixed into the greedy pattern rewrite and stuff can suddenly break in unexpected ways. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we already have a bot checking with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
?
When two `complex.bitcast` ops are folded and the resulting bitcast is a non-complex -> non-complex bitcast, an `arith.bitcast` should be generated. Otherwise, the generated `complex.bitcast` op is invalid. Also remove a pattern that convertes non-complex -> non-complex `complex.bitcast` ops to `arith.bitcast`. Such `complex.bitcast` ops are invalid and should not appear in the input. Note: This bug can only be triggered by running with `-debug` (which will should intermediate IR that does not verify) or with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` (#74270).
`isDimOpValidSymbol` is used during the verification of `affine.for` ops. It is used to check if LB/UB values are valid symbols. This change adds support for `memref.cast`, which can be skipped over if it is a ranked -> ranked cast. This change fixes `mlir/test/Transforms.mlir`, which used to fail when verifying the IR after each pattern application (llvm#74270). In this test case, a pattern that folds dynamic offsets/sizes/strides to static ones is applied. This pattern inserts a trivial `memref.cast` that can be folded away. This folding happens after the pattern application, so the IR fails to verify after applying the offsets/sizes/strides canonicalization pattern. Note: The verifier of `affine.for` violates MLIR guidelines. Only local properties of an op should be verified. The verifier should not inspect the defining ops of operands. (This would mean that constraints such as "operand is a valid affine symbol" cannot be verified.)
Fixes a bug in `SplitDeallocWhenNotAliasingAnyOther`. This pattern used to generate invalid IR (op dominance error). We never noticed this bug in existing test cases because other patterns and/or foldings were applied afterwards and those rewrites "fixed up" the IR again. (The bug is visible when running `mlir-opt -debug`.) Also add additional comments to the implementation and simplify the code a bit. Apart from the fixed dominance error, this change is NFC. Without this change, buffer deallocation tests will fail when running with #74270.
`DecomposePrintOpConversion` used to generate invalid op such as: ``` error: 'arith.extsi' op operand type 'vector<10xi32>' and result type 'vector<10xi32>' are cast incompatible vector.print %v9 : vector<10xi32> ``` This commit fixes tests such as `mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir` when verifying the IR after each pattern (llvm#74270).
…tion `SimplifyClones` used to generate an invalid op: ``` error: 'memref.cast' op operand type 'memref<*xf32>' and result type 'memref<*xf32>' are cast incompatible %2 = bufferization.clone %1 : memref<*xf32> to memref<*xf32 ``` This commit fixes tests such as `mlir/test/Dialect/Bufferization/canonicalize.mlir` when verifying the IR after each pattern (llvm#74270).
`FoldInsertPadIntoFill` used to generate an invalid `tensor.insert_slice` op: ``` error: expected type to be 'tensor<?x?x?xf32>' or a rank-reduced version. (size mismatch) ``` This commit fixes tests such as `mlir/test/Dialect/Linalg/canonicalize.mlir` when verifying the IR after each pattern application (llvm#74270).
Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate invalid fused producer ops: ``` error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand llvm#2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>') note: see current operation: %24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: f32): %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%29) : (f32) -> () }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32> ``` This is a problem because the input IR to greedy pattern rewriter during `-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as `mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the IR after each pattern application (llvm#74270).
The `ShapeOfOp` folder used to generate invalid IR. Input: ``` %0 = shape.shape_of %arg1 : tensor<index> -> tensor<?xindex> ``` Output: ``` %0 = "shape.const_shape"() <{shape = dense<> : tensor<0xindex>}> : () -> tensor<?xindex> error: 'shape.const_shape' op inferred type(s) 'tensor<0xindex>' are incompatible with return type(s) of operation 'tensor<?xindex>' ``` This rewrite cannot be implemented as a folder because the result type may have to change. In the above example, the original `shape.shape_of` op had a return type of `tensor<?xindex>`, but the folded attribute (materialized as a `shape.const_shape` op) must have a type of `tensor<0xf32>` to be valid. This commit fixes tests such as `mlir/test/Dialect/Shape/canonicalize.mlir` when verifying the IR after each pattern application (llvm#74270).
…74438) The `ShapeOfOp` folder used to generate invalid IR. Input: ``` %0 = shape.shape_of %arg1 : tensor<index> -> tensor<?xindex> ``` Output: ``` %0 = "shape.const_shape"() <{shape = dense<> : tensor<0xindex>}> : () -> tensor<?xindex> error: 'shape.const_shape' op inferred type(s) 'tensor<0xindex>' are incompatible with return type(s) of operation 'tensor<?xindex>' ``` This rewrite cannot be implemented as a folder because the result type may have to change. In the above example, the original `shape.shape_of` op had a return type of `tensor<?xindex>`, but the folded attribute (materialized as a `shape.const_shape` op) must have a type of `tensor<0xf32>` to be valid. This commit fixes tests such as `mlir/test/Dialect/Shape/canonicalize.mlir` when verifying the IR after each pattern application (#74270).
…tion (#74417) `SimplifyClones` used to generate an invalid op: ``` error: 'memref.cast' op operand type 'memref<*xf32>' and result type 'memref<*xf32>' are cast incompatible %2 = bufferization.clone %1 : memref<*xf32> to memref<*xf32 ``` This commit fixes tests such as `mlir/test/Dialect/Bufferization/canonicalize.mlir` when verifying the IR after each pattern (#74270).
`DecomposePrintOpConversion` used to generate invalid op such as: ``` error: 'arith.extsi' op operand type 'vector<10xi32>' and result type 'vector<10xi32>' are cast incompatible vector.print %v9 : vector<10xi32> ``` This commit fixes tests such as `mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir` when verifying the IR after each pattern application (#74270).
The `ForallRewriter` pattern used to generate invalid IR: ``` mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation: "scf.for"(%8, %2, %9) ({ ^bb0(%arg5: index): // ... "scf.yield"() : () -> () ^bb1(%10: index): // no predecessors "scf.yield"() : () -> () }) : (index, index, index) -> () ``` This commit fixes tests such as `mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the IR after each pattern application (llvm#74270).
It could be added as an option next to Another problem with this approach is that there is currently no API to signal a pass failure from the greedy pattern rewrite driver. I currently use |
The op used to support only float element types. This was inconsistent with `ConstantOp::isBuildableWith`, which allows integer element types. The complex type allows any float/integer element type. Note: The other complex dialect ops do not support non-float element types yet. The purpose of this change to fix `Tensor/canonicalize.mlir`, which is currently failing when verifying the IR after each pattern application (llvm#74270). ``` within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: error: 'complex.constant' op result #0 must be complex type with floating-point elements, but got 'complex<i32>' %complex1 = tensor.extract %c1[] : tensor<complex<i32>> ^ within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: note: see current operation: %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> "func.func"() <{function_type = () -> tensor<3xcomplex<i32>>, sym_name = "extract_from_elements_complex_i"}> ({ %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> %1 = "arith.constant"() <{value = dense<(3,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %2 = "arith.constant"() <{value = dense<(1,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %3 = "tensor.extract"(%1) : (tensor<complex<i32>>) -> complex<i32> %4 = "tensor.from_elements"(%0, %3, %0) : (complex<i32>, complex<i32>, complex<i32>) -> tensor<3xcomplex<i32>> "func.return"(%4) : (tensor<3xcomplex<i32>>) -> () }) : () -> () ```
…der` pattern (#74552) `ParallelOpSingleOrZeroIterationDimsFolder` used to produce invalid IR: ``` within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: error: 'scf.parallel' op expects region #0 to have 0 or 1 blocks scf.parallel (%i0, %i1, %i2) = (%c0, %c3, %c7) to (%c1, %c6, %c10) step (%c1, %c2, %c3) { ^ within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: note: see current operation: "scf.parallel"(%4, %5, %3) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({ ^bb0(%arg1: index): "memref.store"(%0, %arg0, %1, %arg1, %6) : (i32, memref<?x?x?xi32>, index, index, index) -> () "scf.yield"() : () -> () ^bb1(%8: index): // no predecessors "scf.yield"() : () -> () }) : (index, index, index) -> () ``` Together with #74551, this commit fixes `mlir/test/Dialect/SCF/canonicalize.mlir` when verifying the IR after each pattern application (#74270).
I don't ask to change code immediately and I don't know what the best solution is, just some usability concerns.
|
The return type of |
`createQuickSort` used to generate invalid IR: ``` "func.func"() <{function_type = (index, index, memref<?xindex>, memref<?xf32>, memref<?xi32>) -> (), sym_name = "_sparse_qsort_0_1_index_coo_1_f32_i32", sym_visibility = "private"}> ({ ^bb0(%arg0: index, %arg1: index, %arg2: memref<?xindex>, %arg3: memref<?xf32>, %arg4: memref<?xi32>): %0:2 = "scf.while"(%arg0, %arg1) ({ ^bb0(%arg5: index, %arg6: index): // ... "scf.condition"(%3, %arg5, %arg6) : (i1, index, index) -> () }, { ^bb0(%arg5: index, %arg6: index): // ... %7:2 = "scf.if"(%6) ({ %8 = "arith.cmpi"(%2, %3) <{predicate = 7 : i64}> : (index, index) -> i1 // ... "scf.yield"(%9#0, %9#1) : (index, index) -> () %10 = "arith.constant"() <{value = 0 : index}> : () -> index }, { "scf.yield"(%arg5, %arg5) : (index, index) -> () }) : (i1) -> (index, index) "scf.yield"(%7#0, %7#1) : (index, index) -> () }) : (index, index) -> (index, index) "func.return"() : () -> () }) : () -> () within split at mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir:76 offset :11:1: error: 'scf.yield' op must be the last operation in the parent block ``` This commit fixes tests such as `mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir` when verifying the IR after each pattern application (#74270).
The `ForallRewriter` pattern used to generate invalid IR: ``` mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation: "scf.for"(%8, %2, %9) ({ ^bb0(%arg5: index): // ... "scf.yield"() : () -> () ^bb1(%10: index): // no predecessors "scf.yield"() : () -> () }) : (index, index, index) -> () ``` This commit fixes tests such as `mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the IR after each pattern application (#74270).
…#74564) The op used to support only float element types. This was inconsistent with `ConstantOp::isBuildableWith`, which allows integer element types. The complex type allows any float/integer element type. Note: The other complex dialect ops do not support non-float element types yet. The main purpose of this change to fix `Tensor/canonicalize.mlir`, which is currently failing when verifying the IR after each pattern application (#74270). ``` within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: error: 'complex.constant' op result #0 must be complex type with floating-point elements, but got 'complex<i32>' %complex1 = tensor.extract %c1[] : tensor<complex<i32>> ^ within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: note: see current operation: %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> "func.func"() <{function_type = () -> tensor<3xcomplex<i32>>, sym_name = "extract_from_elements_complex_i"}> ({ %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> %1 = "arith.constant"() <{value = dense<(3,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %2 = "arith.constant"() <{value = dense<(1,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %3 = "tensor.extract"(%1) : (tensor<complex<i32>>) -> complex<i32> %4 = "tensor.from_elements"(%0, %3, %0) : (complex<i32>, complex<i32>, complex<i32>) -> tensor<3xcomplex<i32>> "func.return"(%4) : (tensor<3xcomplex<i32>>) -> () }) : () -> () ```
`isDimOpValidSymbol` is used during the verification of `affine.for` ops. It is used to check if LB/UB values are valid symbols. This change adds support for `memref.cast`, which can be skipped over if it is a ranked -> ranked cast. This change fixes `mlir/test/Transforms/canonicalize.mlir`, which used to fail when verifying the IR after each pattern application (#74270). In this test case, a pattern that folds dynamic offsets/sizes/strides to static ones is applied. This pattern inserts a trivial `memref.cast` that can be folded away. This folding happens after the pattern application, so the IR fails to verify after applying the offsets/sizes/strides canonicalization pattern. Note: The verifier of `affine.for` violates MLIR guidelines. Only local properties of an op should be verified. The verifier should not inspect the defining ops of operands. (This would mean that constraints such as "operand is a valid affine symbol" cannot be verified.)
Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate invalid fused producer ops: ``` error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand #2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>') note: see current operation: %24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: f32): %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%29) : (f32) -> () }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32> ``` This is a problem because the input IR to greedy pattern rewriter during `-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as `mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the IR after each pattern application (#74270).
`FoldInsertPadIntoFill` used to generate an invalid `tensor.insert_slice` op: ``` error: expected type to be 'tensor<?x?x?xf32>' or a rank-reduced version. (size mismatch) ``` This commit fixes tests such as `mlir/test/Dialect/Linalg/canonicalize.mlir` when verifying the IR after each pattern application (#74270).
Not yet. One concern that I have is that some |
I'm not worried about hash collision for this feature |
Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (llvm#74270). That test produces a temporary 0-D load/store op.
Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (#74270). That test produces a temporary 0-D load/store op.
29a8a3f
to
0310e99
Compare
GreedyPatternRewriteDriver
: verify IR
…n application This commit adds an additional "expensive check" that verifies the IR after every pattern application. (Only if `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` is set.) There are currently 89 tests that are failing with this check. Many of them are in the `vector` and `SparseTensor` dialects, most of which are probably failing due to same reason. The actual number of patterns that produce invalid IR is probably in the order of 10. This commit does not fix any patterns, this is done in separate commits.
0310e99
to
805f123
Compare
I have to fix 5 more tests (some non-trivial), then we can add the new bot:
|
This commit adds an additional "expensive check" that verifies the IR before starting a greedy pattern rewriter, after every pattern application and after every folding. (Only if
MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS
is set.)It also adds an assertion that the
scope
region (part ofGreedyRewriteConfig
) is not being erased as part of the greedy pattern rewrite. That would break the scoping mechanism and the expensive checks.This commit does not fix any patterns, this is done in separate commits.