
LowerToBackendContract: Explicitly error out on unimplemented operator #1947

Merged 2 commits into llvm:main on Mar 20, 2023

Conversation

mgehre-amd (Contributor)

As discussed on Discord [0], this PR emits more direct errors like unsupported by backend contract: Unimplemented operator 'aten.clamp.Tensor' instead of diagnosing unknown ranks or missing value conversion.

[0] https://discordapp.com/channels/636084430946959380/742573221882364009/1083826291301679225
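The actual change lives in the C++ LowerToBackendContract pass; purely as a sketch of the behavior described above, here is a minimal Python model (IMPLEMENTED_OPS, check_operator, and the listed op subset are hypothetical names, not Torch-MLIR APIs):

```python
from typing import Optional

# Hypothetical stand-in for the set of operators that have ODS definitions.
IMPLEMENTED_OPS = {"aten.add.Tensor", "aten.mul.Tensor"}

def check_operator(op_name: str) -> Optional[str]:
    """Return the direct error message for an unimplemented operator,
    mirroring the diagnostic text this PR emits, or None if supported."""
    if op_name not in IMPLEMENTED_OPS:
        return ("unsupported by backend contract: "
                f"Unimplemented operator '{op_name}'")
    return None
```

With a check of this shape, the pass fails fast and names the offending operator, instead of surfacing a later diagnostic about unknown ranks or missing value conversion.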

@mgehre-amd mgehre-amd requested a review from ramiro050 March 16, 2023 13:11
@mgehre-amd mgehre-amd force-pushed the mgehre.unsupported_op branch from bb69a9f to e8b9d10 on March 16, 2023 15:10
@mgehre-amd mgehre-amd force-pushed the mgehre.unsupported_op branch from e8b9d10 to bb80407 on March 17, 2023 08:35
@mgehre-amd (Contributor, Author) commented Mar 17, 2023

I'm a bit surprised by the CI failure. It turns out that
./tools/e2e_test.sh -f NormalizeModule_basic -c lazy_tensor_core fails on this PR with error: unsupported by backend contract: Unimplemented operator 'aten.clamp_min.Tensor', but it passes without my PR.

(Same for LTC on ElementwiseClampMaxModule_basic, fails with aten.clamp_max.Tensor)

I don't quite understand how it could be passing before when the operator is not implemented?

@ramiro050 (Collaborator)

That's very strange. Let me take a look and see what I find.

@ramiro050 (Collaborator)

> I don't quite understand how it could be passing before when the operator is not implemented?

I think I know what's happening. The LTC reference backend in Torch-MLIR creates an MLIR graph that is expected to satisfy the backend contract, but then executes things using the TorchScript LTC backend. In other words, the MLIR graph is not compiled further down and executed.

// We don't have a way to execute a computation based on the generated MLIR,
// so we'll fallback to the implementation used by the TS LTC backend.

This is why the tests were passing even though there is no ODS for aten.clamp_max.Tensor.

In reality, the torch.operator is not the erroneous part. We expect to use torch.operators to represent custom ops in the future. The cases we want to error out on are when torch.operators don't satisfy the guarantees of the backend contract: value semantics, dtype, and shapes for all tensors.

In the case of LTC, the clamp tests create the op

%4 = torch.operator "aten.clamp_min.Tensor"(%3, %arg0) : (!torch.vtensor<[3,1],f32>, !torch.vtensor<[],f64>) -> !torch.vtensor<[3,1],f32>

which does satisfy the backend contract. The check should be modified to use the checkType helper to make sure the outputs of torch.operator satisfy the backend contract.
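As an illustrative sketch of that rule (the real helper is checkType inside the C++ pass; TensorType and satisfies_backend_contract are made-up names for this model), a torch.operator would be allowed through as long as every result type has value semantics plus a known dtype and sizes:

```python
from dataclasses import dataclass

@dataclass
class TensorType:
    has_value_semantics: bool  # !torch.vtensor rather than !torch.tensor
    has_dtype: bool            # element type is known, e.g. f32
    has_sizes: bool            # shape is known, e.g. [3,1]

def satisfies_backend_contract(result_types: list) -> bool:
    """A torch.operator is acceptable iff all of its results meet the
    backend contract: value semantics, dtype, and sizes."""
    return all(t.has_value_semantics and t.has_dtype and t.has_sizes
               for t in result_types)

# The clamp example's result, !torch.vtensor<[3,1],f32>: all three hold.
clamp_min_result = TensorType(True, True, True)
```

Under this model, the LTC-generated clamp op above is accepted, while a torch.operator returning an unranked or dtype-less tensor is rejected.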

But only reject torch.operator when results are invalid.
Otherwise it might be a custom op that the backend supports.
@mgehre-amd (Contributor, Author)

> The check should be modified to use the checkType helper to make sure the outputs of torch.operator satisfy the backend contract.

Thanks, that's it. I updated the PR and now it passes :-)

@ramiro050 (Collaborator) left a comment:

Great, thanks!

@mgehre-amd mgehre-amd merged commit aa5bcb3 into llvm:main Mar 20, 2023
@mgehre-amd mgehre-amd deleted the mgehre.unsupported_op branch March 21, 2023 12:27
gpetters94 pushed a commit to gpetters94/mlir-npcomp that referenced this pull request on May 10, 2023 (llvm#1947):

* LowerToBackendContract: Explicitly error out on unimplemented operator

But only reject torch.operator when results are invalid.
Otherwise it might be a custom op that the backend supports.