Torch to TOSA conversion fails to legalize 'torch.constant.int' #961
Comments
I've encountered this already @Svoch - it also impacts MobileNetV3. Working on a fix internally, but am getting some BERT ones out first.
I just convert ConstantIntOp to arith while converting BERT to TOSA, as an intermediate workaround.
Below is the IR I get when lowering the model in the issue description to the Torch dialect. If I've understood correctly, even though rewriters for AtenMmOp and AtenLinearOp do exist in the TorchToTosa lowering pass, there is no lowering pattern for torch.constant.int.
The IR is really helpful to test against what I have and see if it legalizes right. I'll check today after morning meetings and post an update.
Just pushed #1017 for this.
I encountered the same issue when lowering the Hugging Face GPT-2 model: https://gist.github.com/AmosLewis/9b929414d5677afda3528122f92bbc73 @sjarus
torch.constant.int is a known missing conversion.
Was this fixed?
@silvasean - Yes, with the Torch to TOSA conversion of the Transpose Op merged, this issue can be marked as fixed. Please note that the symptom, i.e. the failure to legalize torch.constant.int, can reappear whenever another op in the graph is still missing a lowering.
The torch.constant.int error just means that some aten op using the torch.constant.int as an operand has not been lowered successfully by your lowering code. You need to find the op that is not lowered successfully in the IR from the debug info, understand each line of your lowering code related to that op, and come up with a new plan. The error will come back again and again for each op we try to lower, until we lower it successfully. As you can see in the comments, I hit this error many times when I started to work on GPT. The error disappears once you lower your ops to TOSA (or other dialects: StableHLO/Linalg/TMTensor) correctly.
This is the command you might need to get more debug info; just replace /tmp/aten_as_tride.mlir with an op.mlir file you created manually. You can take my where.mlir file and the command in the comments as examples. Here is the link: https://gist.github.com/AmosLewis/32847885f8b3ff27b7ef6564154fec59

For those working on TOSA, here is the relationship between the two TOSA-related flags for torch-mlir-opt that you need to understand before diving into debugging: -pass-pipeline='torch-backend-to-tosa-backend-pipeline' == "-convert-torch-to-tosa" + some other cleanup/standard conversion passes (e.g. clearing the torch.constant.int ops whose aten users were successfully lowered to TOSA). -pass-pipeline='torch-backend-to-tosa-backend-pipeline' will call the whole function starting at line 100, while -convert-torch-to-tosa will only call line 102. The torch.constant.int ops should be cleaned up around line 113, provided the line 102 -convert-torch-to-tosa conversion went well. A Python sketch of this two-stage flow follows below.
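A minimal Python sketch of the same two-stage flow. The run_pipeline_with_repro_report helper and its signature are assumptions based on torch_mlir.compiler_utils from this era of torch-mlir; verify against your build:

```python
import torch
import torch_mlir
# Assumption: this helper exists at this path in your torch-mlir build.
from torch_mlir.compiler_utils import run_pipeline_with_repro_report

class WhereModule(torch.nn.Module):
    def forward(self, cond, a, b):
        return torch.where(cond, a, b)

inputs = (torch.ones(1, 1, 5, 5, dtype=torch.bool),
          torch.ones(1, 1, 5, 5),
          torch.zeros(1, 1, 5, 5))

# Stage 1: stop at the Torch backend contract -- this is the IR you would
# save to an op.mlir file and feed to torch-mlir-opt by hand.
module = torch_mlir.compile(WhereModule(), inputs,
                            output_type=torch_mlir.OutputType.TORCH)

# Stage 2: the full pipeline == -convert-torch-to-tosa plus the cleanup
# passes that erase torch.constant.int ops whose users were lowered.
run_pipeline_with_repro_report(
    module,
    "torch-backend-to-tosa-backend-pipeline",
    "Lowering Torch Backend IR -> TOSA Backend IR")
print(module)
```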
In each matchAndRewrite pattern, every aten op has a corresponding Adaptor. The adaptor is the MLIR-internal version of the aten op. For example, for a where.mlir file containing torch.aten.where.self(%arg0: !torch.vtensor<[1,1,5,5],i1>), calling atenOp.getSelf().dump() on %arg0 gives you the torch version of the tensor, !torch.vtensor<[1,1,5,5],i1>, but calling adaptor.getSelf().dump() gives you tensor<1x1x5x5xi1>. You can find the useful op helper functions like getSelf() in your own build directory, build/tools/torch-mlir/include/torch-mlir/Dialect/Torch/IR/TorchOps.h.inc; their implementations are in build/tools/torch-mlir/include/torch-mlir/Dialect/Torch/IR/TorchOps.cpp.inc. These are automatically generated by MLIR's TableGen from .td files. The TableGen file lives in a similar directory structure in the torch-mlir source code: https://github.com/llvm/torch-mlir/blob/main/include/torch-mlir/Dialect/Torch/IR/GeneratedTorchOps.td. In this .td file you will find the detailed types of each aten op, which is very useful when you come up with new lowering plans.

To work with the adaptor's types, which are MLIR-internal types, you will need the functions in external/llvm-project: https://github.com/llvm/llvm-project/blob/798fa4b415eea55c868ae42b874083cb9886991e/mlir/include/mlir/IR/Types.h and https://github.com/llvm/llvm-project/blob/798fa4b415eea55c868ae42b874083cb9886991e/mlir/include/mlir/IR/BuiltinTypes.h. We have to go deep, read this code, understand its design, and get familiar with it; otherwise we cannot debug anything successfully. This code is like the raw food for a cook: C++ and Python are our cooking tools, and our job is to come up with a recipe (a lowering plan) and use those tools to cook (implement and debug) it with the raw food. A Python illustration of the two type worlds follows below.
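Since the adaptor only exists inside the C++ rewrite pattern, a quick way to see the two type worlds from Python is to print the module at both backend contracts. This is a minimal sketch, not code from the thread; it assumes torch_mlir.compile and a TOSA lowering for aten.where.self are available in your build:

```python
import torch
import torch_mlir

class WhereModule(torch.nn.Module):
    def forward(self, cond, a, b):
        return torch.where(cond, a, b)

inputs = (torch.ones(1, 1, 5, 5, dtype=torch.bool),
          torch.ones(1, 1, 5, 5),
          torch.zeros(1, 1, 5, 5))

# At the Torch backend contract the condition prints as
# !torch.vtensor<[1,1,5,5],i1> -- the type atenOp.getSelf() sees.
print(torch_mlir.compile(WhereModule(), inputs,
                         output_type=torch_mlir.OutputType.TORCH))

# After conversion to TOSA the same value prints as tensor<1x1x5x5xi1>
# -- the builtin type adaptor.getSelf() sees in matchAndRewrite.
print(torch_mlir.compile(WhereModule(), inputs,
                         output_type=torch_mlir.OutputType.TOSA))
```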
I am trying to compile a portion of a PyTorch self-attention module down to the TOSA backend and am hitting an error legalizing the torch.constant.int Op in the TOSA conversion pass. The issue arises only when the output type is set to torch_mlir.OutputType.TOSA in the Torch-MLIR compile API. The conversion to the Linalg dialect and further down to the backend works fine; however, the Torch to TOSA conversion fails.

Error log
Steps to reproduce
The script to reproduce the error is up in a draft PR on a local fork. The error can be reproduced using a code snippet with the module definition and a torch_mlir.compile() API call; a sketch is given after this comment.

This issue is potentially relevant to what @nithinsubbiah, @rdadolf and I are seeing in #910. I was also able to reproduce the error by simplifying the module above to a single Transpose Op (i.e. torch.Tensor.transpose in the forward method).

cc @sjarus @powderluv @silvasean - wondering if you have seen this or have any insight into what might be going wrong here.
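A minimal sketch of the kind of reproducer described above. The actual script lives in the draft PR, which is not shown in this thread, so the module name and shapes here are assumptions; it keeps only the single Transpose Op mentioned as the simplified case:

```python
import torch
import torch_mlir  # assumption: the torch_mlir Python package from torch-mlir

class TransposeModule(torch.nn.Module):
    def forward(self, x):
        # torch.Tensor.transpose becomes aten.transpose.int in the Torch
        # dialect, with its two dimension arguments materialized as
        # torch.constant.int ops.
        return x.transpose(1, 2)

example_input = torch.ones(2, 3, 4)

# Lowering to Linalg works fine...
linalg_module = torch_mlir.compile(
    TransposeModule(), example_input,
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS)

# ...but requesting TOSA reports, at the time this issue was filed:
# "failed to legalize operation 'torch.constant.int'".
tosa_module = torch_mlir.compile(
    TransposeModule(), example_input,
    output_type=torch_mlir.OutputType.TOSA)
```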