Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Adding Torch to MHLO conversion #999

Closed
ZihengJiang opened this issue Jun 30, 2022 · 21 comments
Closed

[RFC] Adding Torch to MHLO conversion #999

ZihengJiang opened this issue Jun 30, 2022 · 21 comments

Comments

@ZihengJiang
Copy link
Collaborator

ZihengJiang commented Jun 30, 2022

Hello everyone, we are from the AML (Applied Machine Learning) team at Bytedance. While the TorchMLIR project provides a set of MLIR dialect for Torch, which bridges PyTorch and MLIR ecosystems, in our development, we also have observed interest in supporting the MHLO dialect as a lowering target, which is still missing in the project.
In the RFC, we propose to add conversation from Torch dialect to MHLO (Meta HLO Dialect). The MHLO dialect can be found here (https://github.com/tensorflow/mlir-hlo) and the corresponding operation definition is here (https://www.tensorflow.org/mlir/hlo_ops). We would add this conversion as an independent pass and this could provide an extra path where torch mlir can also be incorporated into MHLO dialect, and thus enrich the applications of TorchMLIR.

Motivation:

The HLO (High Level Optimizer) IR offers a carefully fixed selected list of operations. MHLO supports a HLO-like compilation pipeline using MLIR and provides a uniform interface to compile and execute these optimized HLO programs with dynamic shape support.
While the MHLO dialect supports Tensorflow models by nature, if we have the conversion pass from Torch dialect to MHLO, we could convert and lower those different frontend models into a unified IR. This would reduce engineering effort and ease the subsequent backend work.

Proposal:

Benefits:

  • Bridging Torch dialect with MHLO dialect.
  • Enabling user to convert and lower those different frontend models (PyTorch, TensorFlow, etc.) into the unified MHLO IR. This would reduce engineering effort and ease the subsequent backend works.
  • Provide more accessibility of Torch to other MLIR based projects with this extra path from Torch to MHLO

Status:

We have already implemented the whole lowering pipeline and operators conversion to covert BERT and ResNet inference. See the POC in the PR: #1025 . Welcome everyone interested being part of this effort as well.

Looking forward to any feedback!

cc @silvasean @powderluv @byronyi @Vremold

Update (07/13/2022):

According to the offline meeting between Bytedance and Alibaba. We decide to break the proposal into several PRs:

  • Integrate MHLO into TorchMLIR Repo
  • Basic conversion pipeline
  • Operator conversions
  • ResNet and BERT examples
  • End-to-end unit tests
@ZihengJiang
Copy link
Collaborator Author

I also learned that @fortianyou is working on the similar thing. Discussion and collaboration are more than welcome.

@silvasean
Copy link
Contributor

Thanks. I'm glad to see that multiple parties are interested in this. It sounds like there might be some duplicated effort, so let's try to land this soon so we can collaborate. All the code should be shared, and we will need to reconcile the lowering patterns between Bytedance and Alibaba.

For reference, Alibaba's code is here: https://github.com/alibaba/BladeDISC/blob/main/pytorch_blade/src/torch-mlir/lib/Conversion/TorchToMhlo/TorchToMhlo.cpp

nit: let's call it TorchToMhlo for consistency (instead of TorchToMHLO)

@powderluv
Copy link
Collaborator

Thank you for this effort.

@tanyokwok
Copy link
Collaborator

I also learned that @fortianyou is working on the similar thing. Discussion and collaboration are more than welcome.

That's exciting! @ZihengJiang @byronyi

@navahgar
Copy link
Collaborator

navahgar commented Jul 6, 2022

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

@bhack
Copy link

bhack commented Jul 6, 2022

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

There was an interesting related thread at:
https://discourse.llvm.org/t/rfc-bring-mhlo-and-lhlo-dialects-into-mlir-upstream/3186

@silvasean
Copy link
Contributor

I've invited the lead of the MHLO efforts to the meeting, who can speak about governance. In general, our bar for this can be different than upstream MLIR.

@tanyokwok
Copy link
Collaborator

It's nice that @silvasean had created the meeting invitation here: https://discourse.llvm.org/t/asia-friendly-developer-hour-mhlo-support/63625. FYI

@tanyokwok
Copy link
Collaborator

tanyokwok commented Jul 7, 2022

Since this proposal is suggesting MHLO as a unified IR, the ownership of MHLO dialect should be considered as well. Is it going to be MLIR-community owned?

There was an interesting related thread at: https://discourse.llvm.org/t/rfc-bring-mhlo-and-lhlo-dialects-into-mlir-upstream/3186

I had noticed that the main concern on upstreaming of mlir/hlo is its relation to XLA/HLO and TPU in the community. Indeed BladeDISC didn't run into trouble on this in practice. And we benefit from mlir/hlo a lot.

Also, I agree with @silvasean , that torch-mlir has a different bar than upstream MLIR. Community folks might be interested in building a new backend based on torch-mlir and mlir/hlo. I think this is in line with the vision of the torch-mlir community.

@stellaraccident
Copy link
Collaborator

There are various plans at Google to support these cases better, and Eugene can speak to some of that at tomorrow's meeting -- and get feedback from you on what would be useful.

When this came up in the past, the issue really hinged on development model disconnects and a lack of staffing to better support it in a broader construct -- no one was really ready to support it outside of the limited cases it was serving. There have been changes on both of those fronts, so hopefully this next conversation can be more productive than the previous ones.

(Disclosure: Sean and I work at Google but on a different team than MHLO -- we both have visibility into the plans and are trying to get the folks who own this part engaged here to work on it together. Hopefully that is what tomorrow's meeting will accomplish. This proposal kind of came unexpectedly, and we weren't quite ready to talk about everything yet but will do our best to be helpful)

@ZihengJiang
Copy link
Collaborator Author

Thanks for all the discussion. Looking forward to today's meeting.

FYI, here is the POC(#1025) for this RFC, in which we have implemented BERT and ResNet conversion from Torch to MHLO. We are going to refine it in next days.

cc @fortianyou @silvasean @powderluv @stellaraccident @navahgar @bhack

@silvasean
Copy link
Contributor

Thanks @ZihengJiang for posting the PR! One thing I noticed was that Alibaba has dynamic shapes as a requirement, and some of the patterns that I see in the PR you linked are tied to static shapes. So we will want to use the Alibaba patterns when they are available. For example:

https://github.com/alibaba/BladeDISC/blob/9043010a4e33f2b809a6db9933398877e39a5660/pytorch_blade/src/torch-mlir/lib/Conversion/TorchToMhlo/MhloLegalizeUtils.cpp#L220

@ZihengJiang
Copy link
Collaborator Author

ZihengJiang commented Jul 7, 2022

Hi @silvasean

Thanks for the suggestion! Dynamic shape support is a good point. We are happy to update the PR to support this or collaborate with @fortianyou to distribute those pattern tasks, if Alibaba plans to contribute their code. We can discuss those options in the meeting.

@bhack
Copy link

bhack commented Jul 8, 2022

Any meeting log or summary on this topic?

@silvasean
Copy link
Contributor

The meeting notes are in: https://discourse.llvm.org/t/asia-friendly-developer-hour-mhlo-support/63625/3

@ZihengJiang
Copy link
Collaborator Author

According to the offline meeting between Bytedance and Alibaba. We decide to break the proposal into several PRs:

  • Integrate MHLO into TorchMLIR Repo
  • Basic conversion pipeline
  • Operator conversions
  • ResNet and BERT examples

Looking forward to working together and landing this. cc @fortianyou @silvasean

@silvasean
Copy link
Contributor

I would like the initial PR to add support for the e2e test suite too, even if only one test (like tanh) passes. That way, we can see a minimal working system in a single PR to make it easier to review and make sure that all the pieces are present.

tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 21, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 21, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 21, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit that referenced this issue Jul 22, 2022
See RFC: #999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 22, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 22, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Jul 22, 2022
See RFC: llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 2, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 2, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 2, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 2, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit that referenced this issue Aug 3, 2022
See RFC #999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 3, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit to tanyokwok/torch-mlir that referenced this issue Aug 4, 2022
See RFC llvm#999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
ZihengJiang pushed a commit that referenced this issue Aug 4, 2022
See RFC #999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
Vremold added a commit that referenced this issue Aug 4, 2022
* [MHLO] Init MHLO pooling-like op conversion and remove 'op' suffix in filenames

Co-authored-by: Bairen Yi <[email protected]>
Co-authored-by: Jiawei Wu <[email protected]>
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan <[email protected]>
Co-authored-by: Ziheng Jiang <[email protected]>

See RFC #999
tanyokwok pushed a commit that referenced this issue Aug 6, 2022
* [MHLO] fix tensor mode aten.div op pattern

See RFC #999
Co-authored-by: Bairen Yi <[email protected]>
Co-authored-by: Jiawei Wu <[email protected]>
Co-authored-by: Tianyou Guo <[email protected]>
Co-authored-by: Xu Yan <[email protected]>
Co-authored-by: Ziheng Jiang <[email protected]>
tanyokwok pushed a commit that referenced this issue Aug 14, 2022
See RFC #999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
tanyokwok pushed a commit that referenced this issue Aug 15, 2022
See RFC #999

Co-authored-by: Bairen Yi [email protected]
Co-authored-by: Jiawei Wu [email protected]
Co-authored-by: Tianyou Guo [email protected]
Co-authored-by: Xu Yan [email protected]
Co-authored-by: Ziheng Jiang [email protected]
@ZihengJiang
Copy link
Collaborator Author

With the ResNet and BERT examples (from Torch to MHLO) has been merged into the main branch, we could say that we have completed the promise we made in the RFC. Thanks to everyone for making this happen! @fortianyou @Vremold @Yancey1989 @silvasean @powderluv

@silvasean
Copy link
Contributor

Amazing work folks!! I can't believe you did all this in < 2 months!

@tanyokwok
Copy link
Collaborator

@ZihengJiang Congrats!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants