Add dtype functions for two tensor promotion ops #1831

Merged

Conversation

ramiro050
Collaborator

This commit adds dtype functions for ops in RefineTypes under the category of "Promote the two dtypes". The only ops not added here are convolution ops, since they take an optional tensor argument, and the dtype pipeline currently does not correctly handle that case. I will add a follow-up patch fixing this.

This commit also adds two helper functions that perform very thorough testing of dtype functions. The helper function `_check_two_tensor_op` can independently test for invalid input dtypes and invalid output dtypes.

Lastly, this commit also XFAILs "MobilenetV3Module_basic".
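For context, "promote the two dtypes" refers to PyTorch's standard type-promotion rules for binary ops, which these dtype functions have to reproduce. A minimal illustration using the public torch.promote_types API (illustrative only, not the torch-mlir code itself):

import torch

# PyTorch's two-dtype promotion table, as exposed by torch.promote_types.
print(torch.promote_types(torch.int32, torch.float16))    # torch.float16
print(torch.promote_types(torch.uint8, torch.int8))       # torch.int16
print(torch.promote_types(torch.float32, torch.float64))  # torch.float64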

@ramiro050
Collaborator Author

This is for #1807

@ramiro050
Collaborator Author

@li-plus, this is the patch with the new helper functions to test the dtype functions.

ramiro050 force-pushed the add-promotion-ops-2 branch from 9e80fac to daf4147 on January 27, 2023 at 17:28
ramiro050 force-pushed the add-promotion-ops-2 branch from daf4147 to 517dc05 on January 27, 2023 at 19:10
@ramiro050
Collaborator Author

@li-plus can you review this patch, since you're familiar with most of the changes?

@li-plus
Collaborator

li-plus commented Feb 1, 2023

@ramiro050 LGTM, except for the float16 dtype. I think the shape function shouldn't raise an error for float16, since it is implemented on CUDA.

@ramiro050
Collaborator Author

@ramiro050 LGTM, except for the float16 dtype. I think the shape function shouldn't raise an error for float16, since it is implemented on CUDA.

Because our testing framework in the CI currently runs only on CPU, we need to raise an error for f16. However, there are discussions about moving the CI testing to GPU instead (see #1669). If that happens in the future, we can remove the f16 assert checks.
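For reference, the guard being discussed amounts to an assert at the top of the generated dtype function, along these lines (a sketch under an assumed function name, not the exact torch-mlir code):

import torch

# Sketch of the f16 guard: reject float16 up front because the CPU-only CI
# cannot execute float16 kernels to validate the op's behavior.
def tanh_result_dtype(self_dtype: torch.dtype) -> torch.dtype:
    assert self_dtype != torch.float16, "float16 is unsupported on CPU"
    return self_dtype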

ramiro050 merged commit 981ac88 into llvm:dtype-functions-staging on Feb 1, 2023
ramiro050 deleted the add-promotion-ops-2 branch on February 1, 2023 at 22:30
@li-plus
Collaborator

li-plus commented Feb 2, 2023

OK, I see, but the problem is that we have lots of GPU models to convert using torch-mlir. Most of them have been converted with half() and are full of float16 ops. Currently we run torch-mlir with the CUDA version of PyTorch to convert these models, and float16 is correctly inferred by RefineTypes.cpp.

If shape functions raise an error on float16, all of our pipelines will fail. So I was wondering if we could skip the float16 checks on the CPU machines in CI, so that we don't need to raise an error for float16. In fact, float16 always fails on CPU, so maybe there is no point in testing it.

@qingyunqu
Collaborator

Hi, we strongly recommend supporting float16 in shape inference and dtype inference, or at the very least not raising an error for it.

@ramiro050
Collaborator Author

It turns out that if we perform the dtype checks using meta tensors, we can avoid the issue of the CPU not supporting f16.

In [5]: a = torch.empty(3, 4, 5, dtype=torch.float16)  # reconstructed setup: a float16 tensor on CPU

In [6]: torch.tanh(a)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 torch.tanh(a)

RuntimeError: "tanh_vml_cpu" not implemented for 'Half'

In [7]: torch.tanh(a.to("meta"))
Out[7]: tensor(..., device='meta', size=(3, 4, 5), dtype=torch.float16)

I will create a PR soon adding this change.
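A sketch of how a meta-tensor-based check could look (the helper name result_dtype is made up for illustration): meta tensors carry shape and dtype but no data, so no real kernel ever runs.

import torch

def result_dtype(op, *shape_dtype_pairs):
    # Build meta tensors (shape and dtype only, no storage) and run the op
    # on the meta device, where no actual kernel is dispatched.
    args = [torch.empty(shape, dtype=dtype, device="meta")
            for shape, dtype in shape_dtype_pairs]
    return op(*args).dtype

# Works even though tanh has no float16 kernel on CPU:
print(result_dtype(torch.tanh, ((3, 4, 5), torch.float16)))  # torch.float16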

@qingyunqu
Collaborator

What is the status of PyTorch's migration to meta tensors? I committed meta migrations for some ops to PyTorch two years ago.
BTW, is there any discussion about implementing dynamic shapes in torch-mlir, including dynamic inputs and dynamic ops (like non_zero)?

@ramiro050
Collaborator Author

What is the status of PyTorch's migration to meta tensors? I committed meta migrations for some ops to PyTorch two years ago.

Are meta tensors not finished in PyTorch?

BTW, is there any discussion about implementing dynamic shapes in torch-mlir, including dynamic inputs and dynamic ops (like non_zero)?

What do you mean by dynamic inputs and dynamic ops? At the torch dialect level, we currently support dynamic sizes for tensors. In the case of torch.nonzero, the op should work without a problem when as_tuple=False; it's just a matter of the torch-mlir backends being able to handle that behavior.

The case of as_tuple=True is not yet supported in torch-mlir, since tuples currently need to have a known size at compile time. As far as I know, there is no ongoing discussion about adding support for tuples without a compile-time-known size. However, if this is something that is needed, we can start such a discussion.
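To illustrate the two torch.nonzero behaviors discussed above:

import torch

x = torch.tensor([[1, 0], [0, 2]])

# as_tuple=False (default): a single (N, ndim) index tensor. Its rank is
# known at compile time even though N is only known at runtime.
print(torch.nonzero(x))                  # tensor([[0, 0], [1, 1]])

# as_tuple=True: one 1-D index tensor per dimension, whose lengths are
# only known at runtime.
print(torch.nonzero(x, as_tuple=True))   # (tensor([0, 1]), tensor([0, 1]))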

ramiro050 added a commit to ramiro050/torch-mlir that referenced this pull request May 9, 2023
commit bafe339
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon May 8 21:26:56 2023 +0000

    Add dtype functions for aten.atan and prims.squeeze

commit bebf695
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon May 8 21:26:10 2023 +0000

    Remove duplicate code from merge with main

commit 0d11895
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Fri May 5 21:39:02 2023 +0000

    Update LLVM tag

commit 73d5c07
Merge: 899d8bc eaaaeb6
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Fri May 5 21:30:09 2023 +0000

    Merge remote-tracking branch 'upstream/main' into merge-main

commit 899d8bc
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon Mar 13 21:39:14 2023 +0000

    Add dtype functions for `aten.ge.Tensor` and `aten.le.Tensor`

commit f58f9c2
Merge: ce7abf4 4912c39
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon Mar 13 21:32:00 2023 +0000

    Merge branch 'main' into merge-main

commit ce7abf4
Author: Jiahao Li <[email protected]>
Date:   Wed Feb 22 06:54:41 2023 +0800

    Add dtype functions for ops that take dtype from 2nd operand (llvm#1891)

commit 63945a2
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon Feb 13 17:56:09 2023 -0800

    Change dtype functions interface to take ints tuple for each tensor (llvm#1865)

    The original design for the dtype functions outlined in
    llvm#1462 was unable to properly
    handle ops that take optional tensors as an input when the optional
    tensor has a value of None. By the time the op gets imported into
    torch-mlir, if an optional value is None, all information about the
    original type is lost from the op type signature, preventing
    torch-mlir from knowing if a value of None was from an optional tensor
    or not, which was crucial in the original design since each tensor
    argument must be turned into two separate arguments for the dtype
    function.

    This commit changes the interface to dtype functions such that each
    tensor turns into a tuple of two ints, the first representing the rank
    of the tensor and the second the dtype of the tensor. Since now there
    is a one-to-one correspondence between the operands of an op and the
    operands of its dtype function, there is no ambiguity about which
    operand of the op corresponds with which operand of the dtype
    function.

    To test the implementation, this commit defines dtype functions for
    the convolution ops, all of which take one optional tensor as an
    argument.

commit 981ac88
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Wed Feb 1 22:30:27 2023 +0000

    Add dtype functions for two tensor promotion ops (llvm#1831)

    This commit adds dtype functions for ops in RefineTypes under the
    category of "Promote the two dtypes". The only ops not added here are
    convolution ops, since they take an optional tensor argument, and the
    dtype pipeline currently does not correctly handle that case. I will
    add a follow up patch fixing this.

    This commit also adds two helper functions that perform a very
    thorough testing of dtype functions. The helper function
    `_check_two_tensor_op` is able to independently test invalid input
    dtypes and invalid output dtypes.

    Lastly, this commit also XFAILs "MobilenetV3Module_basic".

commit 83d4e89
Author: Jiahao Li <[email protected]>
Date:   Sat Jan 21 02:39:41 2023 +0800

    Add dtype functions for floating point ops (llvm#1813)

commit 8cae5ba
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon Jan 16 14:32:23 2023 -0800

    Add dtype functions for comparison ops (llvm#1806)

    This commit adds dtype functions for comparison ops that always return
    a tensor of dtype `i1`.

commit 5b77c15
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Mon Jan 16 20:27:49 2023 +0000

    Add CI to `dtype-functions-staging` branch

commit ac94ba2
Author: Ramiro Leal-Cavazos <[email protected]>
Date:   Thu Jan 12 22:41:04 2023 +0000

    Move dtype functions into their own section in lib gen file

    In order to easily keep track of the dtype functions that have been
    moved to `abstract_interp_lib_gen.py` and make it easier to add new
    ones, this commit groups all the dtype functions together, rather than
    having them interspersed between the shape functions.
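As a reference for the interface change described in commit 63945a2 above: each tensor operand becomes a (rank, dtype) tuple, so op operands and dtype-function operands correspond one-to-one. A hedged sketch (function name assumed, and using torch.dtype values for readability where torch-mlir encodes dtypes as ints):

from typing import Tuple
import torch

def add_tensor_dtype_sketch(self_rank_dtype: Tuple[int, torch.dtype],
                            other_rank_dtype: Tuple[int, torch.dtype]) -> torch.dtype:
    # Each tensor operand arrives as a (rank, dtype) pair; an optional
    # tensor that is None simply arrives as None, removing the ambiguity
    # described in the commit message above.
    _, self_dtype = self_rank_dtype
    _, other_dtype = other_rank_dtype
    return torch.promote_types(self_dtype, other_dtype)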