
Some ops that take f16 tensor inputs require GPU to run in e2e tests #1669

Open

ramiro050 opened this issue Dec 1, 2022 · 4 comments

@ramiro050
Collaborator

There are some ops in PyTorch that don't have a CPU implementation for f16 inputs. For example:

In [15]: a = torch.rand((2,4,3))

In [16]: torch.ops.aten.native_layer_norm(a.to(torch.float16), [2,4,3], a.to(torch.float64), a.to(torch.float64), 0.0)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 torch.ops.aten.native_layer_norm(a.to(torch.float16), [2,4,3], a.to(torch.float64), a.to(torch.float64), 0.0)

File ~/torch-mlir/venv-torch-mlir/lib/python3.10/site-packages/torch/_ops.py:446, in OpOverloadPacket.__call__(self, *args, **kwargs)
    441 def __call__(self, *args, **kwargs):
    442     # overloading __call__ to ensure torch.ops.foo.bar()
    443     # is still callable from JIT
    444     # We save the function ptr as the `op` attribute on
    445     # OpOverloadPacket to access it here.
--> 446     return self._op(*args, **kwargs or {})

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Because the CIs currently only use CPUs, there is no way to test f16 support for these ops e2e. We should have a CI with access to a GPU, and add support to the e2e testing library for specifying a device, in order to ensure correctness of the f16 implementations. A sketch of what such a device knob could look like is below.
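
One hypothetical shape for that knob (illustrative only; run_golden and its signature are not the actual torch-mlir harness API):

import torch

def run_golden(module_factory, inputs, device="cpu"):
    # Hypothetical harness helper: compute the golden reference on the
    # requested device, so ops whose f16 kernels exist only on CUDA can
    # still be checked end-to-end.
    module = module_factory().to(device)
    device_inputs = [t.to(device) for t in inputs]
    with torch.no_grad():
        result = module(*device_inputs)
    # Move the result back to CPU for comparison with the compiled output.
    return result.cpu()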

@ramiro050
Collaborator Author

Copying over @powderluv's message for reference:

Yes, it's easy to add a GPU builder/tester. However, there are logistics and costs involved in standing it up, since it has to be done by someone with admin access to the LLVM project. We can find a GPU VM for the purpose.

However, another option would be to mark the GPU tests as experimental: not run as part of the CI, but available for anyone to try locally.
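
A local-only gate along those lines could be as simple as a skip condition on GPU availability (a minimal sketch; requires_cuda and the test class are illustrative, not part of the existing suite):

import unittest
import torch

# Experimental GPU tests: skipped automatically on the CPU-only CI,
# runnable by anyone with a CUDA device.
requires_cuda = unittest.skipUnless(
    torch.cuda.is_available(), "f16 e2e tests need a CUDA device")

@requires_cuda
class NativeLayerNormF16Test(unittest.TestCase):
    def test_f16(self):
        a = torch.rand((2, 4, 3), dtype=torch.float16, device="cuda")
        out, _, _ = torch.ops.aten.native_layer_norm(a, [2, 4, 3], a, a, 0.0)
        self.assertEqual(out.dtype, torch.float16)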

@ramiro050
Collaborator Author

@joker-eph, do you know who we can talk to to get access to a GPU in the CI?

@powderluv
Collaborator

I think we can get a VM with GPUs on Google Cloud, but the LLVM org has to add a self-hosted runner.

@powderluv
Collaborator

The GPU builder is online as an A100. I will try to move the current builds/tests to it on a test branch. @ashay FYI
