
Some ops that take f16 tensor inputs require GPU to run in e2e tests #1669

Open

ramiro050 opened this issue Dec 1, 2022 · 4 comments

@ramiro050
Collaborator

There are some ops in PyTorch that don't have a CPU implementation for f16 inputs. For example:

In [15]: a = torch.rand((2,4,3))

In [16]: torch.ops.aten.native_layer_norm(a.to(torch.float16), [2,4,3], a.to(torch.float64), a.to(torch.float64), 0.0)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 torch.ops.aten.native_layer_norm(a.to(torch.float16), [2,4,3], a.to(torch.float64), a.to(torch.float64), 0.0)

File ~/torch-mlir/venv-torch-mlir/lib/python3.10/site-packages/torch/_ops.py:446, in OpOverloadPacket.__call__(self, *args, **kwargs)
    441 def __call__(self, *args, **kwargs):
    442     # overloading __call__ to ensure torch.ops.foo.bar()
    443     # is still callable from JIT
    444     # We save the function ptr as the `op` attribute on
    445     # OpOverloadPacket to access it here.
--> 446     return self._op(*args, **kwargs or {})

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Because the CIs currently only use CPUs, there is no way to test f16 support for these ops e2e. We should have a CI with access to a GPU, and add support to the e2e testing library for specifying a device, in order to ensure correctness of the f16 implementations. A sketch of what such a device knob could look like is below.
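
One hypothetical shape for that knob (illustrative only; run_golden and its signature are not the actual torch-mlir harness API):

import torch

def run_golden(module_factory, inputs, device="cpu"):
    # Hypothetical harness helper: compute the golden reference on the
    # requested device, so ops whose f16 kernels exist only on CUDA can
    # still be checked end-to-end.
    module = module_factory().to(device)
    device_inputs = [t.to(device) for t in inputs]
    with torch.no_grad():
        result = module(*device_inputs)
    # Move the result back to CPU for comparison with the compiled output.
    return result.cpu()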

@ramiro050
Collaborator Author

Copying over @powderluv's message for reference:

Yes, it's easy to add a GPU builder/tester. However, there are logistics and costs involved in standing it up, since it has to be done by someone with admin access to the LLVM project. We can find a GPU VM for the purpose.

However, another option would be to mark the GPU tests as experimental: not run as part of the CI, but available for anyone to try locally.
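
A local-only gate along those lines could be as simple as a skip condition on GPU availability (a minimal sketch; requires_cuda and the test class are illustrative, not part of the existing suite):

import unittest
import torch

# Experimental GPU tests: skipped automatically on the CPU-only CI,
# runnable by anyone with a CUDA device.
requires_cuda = unittest.skipUnless(
    torch.cuda.is_available(), "f16 e2e tests need a CUDA device")

@requires_cuda
class NativeLayerNormF16Test(unittest.TestCase):
    def test_f16(self):
        a = torch.rand((2, 4, 3), dtype=torch.float16, device="cuda")
        out, _, _ = torch.ops.aten.native_layer_norm(a, [2, 4, 3], a, a, 0.0)
        self.assertEqual(out.dtype, torch.float16)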

@ramiro050
Collaborator Author

@joker-eph, do you know who we can talk to to get access to a GPU in the CI?

@powderluv
Collaborator

I think we can get a VM with GPUs on Google Cloud, but the LLVM org has to add a self-hosted runner.

@powderluv
Collaborator

The GPU builder is online as an A100. I will try to move the current builds/tests to it on a test branch. @ashay FYI
