
Allow creation of pseudo devices for testing purposes #61654

Open
avitase opened this issue Jul 14, 2021 · 12 comments
Labels
good first issue · module: internals (Related to internal abstractions in c10 and ATen) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

avitase commented Jul 14, 2021

🚀 Feature

Add a pseudo device cpu_testing for testing purposes.

Motivation

On machines that only have a CPU there is currently no straightforward way to test whether torch.device is set properly for all tensors, since cpu is the default. Having a dummy device, e.g. named cpu_testing, which internally falls back to the default device but throws if tensors on different devices are combined, would make such unit tests possible.

Pitch

Machines used purely for CI often have no device other than the CPU. Because CPU is the default device, errors frequently occur in production where the model failed to move tensors to, e.g., the GPU, and this case cannot be covered by a unit test on such machines. Creating a pseudo device that also works on CPU-only machines could help close this gap.

cc @ezyang @bhosmer @smessmer @ljk53 @bdhirsh

zou3519 added the module: internals and triaged labels Jul 14, 2021
bdhirsh (Contributor) commented Jul 14, 2021

It sounds like the meta device might fit your use case, with the snag that it doesn't work for all operators yet (although many of the most common ops have been ported already).

>>> a = torch.ones(2) # defaults to cpu
>>> b = torch.ones(2, device='meta')
>>> c = a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, meta and cpu!

avitase (Author) commented Jul 14, 2021

That's neat. Wasn't aware of this. Thanks for your help!

@bdhirsh could you explain what the purpose of the meta device was originally?

avitase closed this as completed Jul 14, 2021
bdhirsh (Contributor) commented Jul 14, 2021

The meta API lets you run dtype and shape inference. For example:

>>> a = torch.ones(2, device='meta', dtype=torch.float64)
>>> b = torch.ones((1, 2), device='meta', dtype=torch.float32)
>>> a + b
tensor(..., device='meta', size=(1, 2), dtype=torch.float64)

The above involved type promotion (to promote the output dtype to float64) and broadcasting (to broadcast the output shape to (1, 2)), but didn't allocate any storage and didn't actually "add" anything.

Tensors on the meta device don't have any storage, and calling operators on meta tensors doesn't involve any compute - it only involves input error checking + running the "meta" computation.

The original RFC is here if you're interested: https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#how-to-get-involved-with-structured-kernels. "Structured kernels" are a new way of authoring kernels internally that requires you to separate your "meta" computation from your core kernel logic, which is what allows us to provide a meta API.
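
To tie this back to the original request, a CPU-only unit test could run the model on the meta device and let the device-mismatch check do the work. A minimal sketch (not from this thread, assuming a recent enough PyTorch that the ops involved have meta kernels, and with torch.nn.Linear standing in for a real model):

import torch

def test_forward_respects_device():
    # Run the model on the meta device: no storage is allocated and no real
    # compute happens, but any op that silently mixes in a default-device (CPU)
    # tensor raises the "Expected all tensors to be on the same device" error.
    device = torch.device("meta")
    model = torch.nn.Linear(4, 2).to(device)
    x = torch.randn(3, 4, device=device)
    y = model(x)  # shape/dtype inference only
    assert y.device.type == "meta"
    assert y.shape == (3, 2)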

avitase reopened this Jul 14, 2021
avitase (Author) commented Jul 14, 2021

Thanks @bdhirsh, this indeed looks promising, but being able to test the device and the values at the same time would also be handy. I will therefore leave this feature request open, but I will definitely give meta a try!

ezyang (Contributor) commented Jul 14, 2021

I think there are some slight semantic differences that would make a dedicated pseudo device more useful than meta for your use case here, so it's definitely OK to keep this open.

ikamensh (Contributor) commented:

For people waiting for this feature and using macOS: you can use the "mps" device (Metal Performance Shaders). It turns out to be available even on Macs without M1+ hardware; e.g., I have an Intel MacBook with an AMD GPU.

njzjz (Contributor) commented Feb 8, 2024

For people waiting for this feature and using macOS: you can use the "mps" device (Metal Performance Shaders). It turns out to be available even on Macs without M1+ hardware; e.g., I have an Intel MacBook with an AMD GPU.

When I tried it, I found mps doesn't support float64, though.

chanind commented Feb 15, 2024

This would be very helpful, since tests running in CI (e.g. GitHub Actions) usually don't have a GPU, which allows bugs related to mismatched tensor devices to slip through into production. The meta device doesn't work for anything involving actual computation, which any integration test will necessarily run, and mps is only available on Macs, which are typically not what you're using for CI.

ezyang (Contributor) commented Feb 19, 2024

It might be relatively simple to do this with privateuse1. The general idea would be to look at the sample code that currently tests privateuse1 to see how to add a new custom device type, but then implement a single fallback kernel for everything that simply redispatches to cpu.
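
A rough sketch of that idea as a C++ extension (not code from this issue; it reuses PyTorch's existing at::native::cpu_fallback boxed kernel and leaves out everything else a real PrivateUse1 backend needs, such as an allocator and the factory/copy kernels for creating tensors on the device):

#include <ATen/native/CPUFallback.h>
#include <torch/library.h>

// Register a boxed fallback for the PrivateUse1 dispatch key: any op reaching
// this key has its inputs copied to CPU, runs the CPU kernel, and has the
// outputs copied back to the custom device.
TORCH_LIBRARY_IMPL(_, PrivateUse1, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&at::native::cpu_fallback>());
}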

ezyang (Contributor) commented Feb 19, 2024

cc @albanD

albanD (Collaborator) commented Feb 21, 2024

Oh yes, we can definitely do that.
You can actually see that the current testing we have for PrivateUse1 already does part of this.

I would agree with Ed that we can make this a fully fledged extension with a fallback to the CPU implementation if we need a version you can run actual compute tests against.

chanind commented Oct 18, 2024

Is it possible to share an example of how to use PrivateUse1 to create a fake CPU device that can be used to validate in tests that devices are set correctly? I keep deploying broken code that passes tests in CI but fails on real devices, because I can't find a way to test this in CI.

I tried the following, adapted from the PyTorch test shared by @albanD, but it gives errors:

import torch
from typing import Union

class DummyModule:

    @staticmethod
    def device_count() -> int:
        return 1

    @staticmethod
    def get_rng_state(device: Union[int, str, torch.device] = "foo") -> torch.Tensor:
        # create a tensor using our custom device object.
        return torch.empty(4, 4, device="foo")

    @staticmethod
    def set_rng_state(
        new_state: torch.Tensor, device: Union[int, str, torch.device] = "foo"
    ) -> None:
        pass

    @staticmethod
    def is_available():
        return True

    @staticmethod
    def current_device():
        return 0


# Map the reserved PrivateUse1 device type to the name "foo" and expose a
# torch.foo device module backed by the stub above.
torch.utils.rename_privateuse1_backend("foo")
torch._register_device_module("foo", DummyModule)

x = torch.empty(4, 4, device="foo")

This returns the error:

NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'foo' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build).

I've tried registering the actual torch.cpu device as privateuse1 as well, but this still gives the same error. Is there an example somewhere that demonstrates how to do this for tests?
