Fix v2 transforms in spawn mp context #8067
Conversation
test/datasets_utils.py (outdated)
if isinstance(dataset, VOCDetection):
    assert wrapped_sample[0][0].size == (321, 123)
The point of this test is just to show that the fix works; it obviously won't stay as-is.
We should really add transforms to this test though. @pmeier, do you have any suggestion on how you'd prefer to do that? Otherwise, I'll just do something similar for all datasets, i.e. check that the images in wrapped_sample are of a specific expected size.
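Purely as an illustration of that idea (not the final test code), a hedged sketch where a v2 transform is attached at construction time and the wrapped sample's image size is checked; `make_dataset` is a hypothetical factory, and the `(321, 123)` size just mirrors the temporary assertion above.

```python
from torchvision.datasets import wrap_dataset_for_transforms_v2
from torchvision.transforms import v2


def check_wrapped_sample_size(make_dataset):
    # `make_dataset` is a hypothetical factory that builds a dataset with the
    # given v2 transforms. Resize takes (H, W), while PIL reports size as
    # (W, H), hence the (321, 123) expectation below.
    dataset = make_dataset(transforms=v2.Resize((123, 321)))
    wrapped = wrap_dataset_for_transforms_v2(dataset)
    image = wrapped[0][0]
    assert image.size == (321, 123)
```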
dataset.transform = self.transform
dataset.transforms = self.transforms
dataset.target_transform = self.target_transform
return wrap_dataset_for_transforms_v2, (dataset, self._target_keys)
TBH, I don't really understand why we needed to have `__reduce__` in the first place. I understand it's needed to support pickle, but my understanding stops there. The whole logic seems really strange, i.e. calling `wrap_dataset_for_transforms_v2()` yet again, which is how the instance being pickled was created in the first place anyway. And now on top of that we add the "field resetting" logic, i.e. we just undo what we did in `__init__`.

> I understand it's needed to support pickle, but my understanding stops here.

@pmeier, can you remind me of the details? Do you think we could support pickleability of these datasets in a different way that wouldn't require this "fix"?
By default, `pickle` does not support dynamic types like the one we create in

wrapped_dataset_cls = type(f"Wrapped{type(dataset).__name__}", (VisionDatasetTVTensorWrapper, type(dataset)), {})

Thus, we implement the `__reduce__` method and tell it how to construct the object from its parts. Now only the items on L212 need to be pickled. While unpickling, the dynamic type is created anew and we thus circumvent the issue.

We have the dynamic type in the first place to support `isinstance` checks. There was also a different option in #7239 (comment) that would work without a dynamic type. I think that could potentially work without your fix, but I would need to test. However, that option also has its drawbacks (see the discussion in the original PR).
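To make the mechanism concrete, here is a minimal, generic sketch of the pattern described above (standalone names, not the actual torchvision wrapper): pickling an instance of a class created with `type(...)` fails by default because `pickle` cannot look the class up by name, whereas a `__reduce__` that rebuilds the object through the factory function sidesteps that lookup.

```python
import pickle


class WrapperBase:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __reduce__(self):
        # Recreate the instance by calling the factory again on unpickling,
        # so pickle never needs to look up the dynamically created class.
        return wrap, (self.wrapped,)


def wrap(obj):
    # Dynamic subclass so that isinstance(wrapper, type(obj)) keeps working.
    cls = type(f"Wrapped{type(obj).__name__}", (WrapperBase, type(obj)), {})
    return cls(obj)


class MyDataset:
    pass


wrapper = wrap(MyDataset())
# Without __reduce__, this would raise a PicklingError (attribute lookup of
# "WrappedMyDataset" fails); with it, the dynamic type is simply created anew.
roundtripped = pickle.loads(pickle.dumps(wrapper))
assert isinstance(roundtripped, MyDataset)
```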
@@ -198,8 +199,19 @@ def __getitem__(self, idx):
    def __len__(self):
        return len(self._dataset)

    # TODO: maybe we should use __getstate__ and __setstate__ instead of __reduce__, as recommended in the docs.
See the discussion just above this line.
We can try, but I think it is not possible. The "state" in `__getstate__` and `__setstate__` is the second return value of `__reduce__` below. And while we can recreate a `VisionDatasetTVTensorWrapper` from that state, `pickle` does not know how to create the dynamic type. I'll give it a shot.
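For what it's worth, a small generic sketch of why the state hooks alone wouldn't be enough: even with `__getstate__`/`__setstate__` defined, `pickle` still records the instance's class by module and qualified name and has to import it while unpickling, which fails for a class that only exists at runtime (illustrative names, not torchvision code).

```python
import pickle


class Base:
    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        self.__dict__.update(state)


def make_dynamic_instance():
    # The class only exists inside this function, similar to the
    # Wrapped{Dataset} classes created by wrap_dataset_for_transforms_v2.
    Dynamic = type("Dynamic", (Base,), {})
    return Dynamic()


obj = make_dynamic_instance()
obj.value = 1

try:
    pickle.dumps(obj)
except pickle.PicklingError as exc:
    # Fails while locating the class by name, before the state hooks matter.
    print(f"pickling failed as expected: {exc}")
```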
@@ -1005,9 +1014,11 @@ def inject_fake_data(self, tmpdir, config):
        )
        return num_videos_per_class * len(classes)

    @pytest.mark.xfail(reason="FIXME")
This fails; I think it's because Kinetics doesn't convert its `transform` attribute into a `transforms` attribute, but I haven't double-checked. If that's OK, I'd like to merge this PR right now to get it behind us and investigate that separately just after.
OK with investigating later. Will have a look.
Thanks!
test/datasets_utils.py (outdated)
# https://github.com/pytorch/vision/issues/8066
# Implicitly, this also checks that the wrapped datasets are pickleable.

# To save CI/test time, we only check on macOS where "spawn" is the default
Spawn is also the default on Windows. Since macOS CI is by far the costliest, should we just use Windows here?
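For reference, a minimal sketch of the kind of check being discussed, with the "spawn" start method requested explicitly so it does not depend on the platform default (`wrapped_dataset` is a placeholder for the v2-wrapped dataset under test; this is illustrative, not the PR's actual test code):

```python
from torch.utils.data import DataLoader


def check_wrapped_dataset_in_spawn_workers(wrapped_dataset):
    # Force the "spawn" start method regardless of platform, so the workers
    # must pickle the wrapped dataset -- the code path this PR fixes.
    # `list` is used as a pass-through collate_fn because it is picklable.
    loader = DataLoader(
        wrapped_dataset,
        num_workers=2,
        multiprocessing_context="spawn",
        collate_fn=list,
    )
    for _ in loader:
        pass
```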
Thanks for the review! The failure seems to be unrelated. There was a failure on the 3.10 Windows job, but the dataset involved isn't touched by this PR, and the other Windows jobs are executing the new test and running fine. Let's hope it's just a one-off; I'll merge.
Reviewed By: vmoens
Differential Revision: D50789087
fbshipit-source-id: 4feccf22ac58fa8b7028dd64c85dd247800e466d
Fixes #8066
cc @vfdev-5