Video transforms #1353
Conversation
Force-pushed from 88ea2c6 to 717aeba
Codecov Report
@@ Coverage Diff @@
## master #1353 +/- ##
==========================================
+ Coverage 65.47% 65.98% +0.51%
==========================================
Files 75 77 +2
Lines 5827 5932 +105
Branches 892 900 +8
==========================================
+ Hits 3815 3914 +99
- Misses 1742 1746 +4
- Partials 270 272 +2
torchvision/transforms/transforms.py (Outdated)
@@ -434,17 +434,17 @@ def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode
        self.padding_mode = padding_mode

    @staticmethod
-   def get_params(img, output_size):
+   def get_params(w, h, output_size):
I have some reservations with respect to changing the API of the existing transforms, but I wonder how often this particular one is used externally.
Should we issue a warning maybe (cc @fmassa)?
yeah, we should not be doing a BC-breaking change here. There are ways of achieving the same thing without breaking BC, see for example https://github.com/pytorch/vision/pull/1104/files#diff-fc1f220b470714d05cf3ea6acf9fed59R34
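One way to avoid the BC break, in the spirit of the linked approach, is to keep the public `get_params(img, output_size)` signature and derive the width and height internally, so the same entry point works for PIL images and tensor clips alike. The sketch below is ours, not the PR's code; the size conventions (PIL `.size` is `(w, h)`, tensors end in `(..., H, W)`) are assumptions.

```python
import random


def _get_size(img):
    # Assumption: tensor-like inputs expose .shape ending in (H, W);
    # PIL-like images expose .size as (width, height).
    if hasattr(img, "shape"):
        h, w = img.shape[-2], img.shape[-1]
        return w, h
    return img.size


def get_params(img, output_size):
    """Pick a random crop position without changing the public signature."""
    w, h = _get_size(img)
    th, tw = output_size
    if w == tw and h == th:
        return 0, 0, th, tw
    i = random.randint(0, h - th)  # top offset
    j = random.randint(0, w - tw)  # left offset
    return i, j, th, tw
```

Callers that pass an image keep working unchanged, while clip code can pass any tensor whose trailing dimensions are spatial.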
One high-level reservation that I have is that @fmassa et al. were looking into introducing batched tensors, which would render this change unnecessary, but I don't know what the status of that is.
Thanks for the PR Zhicheng!
I'm thinking about a way of unifying the video and image cases. I'll come back with a proposal in the next day or so
_is_tensor_video_clip(clip)
if clip.dtype != torch.uint8:
    raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
return clip.float().permute(3, 0, 1, 2) / 255.0
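As a self-contained illustration of the conversion above: validate the dtype, move channels first (`T, H, W, C` to `C, T, H, W`), and rescale byte values into `[0, 1]`. The helper name `to_tensor_video` and the explicit 4-D check are ours, standing in for the PR's `_is_tensor_video_clip` validation.

```python
import torch


def to_tensor_video(clip: torch.Tensor) -> torch.Tensor:
    # Stand-in for _is_tensor_video_clip: expect a 4-D (T, H, W, C) tensor.
    if clip.ndim != 4:
        raise ValueError("clip should be a 4-D tensor of shape (T, H, W, C)")
    if clip.dtype != torch.uint8:
        raise TypeError("clip tensor should have data type uint8. Got %s" % str(clip.dtype))
    # Permute to channels-first (C, T, H, W) and scale into [0, 1].
    return clip.float().permute(3, 0, 1, 2) / 255.0
```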
I think I'll be using `memory_format` in the data reading functionality, so that this permutation is maybe handled automatically for us, in a safer way.
And I'm also thinking about creating a new transform for performing image type conversions, like https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/image/convert_image_dtype , which would let us perform the scaling for different dtypes
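A minimal sketch of what such a dtype-conversion transform could look like (this is our simplification, not torchvision's eventual implementation, and it omits the careful rounding and int-to-int cases a real version needs): floats are taken to live in `[0, 1]`, integers to span their dtype's full range.

```python
import torch


def convert_image_dtype(image: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    if image.dtype == dtype:
        return image
    if image.dtype.is_floating_point:
        if dtype.is_floating_point:
            return image.to(dtype)
        # float in [0, 1] -> full integer range of the target dtype
        return (image * torch.iinfo(dtype).max).to(dtype)
    if dtype.is_floating_point:
        # integer -> float: rescale into [0, 1]
        return image.to(dtype) / torch.iinfo(image.dtype).max
    # int-to-int conversion is deliberately omitted from this sketch
    raise NotImplementedError("int-to-int conversion not covered here")
```

The point of making this a transform is that the `/ 255.0` scaling above stops being hard-coded to uint8 inputs.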
I will be merging this PR as is for now to unblock @stephenyan1231, but I'll be making changes to how things are structured in a follow-up PR.
Summary: Pull Request resolved: #62

The current dependency, torchvision 0.4.0, was released in August. It missed quite a few PRs that were merged after that and that are needed for video classification, such as:

- pytorch/vision#1437
- pytorch/vision#1431
- pytorch/vision#1423
- pytorch/vision#1418
- pytorch/vision#1408
- pytorch/vision#1376
- pytorch/vision#1363
- pytorch/vision#1353
- pytorch/vision#1303

This will fail the CI test when a diff uses changes made in those PRs. Until a new official version of TorchVision is released, we can temporarily use the nightly torchvision to get all the recent PRs and unblock PR merging. We plan to pin a fixed version of TorchVision later.

Reviewed By: vreis

Differential Revision: D17944239

fbshipit-source-id: 86ff540e3fc4f08ef767e84ef103525db5158201
* video transforms
* [video transforms] in ToTensorVideo, divide value by 255.0
* [video transforms] fix a bug
* fix linting
* Make changes backwards-compatible
Are these documented?
I suppose not yet, but they will be :) #1429
@fepegar exactly, the video transforms will probably be unified with the image transforms, so that you can seamlessly use the same transform for both data types.
Hey @fmassa, any update on the unification and documentation updates for the video transforms?
@pulkitkumar95 unification is happening, but a bit slower than initially planned. See #2282 for the approach we will be tackling.
This PR replaces #1306 because the commit history of that one is polluted.
New features
Implement the following transforms for video clips
Unit test
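The clip-level transforms added here are meant to compose like the existing image transforms. A hypothetical usage sketch (the class bodies below are standalone simplifications written for illustration, not imports from torchvision):

```python
import torch


class ToTensorVideo:
    """(T, H, W, C) uint8 clip -> (C, T, H, W) float clip in [0, 1]."""
    def __call__(self, clip):
        return clip.float().permute(3, 0, 1, 2) / 255.0


class CenterCropVideo:
    """Crop the spatial center of a (C, T, H, W) clip."""
    def __init__(self, size):
        self.size = size

    def __call__(self, clip):
        th, tw = self.size
        h, w = clip.shape[-2], clip.shape[-1]
        i, j = (h - th) // 2, (w - tw) // 2
        return clip[..., i:i + th, j:j + tw]


# Apply the pipeline to a dummy 8-frame RGB clip.
clip = torch.randint(0, 256, (8, 112, 112, 3), dtype=torch.uint8)
for t in [ToTensorVideo(), CenterCropVideo((96, 96))]:
    clip = t(clip)
```

After the loop, `clip` is a float tensor of shape `(3, 8, 96, 96)`, ready to be stacked into a batch.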