
Adds video reading / saving functionalities #1039

Merged: 15 commits into pytorch:master on Jul 2, 2019

Conversation

@fmassa (Member) commented Jun 21, 2019

This PR introduces functions for reading from and writing to video files, with support for audio streams.

It uses PyAV internally and can decode pretty much any video that FFmpeg can decode.

The functions currently expose a very simple API; we may extend it later.
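For illustration, a rough usage sketch of the two new functions (file names are hypothetical; the two-value read_video return shown here reflects the PR as initially submitted, and an extra stats value is discussed further down this thread):

```python
from torchvision.io import read_video, write_video

# Decode all video and audio frames from a file (hypothetical path).
vframes, aframes = read_video("input.mp4")

# Re-encode the decoded frames into a new file at 30 fps.
write_video("output.mp4", vframes, fps=30)
```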

cc @bjuncek @stephenyan1231 for review

@bjuncek (Contributor) left a comment

Looks good to me;
let's chat offline about whether we want to add the mentioned functionality to the initial release.

test/test_io.py (review thread resolved; outdated)
return aframes[:, s_idx:e_idx]


def read_video(filename, start_pts=0, end_pts=math.inf):
@bjuncek (Contributor) commented Jun 21, 2019

Probably not a priority, but would it be possible to add the ability to read with a stride?

So if your data is at 30fps but the models are trained on videos at 15fps (see https://github.com/facebookresearch/VMZ), having this would let the end user avoid re-encoding the data.

That said, I suppose you could do the same in the dataloader by striding the tensor, so maybe it's not crucial.

(note, I have this in the experimental repo)
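A minimal sketch of the dataloader-side striding alternative mentioned above (the path, frame rates, and return signature are assumptions, not part of this comment):

```python
from torchvision.io import read_video

# Decode a clip assumed to be encoded at 30 fps (hypothetical path).
vframes, aframes = read_video("clip_30fps.mp4")

# vframes is assumed to be a (T, H, W, C) uint8 tensor; keeping every second
# frame approximates 15 fps without re-encoding the underlying file.
vframes_15fps = vframes[::2]
```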

@fmassa (Member, Author) replied

Great question.

The fps of a given video is currently not exposed to the user.

I see two options:

  • provide the fps somewhere, either by returning it in this function or by a separate helper function
  • always enforce that the returned videos follow a particular fps. This might require some expensive post-processing / frame interpolation, but at least the temporal video data would be consistent.

Thoughts?

@bjuncek (Contributor) replied

I feel like the two options are not (necessarily) mutually exclusive?

  1. "provide the fps somewhere, either by returning it in this function or by a separate helper function"
    This would be incredibly useful as that has been the main reproducibility issue of the formerly mentioned repo -- relatively non-standard encoding. Maybe we could (following torchaudio example) have read_video() return (video, audio), stats where stats would be a list/dict/tensor with video fps and audio sampling rate? It should be a part of the stream codec as VideoCodecContext.framerate.

  2. "always enforce that the videos that are returned follow a particular fps. This might imply doing some expensive postprocessing / frame interpolation maybe?"
    This might be very slow could cause issues/artefacts within the video. We could leave that one out for future releases? Video frame sampling rate could be a relatively straightforward way to get a similar functionality that would allow some control without the need for expensive operations.
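As a rough sketch of option 1, the relevant metadata can be read straight from the PyAV streams (this is not the PR's API; the path and dict keys below are illustrative):

```python
import av  # PyAV, the decoding backend used by this PR

container = av.open("video.mp4")  # hypothetical path
video_fps = float(container.streams.video[0].average_rate)  # e.g. 30.0
audio_sr = container.streams.audio[0].rate                  # e.g. 44100
container.close()

# A stats dict of the kind discussed above (key names illustrative).
stats = {"video_fps": video_fps, "audio_fps": audio_sr}
print(stats)
```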

container.close()


def _read_from_stream(container, start_offset, end_offset, stream, stream_name):
@bjuncek (Contributor) commented Jun 21, 2019

Do we want to add an option to resample audio to a specific sample rate on the fly?

(note, I have this in the experimental repo)

@fmassa (Member, Author) replied

Great question again.

I'm inclined to always return the audio at a fixed frequency (say 44kHz), so that the results are always consistent.

Thoughts?

@bjuncek (Contributor) replied

"I'm inclined to always return the audio at a fixed frequency"

Definitely, but I think the exact value should be left for the user to decide.
I feel like a simple "if sr != user_defined_sr, call a resampler" check should be enough?
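A self-contained sketch of that check using torchaudio as the resampler (torchaudio is not part of this PR; all names and values below are hypothetical):

```python
import torch
import torchaudio

sr = 48000                    # sample rate reported for the decoded audio
user_defined_sr = 16000       # sample rate the user actually wants
aframes = torch.randn(2, sr)  # stand-in for one second of decoded stereo audio

# Resample only when the decoded rate differs from the requested one.
if sr != user_defined_sr:
    resample = torchaudio.transforms.Resample(orig_freq=sr, new_freq=user_defined_sr)
    aframes = resample(aframes)
```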

@fmassa (Member, Author) replied

I think a reading function should be as simple as possible, with additional transforms for resampling the audio / video if needed.

But you have great points about the stats that should be returned.

I'll modify the implementation to return a third value: a dict with the fps etc.
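For illustration, a sketch of how the extended return could look to a caller (the path and dict keys are assumptions at this point in the thread):

```python
import math
from torchvision.io import read_video

# Decode the whole clip; the third return value is the new stats dict.
vframes, aframes, info = read_video("clip.mp4", start_pts=0, end_pts=math.inf)

print(vframes.shape)  # (T, H, W, C) uint8 video frames
print(aframes.shape)  # (channels, samples) audio samples
print(info)           # e.g. {"video_fps": 30.0, "audio_fps": 44100}
```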

@fmassa mentioned this pull request Jun 24, 2019
@codecov-io commented Jun 26, 2019

Codecov Report

Merging #1039 into master will increase coverage by 0.13%.
The diff coverage is 71.27%.


@@            Coverage Diff             @@
##           master    #1039      +/-   ##
==========================================
+ Coverage    63.9%   64.04%   +0.13%     
==========================================
  Files          66       68       +2     
  Lines        5275     5373      +98     
  Branches      793      814      +21     
==========================================
+ Hits         3371     3441      +70     
- Misses       1673     1693      +20     
- Partials      231      239       +8
Impacted Files                          Coverage Δ
torchvision/__init__.py                 70.58% <100%> (+1.83%) ⬆️
torchvision/io/__init__.py              100% <100%> (ø)
torchvision/io/video.py                 70.32% <70.32%> (ø)
torchvision/transforms/transforms.py    82.12% <0%> (-0.61%) ⬇️
torchvision/models/resnet.py            88.27% <0%> (+0.45%) ⬆️
torchvision/datasets/cityscapes.py      20.58% <0%> (+0.58%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e42e80...38596b0.

@fmassa merged commit d293c4c into pytorch:master Jul 2, 2019
@fmassa (Member, Author) commented Jul 2, 2019

Approved by @stephenyan1231 and @bjuncek
