-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds video reading / saving functionalities #1039
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me;
let's chat offline about wether we want to add the mentioned functionality to initial release.
torchvision/io/video.py
Outdated
return aframes[:, s_idx:e_idx] | ||
|
||
|
||
def read_video(filename, start_pts=0, end_pts=math.inf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a priority, but would it be possible to add functionality of reading with a stride?
So if your data is at 30fps, but the models are trained on videos at 15fps (see https://github.com/facebookresearch/VMZ), having this allows the end-user not to re-encode the data.
Having said that, I suppose, you could do the same in the dataloader by striding the tensor so maybe not crucial.
(note, I have this in the experimental repo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question.
It is currently not visible to the user what the fps for a given video is.
I see two options:
- provide the fps somewhere, either by returning it in this function or by a separate helper function
- always enforce that the videos that are returned follow a particular fps. This might imply doing some expensive postprocessing / frame interpolation maybe?
At least the temporal video data would be consistent.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the two options are not (necessarily) mutually exclusive?
-
"provide the fps somewhere, either by returning it in this function or by a separate helper function"
This would be incredibly useful as that has been the main reproducibility issue of the formerly mentioned repo -- relatively non-standard encoding. Maybe we could (following torchaudio example) haveread_video()
return(video, audio), stats
where stats would be a list/dict/tensor with video fps and audio sampling rate? It should be a part of the stream codec as VideoCodecContext.framerate. -
"always enforce that the videos that are returned follow a particular fps. This might imply doing some expensive postprocessing / frame interpolation maybe?"
This might be very slow could cause issues/artefacts within the video. We could leave that one out for future releases? Video frame sampling rate could be a relatively straightforward way to get a similar functionality that would allow some control without the need for expensive operations.
container.close() | ||
|
||
|
||
def _read_from_stream(container, start_offset, end_offset, stream, stream_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add an option to resample audio to a specific SR online?
(note, I have this in the experimental repo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question again.
I'm inclined to always return the audio at a fixed frequency (say 44kHz), so that the results are always consistent.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I'm inclined to always return the audio at a fixed frequency"
definitely, BUT I think the exact value should be left for the user to decide?
I feel like a simple if sr != user_defined_sr call resampler
should be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for a reading function, we should try to make it as simple as possible, and add additional transforms for resampling the audio / video if needed.
But you have great points about the stats that should be returned.
I'll modify the implementation to return a third argument, a dict with the fps etc.
Codecov Report
@@ Coverage Diff @@
## master #1039 +/- ##
==========================================
+ Coverage 63.9% 64.04% +0.13%
==========================================
Files 66 68 +2
Lines 5275 5373 +98
Branches 793 814 +21
==========================================
+ Hits 3371 3441 +70
- Misses 1673 1693 +20
- Partials 231 239 +8
Continue to review full report at Codecov.
|
Approved by @stephenyan1231 and @bjuncek |
This PR introduces functions for reading from and writing to video files, with support for audio streams.
It uses PyAV internally, and can decode pretty-much all videos that can be decoded by FFmpeg.
The functions available here currently have a very simple API.
We might extend it at later moments.
cc @bjuncek @stephenyan1231 for review