Re-encode AV Files Only when Necessary #1926
Open
Addresses #1912
These changes concern the creation of digital object reference derivatives for audio and video (AV) files. The changes here were authored by @yenaing-oo and me.
We have optimized the `ffmpeg` commands used to generate audio and video derivatives by analyzing the master files with `ffprobe` before encoding. Based on a file's parameters, a minimal FFmpeg command is generated to make only those changes that are required to create the reference file. For example, if only the audio needs to be re-encoded in a video file, then the video stream is copied and the audio is re-encoded to bring it in line with the expected audio encoding and sample rate.
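As a rough illustration of the probe-then-decide idea (not the code in this PR), the sketch below inspects `ffprobe` JSON output and builds an argument list that re-encodes only the streams that differ from a target. The target values (`h264`, `aac`, 44100 Hz), the encoder choice `libx264`, and the helper names `probe` and `build_ffmpeg_args` are assumptions made for the example. The sketch only compares codecs and sample rate, and ignores container format and bitrate.

```python
import json
import subprocess

# Hypothetical reference-derivative targets; the real values live in the PR code.
TARGET_VIDEO_CODEC = "h264"
TARGET_AUDIO_CODEC = "aac"
TARGET_SAMPLE_RATE = 44100


def probe(path):
    """Run ffprobe on a file and return its stream metadata as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def build_ffmpeg_args(info, src, dst):
    """Return an ffmpeg argument list, or None if no stream needs re-encoding."""
    streams = info.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]

    video_ok = all(s.get("codec_name") == TARGET_VIDEO_CODEC for s in video)
    audio_ok = all(
        s.get("codec_name") == TARGET_AUDIO_CODEC
        and int(s.get("sample_rate", 0)) == TARGET_SAMPLE_RATE
        for s in audio
    )

    if video_ok and audio_ok:
        return None  # already conformant: the caller can copy the file as-is

    args = ["ffmpeg", "-y", "-i", src]
    if video:
        # Copy the video stream untouched unless its codec differs.
        args += ["-c:v", "copy"] if video_ok else ["-c:v", "libx264"]
    if audio:
        # Re-encode audio only when codec or sample rate differs.
        args += (["-c:a", "copy"] if audio_ok
                 else ["-c:a", "aac", "-ar", str(TARGET_SAMPLE_RATE)])
    args.append(dst)
    return args
```

A caller would pass `probe(path)` as `info`, then either run the returned arguments with `subprocess.run` or fall back to a plain file copy when the result is `None`.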
The purpose of this change is to make importing audio and video objects quicker, especially when files are already in the correct format. If an AV file is in the right format already, it is simply `copy()`-ed, rather than processed with FFmpeg.

We created a test package of audio and video files that this change can be tested against: issue-1912-test-files-import.zip
Assuming AtoM is being run in Docker, extract them to the root of the repository, and run the following to import them:
We saw an improvement in import time using the changes included in this PR:
To run the timing tests, we used this command to avoid including nested-set build time and search indexing time:
This is a modest timing improvement, but since a lot of the test files do require re-encoding we'd expect to see a greater timing improvement for files that don't require any re-encoding.