Re-encode AV Files Only when Necessary #1926

Open: wants to merge 8 commits into base branch qa/2.x

danloveg

Addresses #1912

These changes concern the creation of digital object reference derivatives for audio and video (AV) files. The changes here were authored by @yenaing-oo and me.

We have optimized the FFmpeg commands used to generate audio and video derivatives by first analyzing each master file with ffprobe. Based on the file's parameters, a minimal FFmpeg command is generated that makes only the changes required to create the reference file.
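
To illustrate the probing step, the Python sketch below builds an ffprobe invocation and reduces its JSON output to the few fields an encoding decision needs. It is not the PR's PHP code in `QubitDigitalObject`; the function names and the `ProbeSummary` fields are hypothetical, and only the ffprobe options (`-v error`, `-show_streams`, `-show_format`, `-of json`) are real.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeSummary:
    """Hypothetical subset of ffprobe output used to plan the encode."""
    video_codec: Optional[str]
    audio_codec: Optional[str]
    audio_sample_rate: Optional[int]
    format_name: Optional[str]


def ffprobe_args(path: str) -> list[str]:
    # Print only errors, and emit stream and container metadata as JSON.
    return ["ffprobe", "-v", "error", "-show_streams", "-show_format",
            "-of", "json", path]


def summarize_probe(probe_json: str) -> ProbeSummary:
    data = json.loads(probe_json)
    streams = data.get("streams", [])
    # Take the first stream of each type (a simplification for this sketch).
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    # ffprobe reports sample_rate as a string, e.g. "48000".
    rate = audio.get("sample_rate") if audio else None
    return ProbeSummary(
        video_codec=video.get("codec_name") if video else None,
        audio_codec=audio.get("codec_name") if audio else None,
        audio_sample_rate=int(rate) if rate else None,
        format_name=data.get("format", {}).get("format_name"),
    )
```

To use it on a real file, pass `ffprobe_args(path)` to `subprocess.run(..., capture_output=True, text=True, check=True)` and feed the resulting `stdout` to `summarize_probe`.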

For example, if only the audio in a video file needs to be re-encoded, the video stream is copied as-is and the audio is re-encoded to match the expected audio encoding and sample rate.
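
The per-stream decision in this example can be sketched as follows. The target codecs (H.264 video, AAC audio at 44.1 kHz, written to MP4) are assumptions made for illustration, since this description does not state the exact reference settings, and `build_ffmpeg_args` is a hypothetical helper. The ffmpeg options themselves (`-c:v copy`, `-c:v libx264`, `-c:a aac`, `-ar`, `-movflags +faststart`) are standard.

```python
from typing import Optional

# Assumed reference targets, for illustration only.
TARGET_VIDEO_CODEC = "h264"
TARGET_AUDIO_CODEC = "aac"
TARGET_SAMPLE_RATE = 44100


def build_ffmpeg_args(src: str, dest: str,
                      video_codec: Optional[str],
                      audio_codec: Optional[str],
                      sample_rate: Optional[int]) -> list[str]:
    """Return ffmpeg arguments that re-encode only non-conforming streams.

    Codec names are ffprobe codec_name values; None means the stream is absent.
    """
    args = ["ffmpeg", "-y", "-i", src]
    if video_codec is not None:
        if video_codec == TARGET_VIDEO_CODEC:
            args += ["-c:v", "copy"]       # keep the video bitstream untouched
        else:
            args += ["-c:v", "libx264"]    # re-encode to H.264
    if audio_codec is not None:
        if audio_codec == TARGET_AUDIO_CODEC and sample_rate == TARGET_SAMPLE_RATE:
            args += ["-c:a", "copy"]       # audio already conforms
        else:
            # Re-encode to AAC and resample to the target rate.
            args += ["-c:a", "aac", "-ar", str(TARGET_SAMPLE_RATE)]
    # Move the moov atom to the front so playback can start before download ends.
    args += ["-movflags", "+faststart", dest]
    return args
```

For the case described above, an H.264 video with 48 kHz MP3 audio yields `-c:v copy -c:a aac -ar 44100`, so only the audio is re-encoded.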

The purpose of this change is to make importing audio and video objects faster, especially when files are already in the correct format. If an AV file is already in the right format, it is simply copied with PHP's copy() rather than processed with FFmpeg.
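
Reading the commit messages (a simple copy when nothing needs changing, and an ffmpeg stream copy when only the container or fast-start layout needs fixing), the top-level choice can be sketched as a three-way decision. The `Action` names and boolean inputs are hypothetical; computing each flag is left to the probing step.

```python
from enum import Enum


class Action(Enum):
    COPY = "copy"            # already a usable reference file: plain copy, no ffmpeg
    REMUX = "remux"          # streams conform: rewrite the container via stream copy
    TRANSCODE = "transcode"  # at least one stream must be re-encoded


def choose_action(video_ok: bool, audio_ok: bool,
                  is_mp4: bool, is_faststart: bool) -> Action:
    # Any non-conforming stream forces a (possibly partial) re-encode.
    if not (video_ok and audio_ok):
        return Action.TRANSCODE
    # Right codecs, right container, moov already first: just copy the file.
    if is_mp4 and is_faststart:
        return Action.COPY
    # Right codecs but wrong container or layout: remux without re-encoding.
    return Action.REMUX
```

In ffmpeg terms, `REMUX` corresponds to something like `ffmpeg -i in.mov -c copy -movflags +faststart out.mp4`, while `COPY` needs no ffmpeg call at all.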


We created a test package of audio and video files that this change can be tested against: issue-1912-test-files-import.zip

Assuming AtoM is running in Docker, extract the package to the root of the repository and run the following to import the files:

php symfony csv:import issue-1912-test-files-import.csv --index

We saw an improvement in import time using the changes included in this PR:

  • With the new changes, importing the test package took on average 11.39s (over three runs)
  • Without these changes, importing the test package took on average 15.05s (over three runs)
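
Expressed relative to the baseline, these averages correspond to roughly a 24% reduction in import time, or a speedup of about 1.32×:

```math
\frac{15.05\,\mathrm{s} - 11.39\,\mathrm{s}}{15.05\,\mathrm{s}} = \frac{3.66}{15.05} \approx 0.243,
\qquad
\frac{15.05}{11.39} \approx 1.32
```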

To time the imports, we used the following command, which clears the cache first and excludes nested-set build time and search indexing time from the measurement:

php symfony cc && time php symfony csv:import issue-1912-test-files-import.csv --skip-nested-set-build

This is a modest improvement, but since many of the test files do require re-encoding, we would expect a greater improvement for files that need no re-encoding at all.

danloveg and others added 7 commits February 7, 2025 11:43
* Add docker settings to run on Mac

* Build command to efficiently encode video file

* Revert "Add docker settings to run on Mac"

This reverts commit c0fe79c.

* Use boolean variables, check fasttracking

* Add separate case to fix container

* Rename method checking if file is in right mp4 format

* Fix wrong variable being used in conditions

* Fix wrong path being used for copy

* Use ffmpeg to perform copy instead of PHP copy to allow fasttracking

* Fix missing space

* Remove redundant method to check MP4 format

* Add case for simple copy if no changes to file is required

* WIP: Test new methods in QubitDigitalObject

* Extract method to generate ffmpeg command

* Broken: change methods to non-static, mock functions in test class

* Revert "Broken: change methods to non-static, mock functions in test class"

This reverts commit 187600c.

* Revert "Extract method to generate ffmpeg command"

This reverts commit 47a140c.

* Revert "WIP: Test new methods in QubitDigitalObject"

This reverts commit fdb2c41.

* Exit early upon finding moov or mdat atom, remove status code check

* Make code cleaner

* Swap ffmpeg with ffprobe for checking faststart

* Remove -i flag, add back check for status code

* Remove error logs
* Make audio processing more efficient, fix variable casing

* Apply suggestion

Co-authored-by: Daniel Lovegrove <[email protected]>

* Remove error_logs

---------

Co-authored-by: Daniel Lovegrove <[email protected]>
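
Several commits above check whether an MP4 is already "fast start", that is, whether its `moov` atom (the index) precedes the `mdat` atom (the media data), stopping at whichever appears first. The PR performs this check with ffprobe; the Python sketch below swaps in a different technique, reading the top-level box headers directly, only to show what the check involves. It handles 32-bit sizes, the 64-bit `largesize` form (size field 1) and the to-end-of-file form (size field 0); the function name is hypothetical.

```python
import struct
from typing import BinaryIO, Optional


def moov_before_mdat(f: BinaryIO) -> Optional[bool]:
    """Scan top-level MP4 boxes and stop at the first 'moov' or 'mdat'.

    Returns True if 'moov' comes first (fast start), False if 'mdat' comes
    first, and None if neither is found or the data is truncated/malformed.
    """
    while True:
        header = f.read(8)
        if len(header) < 8:
            return None
        # Box header: 32-bit big-endian size, then a 4-byte type code.
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type code.
            large = f.read(8)
            if len(large) < 8:
                return None
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        if box_type == b"moov":
            return True
        if box_type == b"mdat":
            return False
        if size == 0 or size < header_len:
            # 0 means "extends to end of file"; smaller than the header is malformed.
            return None
        # Skip the payload, seeking relative to the current position.
        f.seek(size - header_len, 1)
```

With a real file, call it as `with open(path, "rb") as fh: moov_before_mdat(fh)`; only box headers are read, so it returns quickly even for large files.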