Added support for HW-accelerated video encoding. #125
Signed-off-by: Martin Pecka <[email protected]>
I've also added an integration test that decodes-encodes-decodes. By default, it runs without HW acceleration, so no code from HWEncoder is tested, but if you know you're building on a machine with a GPU, the acceleration tests can be enabled via the environment variables.
Windows HW acceleration was tested with this build of ffmpeg: https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.3.1-2020-10-28-full_build-shared.7z.
Ouch, segfault on Mac even without HW acceleration. Unfortunately, I don't have access to a Mac to debug it...
Okay, after misusing the buildfarm a bit, I got it: swscale needs aligned data on Mac. But I'm done for today, so I'll apply the fix properly tomorrow. Thanks to this Mac-only bug, I also found the reason why decoders and encoders shortened the videos - the implementations of NextFrame() and Encode() do not "finish" the operation by entering the draining modes of the respective codecs. I've already got a fix for the decoder part, and it now reads exactly the same number of frames as ffmpeg. The encoder fix should be analogous, but it will require the last drained frames to be actually written from SaveToFile(). I'll add these fixes to this PR, too.
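To illustrate why skipping the draining step shortens videos, here is a minimal sketch built around a toy codec model. `ToyCodec` and `ProcessAll` are hypothetical names invented for this example and are not code from this PR or from libav. In libav itself, draining is entered by passing a null packet to avcodec_send_packet() (decoding) or a null frame to avcodec_send_frame() (encoding) and then calling the matching receive function until it returns AVERROR_EOF.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for a codec that holds back `delay` frames internally.
// This is NOT the libav API; it only models the buffering behaviour.
class ToyCodec
{
  public: explicit ToyCodec(std::size_t _delay) : delay(_delay) {}

  // Feed one frame. Returns true and fills `_out` only when the internal
  // buffer holds more than `delay` frames.
  public: bool Send(int _frame, int &_out)
  {
    this->buffered.push_back(_frame);
    if (this->buffered.size() <= this->delay)
      return false;
    _out = this->Pop();
    return true;
  }

  // Draining mode: hand out the remaining buffered frames, oldest first.
  // Returns false once nothing is left.
  public: bool Drain(int &_out)
  {
    if (this->buffered.empty())
      return false;
    _out = this->Pop();
    return true;
  }

  private: int Pop()
  {
    assert(!this->buffered.empty());
    const int frame = this->buffered.front();
    this->buffered.pop_front();
    return frame;
  }

  private: std::size_t delay;
  private: std::deque<int> buffered;
};

// Push all input frames through the toy codec. Without `_drain`, the frames
// still buffered inside the codec at the end of input are silently lost.
std::vector<int> ProcessAll(const std::vector<int> &_input,
                            std::size_t _delay, bool _drain)
{
  ToyCodec codec(_delay);
  std::vector<int> output;
  int frame = 0;
  for (const int in : _input)
  {
    if (codec.Send(in, frame))
      output.push_back(frame);
  }
  if (_drain)
  {
    while (codec.Drain(frame))
      output.push_back(frame);
  }
  return output;
}
```

With five input frames and a delay of two, the loop without draining returns only three frames, while the draining variant returns all five in their original order.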
Okay, here it goes. Mac segfaults are fixed (by defaulting to using 32-byte-aligned frames everywhere except raw input/output buffers). This should be potentially beneficial for other systems with e.g. custom builds of libav which might use more optimized codecs that require alignment. Draining is implemented for both encoder and decoder. Decoding now doesn't lose a single frame (it reads the same number of frames as ffprobe reports), encoding loses exactly one frame (tested on two video files) compared to 10-20 frames lost without this fix. I figured out that all changes proposed in #114 were needed here, too, so I closed #114. I also transplanted commit d2ce032 from #115 (which has been merged to the still open #105). Cherry-pick was not possible because that would need bringing in more changes from #105 which I did not want. But the merge should be fine.
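As a rough illustration of what 32-byte alignment means for frame buffers, the sketch below computes a padded row size. `AlignUp` and `AlignedLineSize` are hypothetical helpers written for this example, not functions from this PR. In libav the alignment is normally requested through the `align` argument of av_frame_get_buffer() or av_image_alloc() rather than computed by hand.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `_value` up to the next multiple of `_alignment`.
// `_alignment` must be a power of two (e.g. 32).
constexpr std::size_t AlignUp(std::size_t _value, std::size_t _alignment)
{
  return (_value + _alignment - 1) & ~(_alignment - 1);
}

// Hypothetical helper: number of bytes per image row when every row has to
// start at an aligned offset. The padding bytes at the end of a row are
// simply unused.
std::size_t AlignedLineSize(std::size_t _width, std::size_t _bytesPerPixel,
                            std::size_t _alignment = 32)
{
  // The bit trick in AlignUp only works for powers of two.
  assert(_alignment != 0 && (_alignment & (_alignment - 1)) == 0);
  return AlignUp(_width * _bytesPerPixel, _alignment);
}
```

For example, a 1921-pixel-wide RGB row needs 5763 bytes of pixel data, which is padded to 5792 bytes so that the next row starts on a 32-byte boundary.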
Fantastic @peci1, I appreciate all of your work here. I'm a bit busy today and tomorrow (ROS World), but I'll try to set aside some time to get a review for you!
Rebased on top of ign-common3 to incorporate changes from #105. All looks good!
There's a funny (well, not really that funny) thing I stumbled upon. I'm running playback recorders with this PR on a 40-core 8-GPU server. I was looking forward to having the recordings done pretty fast. But no way - the server has GeForce 2080 Tis, and NVIDIA puts an artificial limit on the number of concurrently running NVENC sessions when using GeForce cards - 3 sessions per system (yes, per system, not per GPU). So I can only run 3 concurrent playback recorders. The server-class cards like Tesla do not have this limitation (don't ask me why we have GeForces in a server). Maybe that could be mentioned somewhere? In this case, avcodec_open2 returns ENOMEM, which is not really a unique way to identify this case, but we could at least add some text to the error message pointing to this problem. There is an unofficial patch which removes the artificial limit by patching the libnvidia-encode.so library. Should the error message point to this repo, too? This is how it looks in the console (for the 4th session):
What's even worse, once this 4th session starts recording, all other playback recorders segfault. Also, this loop from ign-gazebo results in repeated calls to |
I read about this briefly, so a quick question: is this limitation purely for market differentiation? I don't think that you would be unique in having multiple consumer-grade Nvidia cards in a server. The performance per dollar is why many users prefer GeForce over the server-class cards (myself included!). For the time being, I think it makes sense to have some user-configurable soft cap in the VideoEncoder that defaults to the most common value for NVENC (3 in this case). Maybe a reasonable API would be something like
Yes, the limit is in the drivers - 8 GPUs should be able to encode at least 8 streams, not only 3. The problem is that the limit is system-wide across all processes. I googled a bit but haven't found an API to determine the number of remaining sessions. So I think we can just detect when the system was not able to bring up the encoder, and when we detect the ENOMEM error, we can hint to the user that the session limit might be the cause. Maybe I could also add an env var to specify whether software encoder fallback is allowed? Maybe somebody would rather want the encoder to fail than to run on CPU...
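The detection idea could look roughly like the sketch below. `DescribeOpenFailure` is a hypothetical helper and the message text is purely illustrative; neither is the code or the warning actually added in this PR. The sketch assumes a platform where libav's AVERROR(ENOMEM) equals -ENOMEM, which holds wherever errno constants are positive.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: build a message for a failed codec open.
// `_errorCode` is the negative value returned by the failing libav call,
// `_isNvenc` tells whether the codec being opened was an NVENC encoder.
std::string DescribeOpenFailure(int _errorCode, bool _isNvenc)
{
  // libav signals failures with negative return values.
  assert(_errorCode < 0);

  std::string message = "Could not open video codec (error code " +
      std::to_string(_errorCode) + ").";

  // ENOMEM is ambiguous, so this is only a hint, not a diagnosis.
  if (_isNvenc && _errorCode == -ENOMEM)
  {
    message += " With NVENC, an out-of-memory error can also mean that the"
               " driver's limit on concurrent encoding sessions was reached.";
  }
  return message;
}
```

The hint is appended only for the combination of NVENC and ENOMEM, so other encoders and other error codes keep the plain message.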
Quite annoying that they don't surface an API for this, or even something in I think that your approach sounds good. I would prefer that we could configure the fallback behavior through the API itself rather than environment variables, if possible. Since the point of this is to keep the API generic, maybe it would make sense to have a mechanism for setting implementation-specific flags, via something like GET_CAPS/SET_CAPS? |
That sounds a bit more complicated than I wanted the API to be. Or maybe I just misunderstood you, can you elaborate on that? Environment variables are just one option to configure the behavior - they're used when you call the original |
No, on second thought, you are right. It would take a long time for this to propagate (in the case that downstream folks would even care about it). I think as long as the environment variables have sane defaults and are documented well, it shouldn't be an issue. As a follow-up, would you be interested in writing a short tutorial to cover how the hardware encoding works? I believe it would give it more visibility. |
The default is to use solely software rendering :)
Sure, I can do that once the feature is finalized and merged. |
… - we're not resizing at all, just changing pixel format. Signed-off-by: Martin Pecka <[email protected]>
…systems. Signed-off-by: Martin Pecka <[email protected]>
…xhausted. This will make sure decoding does not lose frames at the end of the video file. Signed-off-by: Martin Pecka <[email protected]>
It can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON). By default, the HW-accelerated encoders are not used. This is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly. The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggers a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms (x86_64 Win 10 and x86_64 Linux). When the automatic configuration fails, the encoder falls back to software encoding. There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation. The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library. No libav functions used in this commit should increase the requirements for minimum versions of any libav library. Signed-off-by: Martin Pecka <[email protected]>
This should prevent some potential crashes. Signed-off-by: Martin Pecka <[email protected]>
…topped. This makes sure frames at the end of the encoding session are not dropped. Signed-off-by: Martin Pecka <[email protected]> Signed-off-by: Martin Pecka <[email protected]>
…created. Also add a simple VideoEncoder::Start() override that allows turning HW acceleration detection on/off. Signed-off-by: Martin Pecka <[email protected]> Signed-off-by: Martin Pecka <[email protected]>
I rebased on the latest changes in #118. I added a check for the special case of NVENC failing to start due to ENOMEM. It now prints this warning:
I also added an override of |
…n Windows (happens on buildfarm but not locally). Signed-off-by: Martin Pecka <[email protected]> Signed-off-by: Martin Pecka <[email protected]>
@peci1 Can you just clean up the merge conflicts here and I think this will be good to go. |
# Conflicts: # av/src/VideoEncoder.cc # include/ignition/common/FlagSet.hh # src/FlagSet_TEST.cc
…err2str_cpp . Signed-off-by: Martin Pecka <[email protected]> Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]> Signed-off-by: Martin Pecka <[email protected]>
All green! I re-enabled the video tests disabled in #149 because they now work. The PermissionError when deleting files on Windows happens when the file is still open. I guess that happened because of a missing free in I also merged the |
Thanks @peci1! If I could ask one more small favor, it would be great to have a short tutorial on how to use this feature. Can you add that in a follow-up?
Sure, I'll have a look at it. Should I add it to ign-common/tutorials?
Yes, and add it to the index here: https://github.com/ignitionrobotics/ign-common/blob/ign-common3/tutorials.md.in
Tutorial: #169.
This PR adds support for HW-accelerated video encoding to the VideoEncoder class. Supported (so far) are NVENC on Windows x86_64 (untested) and Linux x86_64 (tested), VAAPI on Linux x86_64 (tested) and QSV on Windows x86_64 (tested) and Linux x86_64 (untested). It should be quite easy to add more.
For easier review, I've split this PR into smaller commits, as I had to do many small fixes in the existing pieces of the av library. Each of the commits has a commit message that explains why the change/fix was needed. The HW-accelerated encoding itself is then committed as the last single large commit.
This PR "depends" on #118 (it is based off of it), and commit 2c19bba is a cherry-pick of f518c91 from main. It should be relatively independent from #105 - I guess the merge shouldn't be difficult.
The general idea was to make the support "non-intrusive" - it can be switched off via a build option. When built in, it doesn't do anything until you tell it so via a special VideoEncoder::Start() signature, or via environment variables. It doesn't add any new dependencies to the library, nor does it increase the required versions of libav. The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library.
I'm open to discussion about the architecture of the added code. This was just my best try.
HW-acceleration support can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON).
By default, the HW-accelerated encoders are not used. This is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly.
The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggers a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms. When the automatic configuration fails, the encoder falls back to software encoding.
There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation.
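For readers who want a feel for the opt-in logic, here is a minimal sketch of how a value of IGN_VIDEO_ALLOWED_ENCODERS could be interpreted. `ParseAllowedEncoders` and `IsEncoderAllowed` are hypothetical helpers, not the functions in this PR, and the ':' list separator is an assumption made for this example; VideoEncoder.hh is the authoritative reference for the accepted format. The encoder names NVENC, VAAPI and QSV and the special value ALL are the ones mentioned in this description.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical parser for the value of IGN_VIDEO_ALLOWED_ENCODERS.
// ASSUMPTION: multiple entries are separated by ':'.
// A null pointer (variable not set) yields an empty set, which corresponds
// to the default of not using any HW-accelerated encoder.
std::set<std::string> ParseAllowedEncoders(const char *_value)
{
  std::set<std::string> result;
  if (_value == nullptr)
    return result;

  std::istringstream stream(_value);
  std::string item;
  while (std::getline(stream, item, ':'))
  {
    if (!item.empty())
      result.insert(item);
  }
  return result;
}

// Hypothetical check: an encoder may be tried if it is listed explicitly
// or if the special value ALL is present.
bool IsEncoderAllowed(const std::set<std::string> &_allowed,
                      const std::string &_name)
{
  assert(!_name.empty());
  return _allowed.count("ALL") > 0 || _allowed.count(_name) > 0;
}
```

In a real program the input would come from `std::getenv("IGN_VIDEO_ALLOWED_ENCODERS")` (declared in `<cstdlib>`), which returns a null pointer when the variable is not set.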