Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for HW-accelerated video encoding. #125

Merged
merged 30 commits into from
Feb 4, 2021

Conversation

peci1
Copy link
Contributor

@peci1 peci1 commented Nov 10, 2020

This PR adds support for HW-accelerated video encoding to the VideoEncoder class. Supported (so far) are NVENC on Windows x86_64 (untested) and Linux x86_64 (tested), VAAPI on Linux x86_64 (tested) and QSV on Windows x86_64 (tested) and Linux x86_64 (untested). It should be quite easy to add more.

For better review, I've split this PR into smaller commits, as I had to do many small fixes in the existing pieces of the av library. Each of the commits has a commit message that explains why the change/fix was needed. The hw-accelerated encoding itself is then commited as the last single large commit.

This PR "depends" on #118 (it is based off of it), and commit 2c19bba is a cherry-pick of f518c91 from main. It should be relatively independent from #105 - I guess the merge shouldn't be difficult.

The general idea was to make the support "non-intrusive" - it can be switched off via a build option. When built in, it doesn't do anything until you tell it so via a special VideoEncoder::Start() signature, or via environment variables. It doesn't add any new dependencies to the library, nor does it increase the required versions of libav. The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library.

I'm open for discussion about the architecture of the added code. This was just my best try.

HW-acceleration support can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON).

By default, the HW-accelerated encoders are not used. It is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly.

The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggeres a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms. When the automatic configuration fails, the encoder falls back to software encoding.

There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation.

@peci1 peci1 requested a review from mjcarroll as a code owner November 10, 2020 20:45
@github-actions github-actions bot added 🏢 edifice Ignition Edifice 🏰 citadel Ignition Citadel 📜 blueprint Ignition Blueprint 🔮 dome Ignition Dome labels Nov 10, 2020
@peci1
Copy link
Contributor Author

peci1 commented Nov 10, 2020

I've also added an integration test that decodes-encodes-decodes. By default, it runs without HW acceleration, so no code from HWEncoder is tested, but if you know you're building on a machine with a GPU, the acceleration tests can be enabled via the environment variables.

@peci1
Copy link
Contributor Author

peci1 commented Nov 10, 2020

Windows HW acceleration was tested with this build of ffmpeg: https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.3.1-2020-10-28-full_build-shared.7z .

@peci1
Copy link
Contributor Author

peci1 commented Nov 10, 2020

Outch, segfault on Mac even without HW acceleration. Unfortunately, I don't have access to any to debug...

@peci1
Copy link
Contributor Author

peci1 commented Nov 11, 2020

Okay, after misusing the buildfarm a bit, I got it, swscale needs aligned data on Mac. But I'm done for today, so I'll apply it properly tomorrow.

But thanks to this Mac-only bug, I also found the reason why decoders and encoders shortened the videos - the implementations of NextFrame() and Encode() do not "finish" the operation by entering draining modes of the respective codecs. I've already got a fix for the decoder part, and it nicely reads the exact same number of frames as ffmpeg. Encoder should be analogic, but it will require the last drained frames to be actually written from SaveToFile(). I'll add these fixes to this PR, too.

@peci1
Copy link
Contributor Author

peci1 commented Nov 11, 2020

Okay, here it goes.

Mac segfaults are fixed (by defaulting to using 32-byte-aligned frames everywhere except raw input/output buffers). This should be potentially beneficial for other systems with e.g. custom builds of libav which might use more optimized codecs that require alignment.

Draining is implemented for both encoder and decoder. Decoding now doesn't lose a single frame (it reads the same number of frames as ffprobe reports), encoding loses exactly one frame (tested on two video files) compared to 10-20 frames lost without this fix.

I figured out that all changes proposed in #114 were needed here, too, so I closed #114.

I also transplanted commit d2ce032 from #115 (which has been merged to the still open #105). Cherry-pick was not possible because that would need bringing in more changes from #105 which I did not want. But the merge should be fine.

@mjcarroll
Copy link
Contributor

Fantastic @peci1, I appreciate all of your work here. I'm a bit busy today and tomorrow (ROS World), but I'll try to set aside some time to get a review for you!

@peci1
Copy link
Contributor Author

peci1 commented Nov 17, 2020

Rebased on top of ign-common3 to incorporate changes from #105 . All looks good!

@peci1
Copy link
Contributor Author

peci1 commented Nov 21, 2020

There's a funny (well, not really that funny) thing I stumbled upon. I'm running playback recorders with this PR on a 40-core 8-GPU server. I was looking forward for having the recordings pretty fast. But no way - the server has GeForce 2080 TIs, and NVidia puts an artificial limit on the number of currently running NVENC sessions when using GeForce cards - 3 sessions per system (yes, per system, not per GPU). So I can only run 3 concurrent playback recorders. The server-class cards like Tesla do not have this limitation (don't ask me why we have GeForces in a server).

Maybe that could be mentioned somewhere? In this case, avcodec_open2 returns ENOMEM, which is not really a unique way to identify this case, but we could at least add an additional text to the error message pointing to this problem. There is an unofficial patch which removes the artificial limit by patching libnvidia-encode.so library. Should the error message point to this repo, too?

This is how it looks in the console (for the 4th session):

[GUI] [Msg] Recording video using sim time.
[GUI] [Msg] Recording video in lockstep mode
[GUI] [Msg] Recording video using bitrate: 8000000
[GUI] [Msg] Found known HW encoder: h264_nvenc
[GUI] [Msg] Initialized NVENC on device /dev/nvidia7
[GUI] [Msg] Using encoder h264_nvenc
[GUI] [Err] [Util.cc:58] ffmpeg [h264_nvenc] OpenEncodeSessionEx failed: out of memory (10)

[GUI] [Err] [VideoEncoder.cc:507] Could not open video codec: Cannot allocate memory. Video encoding is not started

What's even worse, once this 4th session starts recording, all other playback recorders segfault.

Also, this loop from ign-gazebo results in repeated calls to VideoEncoder::Start(): https://github.com/ignitionrobotics/ign-gazebo/blob/a5abacfa74db6e8db0218bd32c55980e15cf249b/src/systems/camera_video_recorder/CameraVideoRecorder.cc#L297 . When the encoder fails to open (as in the 4th stream case), IsEncoding() stays false, and the next PostRender callback will try to initialize the recorder again. ign-gazebo does not detect that the encoder can't be brought up. Should I add some method like VideoRecorder::IsOk() (there'd probably be a better name) that could tell the user that repeated calls to Start() will fail? Or maybe it's enough to actually read the return value of Start()? The ign-gazebo code doesn't do it... It could be specified that if Start() returns false, repeated calls to it with the same parameters will also fail (until some external circumstances change - e.g. the GPU being freed up from other tasks).

@mjcarroll
Copy link
Contributor

I read about this briefly, so a quick question would be is this a limitation purely for market differentiation?

I don't think that you would be unique in having multiple consumer-grade Nvidia cards in a server. The performance-per-dollar makes a lot of sense for many users to prefer the GeForce over the server-class cards (myself included!).

For the time, I think it makes sense to have some user-configurable soft cap in the the VideoEncoder that is defaulted to the most common value for NVENC (3 in this case).

Maybe a reasonable API would be something like StreamsAvailable so that the CameraVideoRecorder will only attempt to Start if there is still underlying capability?

@peci1
Copy link
Contributor Author

peci1 commented Nov 24, 2020

Yes, the limit is in the drivers - 8 GPUs would be okay to encode at least 8 streams, not only 3.

The problem is that the limit is system-wide across all processes. And I googled a bit but haven't found an API to determine the number of remaining sessions. So I think we can just detect when the system was not able to bring up the decoder, and when we detect the ENOMEM error, we can hint the user that he might be looking in this direction. Maybe I could also add an env var to specify whether software encoder fallback is allowed? Maybe somebody would rather want the encoder to fail than to run on CPU...

@mjcarroll
Copy link
Contributor

Quite annoying that they don't surface an API for this, or even something in /proc.

I think that your approach sounds good. I would prefer that we could configure the fallback behavior through the API itself rather than environment variables, if possible.

Since the point of this is to keep the API generic, maybe it would make sense to have a mechanism for setting implementation-specific flags, via something like GET_CAPS/SET_CAPS?

@peci1
Copy link
Contributor Author

peci1 commented Nov 24, 2020

Since the point of this is to keep the API generic, maybe it would make sense to have a mechanism for setting implementation-specific flags, via something like GET_CAPS/SET_CAPS?

That sounds a bit more complicated than I wanted the API to be. Or maybe I just misunderstood you, can you elaborate on that?

Environment variables are just one option to configure the behavior - they're used when you call the original Start() signature. But I've also added the extended signature which allows to configure all the parameters in code. The reason for making a part of the API configurable via environment variables was that it would take a long time for all downstream code to properly implement all the choices and propagate them to the user via a UI. And some downstream code could be completely uninterested in implementing it. Environment variables are quite simple and anyone can start using the acceleration right away.

@mjcarroll
Copy link
Contributor

That sounds a bit more complicated than I wanted the API to be. Or maybe I just misunderstood you, can you elaborate on that?

No, on second thought, you are right. It would take a long time for this to propagate (in the case that downstream folks would even care about it).

I think as long as the environment variables have sane defaults and are documented well, it shouldn't be an issue.

As a follow-up, would you be interested in writing a short tutorial to cover how the hardware encoding works? I believe it would give it more visibility.

@peci1
Copy link
Contributor Author

peci1 commented Nov 24, 2020

as long as the environment variables have sane defaults

The default is to use solely software rendering :)

As a follow-up, would you be interested in writing a short tutorial to cover how the hardware encoding works? I believe it would give it more visibility.

Sure, I can do that once the feature is finalized and merged.

peci1 and others added 7 commits January 8, 2021 13:33
… - we're not resizing at all, just changing pixel format.

Signed-off-by: Martin Pecka <[email protected]>
…xhausted.

This will make sure decoding does not lose frames at the end of the video file.

Signed-off-by: Martin Pecka <[email protected]>
It can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON).

By default, the HW-accelerated encoders are not used. It is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly.

The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggeres a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms (x86_64 Win 10 and x86_64 Linux). When the automatic configuration fails, the encoder falls back to software encoding.

There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation.

The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library.

No libav functions used in this commit should increase the requirements for minimum versions of any libav library.

Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]>
This should prevent some potential crashes.

Signed-off-by: Martin Pecka <[email protected]>
…topped.

This makes sure frames at the end of the encoding session are not dropped.

Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]>
…created. Also add a simple VideoEncoder::Start() override that allows turning HW acceleration detection on/off.

Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]>
@peci1
Copy link
Contributor Author

peci1 commented Jan 8, 2021

I rebased on the latest changes in #118.

I added a check for the special case of NVENC failing to start due to ENOMEM. It now prints this warning:

  ignwarn << "If this computer has non-server-class GPUs (like GeForce), "
          << "it is possible that you have reached the maximum number of "
          << "simultaneous NVENC sessions (most probably 3). This limit is "
          << "not per GPU, but per the whole computer regardless of the "
          << "number of GPUs installed. You can try to circumvent this "
          << "limit by using the unofficial driver patch at "
          << "https://github.com/keylase/nvidia-patch . If you cannot (or "
          << "do not want) install this patch, do not run more than 3 "
          << "HW-accelerated video encoding tasks on this computer "
          << "simultaneously.\n";

I also added an override of VideoEncoder::Start() with just a simple bool argument _allowHwAccel which turns on/off the configuration using environment variables. So, it can be suggested to downstream code to first call the Start() method as they used to, and if it fails, they can call it once more with _allowHwAccel=false to see if disabling HW acceleration would help. I first thought of some more automated fallback mechanism, but then I figured out the NVENC error happens too late - after the NVENC encoder was detected and selected by ConfigHWAccel(). Doing some automated attempt at repairs by detecting what exactly failed during the run of VideoEncoder would be a pretty complicated topic. Randomly turning on/off things would probably not help.

…n Windows (happens on buildfarm but not locally).

Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]>
@mjcarroll
Copy link
Contributor

@peci1 Can you just clean up the merge conflicts here and I think this will be good to go.

peci1 added 3 commits February 3, 2021 23:43
# Conflicts:
#	av/src/VideoEncoder.cc
#	include/ignition/common/FlagSet.hh
#	src/FlagSet_TEST.cc
…err2str_cpp .

Signed-off-by: Martin Pecka <[email protected]>
Signed-off-by: Martin Pecka <[email protected]>
@peci1
Copy link
Contributor Author

peci1 commented Feb 3, 2021

All green!

I re-enabled the video tests disabled in #149 because they now work. The PermissionError when deleting files on Windows happens when the file is still open. I guess that happened because of a missing free in VideoEncoder::Stop(). This PR improves the behavior of this method and fixed one leaked resource which might have been the reason for the test failures on Windows. Feel free to run a few more test rounds to see if the tests are not flaky.

I also merged the av_err2str function added in #149 with the one I defined earlier in this PR. I didn't like redefining a library function, so I rather kept the changed name. I also decided to keep it in the header file so that other files using ffmpeg have access to this function and do not have to reimplement it.

@mjcarroll mjcarroll merged commit f2637fe into gazebosim:ign-common3 Feb 4, 2021
@mjcarroll
Copy link
Contributor

Thanks @peci1!

If I could ask one more small favor, it would be great to have a short tutorial on how to use this feature. Can you add that in a follow-up?

@peci1
Copy link
Contributor Author

peci1 commented Feb 4, 2021

Sure, I'll have a look at it. Should I add it to ign-common/tutorials?

@mjcarroll
Copy link
Contributor

@peci1
Copy link
Contributor Author

peci1 commented Feb 8, 2021

Tutorial: #169.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📜 blueprint Ignition Blueprint 🏰 citadel Ignition Citadel 🔮 dome Ignition Dome 🏢 edifice Ignition Edifice
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants