Integrate the NVIDIA container toolkit #52
TLDR: as of the latest Nvidia Container Toolkit (
What's the issue?
With the latest versions I can run both Wolf and the Gstreamer pipeline just by running the container with
The last one seems to just be a symlink to
Now, this is running from an X11 host, and I can see that those additional libraries aren't present in my host system:
Can anyone confirm the output of
on an Nvidia Wayland host?
What can we do better?
I think we should keep manually downloading and linking the drivers like we are doing at the moment. We should probably add a proper check for a mismatch between the downloaded drivers and the host-installed drivers, either on startup of the containers (somewhere in the |
@ABeltramo One comment I have here is that there isn't really a concept called an "Xorg host" and a "Wayland host". It depends on what the desktop environment and login manager uses, and by default, all drivers bundle both libraries. We will discuss more in NVIDIA/libnvidia-container#118. |
Could you provide a link to that/those Dockerfiles, maybe? |
Oh sorry, I actually know what they're talking about: it's adamrehn/ue4-runtime. I think I wrote an answer and forgot to send it ^^ |
If those are the images that @ohayak was talking about, unfortunately, there's nothing there that can help us.
I'm very open to suggestions or alternative solutions! |
Our experience was related to the above kernel module. |
The core issue is whether the EGL Wayland library is installed or not, likely not the container toolkit. This is available if you use the
I don't think the Debian/Ubuntu PPA repositories install the Wayland components automatically.
But, solution: install the |
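As a hedged pointer only: on Debian/Ubuntu the EGL Wayland external platform library is typically packaged as libnvidia-egl-wayland1 (an assumption; the exact package name may differ per distro and driver branch):
# hypothetical package name, verify against your distro's repositories
sudo apt-get install libnvidia-egl-wayland1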
Hi, I am currently trying to get this flying with CRI-O and the nvidia-ctk by using the runtime and CDI config. AFAIR this can be used in Docker as well. Currently I'm facing the vblank resource unavailable issue. (Driver version 550.something, cannot look it up atm) |
Here are some short steps for what I've done so far (Nvidia Driver version
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk runtime configure --runtime=crio
For Docker I guess it's enough to use
I used the nvidia runtime when starting the container(s) and set the following env vars:
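For illustration only, a minimal sketch of the Docker-side equivalent (assuming Docker 25+ with the CDI feature enabled in the daemon config; the image and the variables shown are assumptions, not necessarily what was used above):
# assumes /etc/cdi/nvidia.yaml was generated as above and Docker's CDI support is turned on
docker run --rm \
  --device nvidia.com/gpu=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  ubuntu:22.04 nvidia-smi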
As documented, this enables the CDI integration which will mount the host libs and binaries. What is working so far:
What is not working:
|
It seems that |
Edit: Is it possible that
Edit2: yes, libnvidia-vulkan-producer was removed recently: https://www.nvidia.com/Download/driverResults.aspx/214102/en-us/
|
Those libraries are for the EGLStreams backend. I believe compositors have now stopped supporting them. |
Maybe. The latest Nvidia driver comes with a GBM backend. Maybe that's something useful. I'll give
From my approach yesterday evening, glxgears is running without any error messages so far. But the output in Moonlight stays black. |
I think NVIDIA Container Toolkit 1.15.0 (released not long ago) fixes most of the problems. Please check it out. I am trying to fix the remaining issues with NVIDIA/nvidia-container-toolkit#490. Please give feedback. I've written about the situation in more detail in: NVIDIA/nvidia-container-toolkit#490 (comment)
Within the scope of Wolf, the |
I've finally managed to run Steam and Cyberpunk with Wolf using the nvidia container toolkit instead of the drivers image.
mkdir -p /usr/lib/x86_64-linux-gnu/gbm;
ln -sv ../libnvidia-allocator.so.1 /usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so;
After that, launching Steam and the game was successful. This all works without the use of |
That sounds really good, thanks for reporting back! |
If someone has knowledge of Go, could they contribute fixes for the unsolved aspects of the PR NVIDIA/nvidia-container-toolkit#490? I will give write access to https://github.com/ehfd/nvidia-container-toolkit/tree/main if they ask, in order to keep it in one PR. |
I can take a look at it later |
@ehfd Thanks for trusting me with access to your fork. I have addressed the pending issues on the code side and requested some feedback. |
No sweat. You know Go better than me, and it seems like you did a great job at it. |
Congrats on getting this merged! This is going to substantially simplify getting the drivers up and running |
I've tried upgrading Gstreamer to
It looks like |
@ABeltramo This should not be an issue as long as the NVRTC CUDA version is kept at around 11.3; yes, 11.0 will not work. |
Thanks for the very quick reply! What's the compatibility matrix for NVRTC? Would upgrading to 11.3 still work for older Cuda installations? |
Mostly the internal ABI of NVIDIA. You can probably fix the issue yourself in GStreamer and then backport it to GStreamer 1.24.6 with simple error handling in the C code. An #ifdef can probably work. |
Thanks, I've got enough on my plate already.. This is definitely lower priority compared to the rest, looks like we are going to stay on 1.22.7 for a bit longer.. |
The pull request was technically merged today because it was reverted. |
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7223 This issue has been resolved for 1.24.6. |
Thanks for the heads-up, I'm going to test it out again in the next few days! |
@tux-rampage did you manage to make it work with CDI? I was trying myself (because in NixOS the old nvidia wrapper method is now deprecated) but didn't manage to make Wolf use my GPU. |
I discovered that CDI overwrites |
Copying the libraries might break stuff when there's an update in the Nvidia libraries and you inevitably upgrade the installation. I've never used CDI myself, but normally with the nvidia runtime the amount and type of libraries that get mounted from the host is controlled with the
Must be >= 1.16.0 |
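As a hedged illustration, assuming the variable referred to above is NVIDIA_DRIVER_CAPABILITIES (the same one used in the compose file further down this thread):
# mount only an illustrative subset of the driver libraries
NVIDIA_DRIVER_CAPABILITIES=graphics,video,utility
# or mount everything the toolkit knows about
NVIDIA_DRIVER_CAPABILITIES=all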
@ABeltramo The thing is that with CDI (which is the advised way of using Nvidia with Docker now) Docker mounts everything defined in a json. One of the entries has
wolf/docker/gpu-drivers.Dockerfile (line 36 in 36f407b)
|
This is the docker-compose I'm trying to use, and I get the error of not finding libnvrtc.so:
services:
  wolf:
    image: ghcr.io/games-on-whales/wolf:stable
    environment:
      - XDG_RUNTIME_DIR=/tmp/sockets
      - HOST_APPS_STATE_FOLDER=/etc/wolf
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all
    volumes:
      - /etc/wolf/:/etc/wolf/
      - /tmp/sockets:/tmp/sockets:rw
      - /var/run/docker.sock:/var/run/docker.sock:rw
      - /dev/:/dev/:rw
      - /run/udev:/run/udev:rw
    device_cgroup_rules:
      - 'c 13:* rmw'
    devices:
      - /dev/dri
      - /dev/uinput
      - /dev/uhid
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
    network_mode: host
    restart: unless-stopped |
Sorry, my bad. I misunderstood what you meant, and I forgot about that extra library that we added. Have you tried re-building the image with that change to see if it fixes it? I'm going to try moving that lib and using Wolf with the toolkit. If it doesn't break anything, I'm going to push it later so that a new image will be built automatically. |
I tried with moving
|
this is the stacktrace (sorry to add this here, no idea if you wanna make it a different issue):
|
Sounds like CDI isn't mounting the right libraries from the host, definitely worth opening up a separate issue for this.
|
Yes, but I use CRI-O. Does your Docker instance use the nvidia runtime? If not, the drivers and devices will not be populated. |
This is an issue that has been spun off from the Discord channel.
@Murazaki : It might be good to find a better workflow for providing drivers to Wolf.
On Debian, drivers are pretty old in the main stable repo, and updated ones can be found in the CUDA drivers repo, but they don't exactly match the manual installation ones.
@ABeltramo : I guess I should go back to look into the Nvidia Docker Toolkit for people that would like to use that
I agree though, it's a bit of a pain point at the moment
@Murazaki : Cuda drivers repo :
https://developer.download.nvidia.com/compute/cuda/repos/
Linux manual installer :
https://download.nvidia.com/XFree86/Linux-x86_64/
Right now, the latest in the CUDA packaged installs is 545.23.08.
It doesn't exist as a manual installer.
That breaks the Dockerfile and renders Wolf unusable.
I wanted to make a Docker image for the Debian package install, but it uses apt-add-repository, which installs a bunch of supplementary stuff.
Here it is for Debian Bookworm:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_network
More thorough installation steps here :
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
@juliosueiras : There is one problem though, the Nvidia container toolkit doesn't inject the driver into the container and still requires a driver installed in the container image itself.
And here, let me start with the interventions I have made with NVIDIA over the last three years so that the NVIDIA drivers do not need to be installed in order to run Wayland inside containers using the NVIDIA container toolkit.
What NVIDIA container toolkit does: it's pretty simple. It injects (1) kernel devices, and (2) userspace libraries, into a container. (1) and (2) compose a subset of the driver.
(1) kernel devices: /dev/nvidiaN, /dev/nvidiactl, /dev/nvidia-modeset, /dev/nvidia-uvm, and /dev/nvidia-uvm-tools. In addition, /dev/dri/cardX and /dev/dri/renderDY, where N, X, and Y depend on the GPU the container toolkit provisions. The /dev/dri devices were added with NVIDIA/libnvidia-container#118.

(2) userspace libraries:
OpenGL libraries including EGL:
'/usr/lib/libGL.so.1', '/usr/lib/libEGL.so.1', '/usr/lib/libGLESv1_CM.so.525.78.01', '/usr/lib/libGLESv2.so.525.78.01', '/usr/lib/libEGL_nvidia.so.0', '/usr/lib/libOpenGL.so.0', '/usr/lib/libGLX.so.0', and '/usr/lib/libGLdispatch.so.0', '/usr/lib/libnvidia-tls.so.525.78.01'
Vulkan libraries:
'/usr/lib/libGLX_nvidia.so.0' and the configuration '/etc/vulkan/icd.d/nvidia_icd.json'
EGLStreams-Wayland and GBM-Wayland libraries:
'/usr/lib/libnvidia-egl-wayland.so.1' and the config '/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json'; '/usr/lib/libnvidia-egl-gbm.so.1' and the config '/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json'
NVENC libraries:
/usr/lib/libnvidia-encode.so.525.78.01, which depends on /usr/lib/libnvcuvid.so.525.78.01, which depends on /usr/lib/x86_64-linux-gnu/libcuda.so.1
VDPAU libraries:
/usr/lib/vdpau/libvdpau_nvidia.so.525.78.01
NVFBC libraries:
/usr/lib/libnvidia-fbc.so.525.78.01
OPTIX libraries:
/usr/lib/libnvoptix.so.1
Not very relevant but of note, perhaps for XWayland: the NVIDIA X.Org driver /usr/lib/xorg/modules/drivers/nvidia_drv.so and the NVIDIA X.Org GLX driver /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.525.78.01
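To sanity-check what actually got injected, a quick hedged sketch run from inside the container (assumes a glibc-based image with ldconfig available):
# (1) the kernel devices
ls -l /dev/nvidia* /dev/dri/
# (2) the userspace libraries registered with the dynamic loader
ldconfig -p | grep -Ei 'nvidia|libcuda'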
In many cases, things don't work because the below configuration files are absent inside the container. Without these, applications inside the container don't know which library to call (what each file does is self-explanatory):
The contents of /usr/share/glvnd/egl_vendor.d/10_nvidia.json
The contents of /etc/vulkan/icd.d/nvidia_icd.json (note that api_version is variable based on the Driver version)
The contents of /etc/OpenCL/vendors/nvidia.icd
The contents of /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
The contents of /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
(templates for these files are sketched just below)
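Hedged templates for these files, following current driver conventions (the api_version value is a placeholder and must match the installed driver's Vulkan version):
mkdir -p /usr/share/glvnd/egl_vendor.d /etc/vulkan/icd.d /etc/OpenCL/vendors /usr/share/egl/egl_external_platform.d
cat > /usr/share/glvnd/egl_vendor.d/10_nvidia.json <<'EOF'
{ "file_format_version" : "1.0.0", "ICD" : { "library_path" : "libEGL_nvidia.so.0" } }
EOF
# api_version below is a placeholder; set it to the Vulkan version reported by the driver
cat > /etc/vulkan/icd.d/nvidia_icd.json <<'EOF'
{ "file_format_version" : "1.0.0", "ICD" : { "library_path" : "libGLX_nvidia.so.0", "api_version" : "1.3.242" } }
EOF
echo 'libnvidia-opencl.so.1' > /etc/OpenCL/vendors/nvidia.icd
cat > /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json <<'EOF'
{ "file_format_version" : "1.0.0", "ICD" : { "library_path" : "libnvidia-egl-gbm.so.1" } }
EOF
cat > /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json <<'EOF'
{ "file_format_version" : "1.0.0", "ICD" : { "library_path" : "libnvidia-egl-wayland.so.1" } }
EOF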
I'm pretty sure that now (it was different a few months ago) the newest NVIDIA container toolkit provisions all of the required libraries plus the json configurations for Wayland (not for X11, but you don't have to care). If only the json configurations are absent, it's trivial to manually add the above templates.
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3108
Now, it is correct that NVENC does require CUDA. But that doesn't mean that it requires the whole CUDA Toolkit (separate from the CUDA drivers). The CUDA drivers are the four following libraries installed with the display drivers, independent of the CUDA Toolkit:
libcuda.so, libnvidia-ptxjitcompiler.so, libnvidia-nvvm.so, libcudadebugger.so
These versions go with the display drivers, and are all injected into the container by the NVIDIA container toolkit.
GStreamer 1.22 and before in nvcodec requires just two files of the CUDA Toolkit: libnvrtc.so and libnvrtc-builtins.so. These can be installed from the network repository like the current approach, or be extracted from a PyPi package (a hedged sketch follows below).
One thing to note here is that libnvrtc.so is not minor version compatible with CUDA. Thus, it will error on any display driver version older than its corresponding display driver version. However, backwards compatibility always works. Thus, it is a good idea to use the oldest possible libnvrtc.so version.
https://docs.nvidia.com/deploy/cuda-compatibility/
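A hedged sketch of the PyPi extraction route (the nvidia-cuda-nvrtc-cu11 package name and the wheel layout are assumptions and can change between CUDA releases):
pip download nvidia-cuda-nvrtc-cu11 --no-deps -d /tmp/nvrtc   # a wheel is a plain zip archive
cd /tmp/nvrtc && unzip -o nvidia_cuda_nvrtc_cu11-*.whl 'nvidia/cuda_nvrtc/lib/*'
cp nvidia/cuda_nvrtc/lib/libnvrtc* /usr/lib/x86_64-linux-gnu/
ldconfig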
So, I have moderate to high confidence that if you guys try the newest NVIDIA container toolkit again, you won't need to install the drivers, assuming that you ensure the json files are present or written.
Environment variables that currently work for me: