Failures on Debian with ldconfig
#1399
Comments
…cache` and work around this failure on Debian: NVIDIA/nvidia-docker#1399
@klueska - do you have any insight into this issue? |
In your example I see you use sudo:
Here's an attempt with an old CUDA 10.2 image that does work on Debian 10 and an older version of nvidia-docker (1.2.0):
I'm leaning towards these possibilities:
|
I am also unable to reproduce this on a fresh Debian 10 system with the latest drivers and the latest packages. My entire history of commands from the time I brought the system online until I executed docker:
With output:
System info:
Package info:
Contents of
|
I don't think it's a problem with
1) It logically fails when you omit both NVIDIA_VISIBLE_DEVICES & NVIDIA_DRIVER_CAPABILITIES:
2) With NVIDIA_VISIBLE_DEVICES set to the first device:
So by default, without NVIDIA_DRIVER_CAPABILITIES, it loads the config & management libs:
By the way, don't confuse the index order with the order on the host: even when your host has two GPUs and the NVIDIA device is the second one, the index still starts at 0, since NVIDIA_VISIBLE_DEVICES naturally sees only NVIDIA devices. 3) Here's a strange case: you may give an invalid list for NVIDIA_DRIVER_CAPABILITIES
Although some of the libs got loaded, ...
... 4) While with all capabilities
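For readers following along, here is a minimal sketch of the kind of invocation being discussed (the image tag and device index are placeholders; adjust for your own setup):
$ docker run --rm --runtime nvidia \
    -e NVIDIA_VISIBLE_DEVICES=0 \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda:11.0-base nvidia-smi
# NVIDIA_VISIBLE_DEVICES selects which GPU(s) the hook exposes (indexing starts at 0),
# and NVIDIA_DRIVER_CAPABILITIES controls which groups of driver libraries get injected.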
|
@3ronco What you are saying is mostly correct, except that the exact error being reported by this issue:
and not:
Which means they are running a container that definitely triggers it. Also, this issue only seems to appear on Debian 10 systems and not others. |
@klueska OK, I see... I've used an image built from source (nvidia/libnvidia-container/debian10-amd64), which may explain the missing setting for the env vars in my setup, but like you I'm unable to reproduce the issue. The debug output from
|
I am having similar problems. This is on a Debian system installed with OMV. When I change
|
it's because when trying to execveat host |
It's not AppArmor; it still happens with the apparmor=0 boot param. Not sure how else to trace what is denying the LSM hook. |
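One hedged way to look for LSM denials (assuming kernel audit messages are enabled and, for the second command, that apparmor-utils is installed):
$ sudo dmesg | grep -iE 'apparmor|audit.*denied'
$ sudo aa-status    # shows whether any AppArmor profiles are loaded/enforced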
I think the next logical step to debug this is to do the following:
This will show all calls made into linux and if/why they failed. |
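The exact instructions were lost from this capture. As an illustration only (not the script originally posted), one hedged way to get such a trace -- assuming strace is installed and that temporarily wrapping the packaged binary is acceptable -- is:
$ sudo mv /usr/bin/nvidia-container-cli /usr/bin/nvidia-container-cli.real
$ sudo tee /usr/bin/nvidia-container-cli <<'EOF'
#!/bin/sh
# Wrap the real CLI in strace so every syscall (and its errno) ends up in a log file.
exec strace -f -o /tmp/nvidia-container-cli.strace /usr/bin/nvidia-container-cli.real "$@"
EOF
$ sudo chmod +x /usr/bin/nvidia-container-cli
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
$ grep -n execveat /tmp/nvidia-container-cli.strace   # look for the failing exec near the end
# Remember to move nvidia-container-cli.real back into place afterwards.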
It's the |
I should try and see if it happens with the buster 4.19 kernel, since most reports seem to be from buster-backports 5.9 kernels. |
works on buster w/ 4.19, driver 460.32: eli@casper:~$ uname -a
Linux casper 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
eli@casper:~$ dpkg -l | grep 'nvidia-\(docker\|driver\|container\)'
ii libnvidia-container-tools 1.3.3-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.3.3-1 amd64 NVIDIA container runtime library
ii nvidia-container-runtime 3.4.2-1 amd64 NVIDIA container runtime
ii nvidia-container-toolkit 1.4.2-1 amd64 NVIDIA container runtime hook
ii nvidia-docker2 2.5.0-1 all nvidia-docker CLI wrapper
ii nvidia-driver 460.32.03-1~bpo10+1 amd64 NVIDIA metapackage
ii nvidia-driver-bin 460.32.03-1~bpo10+1 amd64 NVIDIA driver support binaries
ii nvidia-driver-libs:amd64 460.32.03-1~bpo10+1 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
eli@casper:~$ nvidia-smi
Sat Feb 13 11:30:20 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:07:00.0 On | N/A |
| 0% 39C P8 16W / 250W | 322MiB / 11175MiB | 27% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1667 G /usr/lib/xorg/Xorg 211MiB |
| 0 N/A N/A 4227 G ...AAAAAAAAA= --shared-files 43MiB |
+-----------------------------------------------------------------------------+
eli@casper:~$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Sat Feb 13 19:30:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:07:00.0 On | N/A |
| 0% 38C P8 15W / 250W | 322MiB / 11175MiB | 28% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+ |
Hi @rjeli, thanks for taking the time to dig into this. So it seems the reason I was never able to reproduce this issue is that I was using the standard (non-backports) kernel. Regarding your comment on:
Whether it's reported here or there doesn't really matter -- the same set of people look at the issues. |
Cool. I'm going to stick with the buster kernel so I don't have reason to look into it further, but it might come up in the future for people using newer kernels over time, especially since backports hosts newer nvidia drivers and people might do a full upgrade to backports packages without realizing. 👍 |
I've filed an internal NVIDIA bug (3264046) to track this, as it's impacting a lot of users, as well as my development teams at NVIDIA. |
Has anyone successfully gotten this working with Emby in a docker environment? I have a P400 that I just bought and have just about given up after over a week of trying everything I can based on these threads. |
@cuyax1975 No idea about Emby, but I had success uninstalling the backports kernel and uninstalling/reinstalling the backports nvidia-dkms so it builds against the 4.19 kernel. CUDA works. |
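A rough sketch of that sequence (the package names below are assumptions and will differ per system -- check apt list --installed before removing anything):
$ apt list --installed 2>/dev/null | grep -E 'linux-image|nvidia-kernel-dkms'
$ sudo apt remove 'linux-image-5.9*'                 # hypothetical backports kernel package name
$ sudo apt install --reinstall nvidia-kernel-dkms    # rebuild the module against the remaining 4.19 kernel
$ sudo reboot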
Thanks. I am a total newb on Linux. What would be good to google to figure out how to step through what you described? |
@cuyax1975 Maybe you can try this workaround: #1163 (comment). I do that for my Jellyfin docker server. |
Any update on this? |
A workaround that worked in my case was to both replace |
Does this work if the container doesn't contain ldconfig locally? If not, can I add ldconfig to the container manually to make it work? Running Debian with the backported kernel: |
I don't know. |
I just want to confirm that this does work for me, as suggested by the original poster, to add
Meanwhile this doesn't work:
|
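The specific commands in the comment above were lost from this capture; the working variant, as described by the original poster, is to run ldconfig inside the container before the CUDA program, for example:
$ docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"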
I ran into the same issue. The |
Hello, 👋 I'm sorry to add yet another "me too" entry, but this problem is starting to be an issue for us. Maybe I was misled by #1537 (comment), but @klueska seemed to imply that this was working with Debian 11. 🤔 I'm still facing this issue... Details on my setup:
$ sudo apt list --installed | grep container
containerd.io/bullseye,now 1.4.12-1 amd64 [installed]
libnvidia-container-tools/buster,now 1.8.1-1 amd64 [installed]
libnvidia-container1/buster,now 1.8.1-1 amd64 [installed]
nvidia-container-toolkit/buster,now 1.8.1-1 amd64 [installed,automatic]
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
$ nvidia-smi
Tue Feb 15 15:58:55 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GT 720 Off | 00000000:01:00.0 N/A | N/A |
| 30% 30C P0 N/A / N/A | 0MiB / 980MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ docker run --rm --runtime nvidia nvidia/cuda:11.0-base bash -c "nvidia-smi; echo ""; ldconfig && nvidia-smi"
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Tue Feb 15 15:01:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GT 720 Off | 00000000:01:00.0 N/A | N/A |
| 30% 30C P0 N/A / N/A | 0MiB / 980MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ nvidia-container-cli -k -d /dev/tty info 2>&1
-- WARNING, the following logs are for debugging purposes only --
I0215 15:09:07.702294 2695077 nvc.c:376] initializing library context (version=1.8.1, build=abd4e14d8cb923e2a70b7dcfee55fbc16bffa353)
I0215 15:09:07.702411 2695077 nvc.c:350] using root /
I0215 15:09:07.702426 2695077 nvc.c:351] using ldcache /etc/ld.so.cache
I0215 15:09:07.702443 2695077 nvc.c:352] using unprivileged user 1000:1000
I0215 15:09:07.702499 2695077 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0215 15:09:07.702808 2695077 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0215 15:09:07.704132 2695079 nvc.c:273] failed to set inheritable capabilities
W0215 15:09:07.704233 2695079 nvc.c:274] skipping kernel modules load due to failure
I0215 15:09:07.704873 2695080 rpc.c:71] starting driver rpc service
I0215 15:09:07.861413 2695086 rpc.c:71] starting nvcgo rpc service
I0215 15:09:07.861949 2695077 nvc_info.c:759] requesting driver information with ''
I0215 15:09:07.862871 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
I0215 15:09:07.862923 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.91.03
I0215 15:09:07.862964 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.91.03
I0215 15:09:07.863012 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
I0215 15:09:07.863035 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
I0215 15:09:07.863055 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
I0215 15:09:07.863154 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.91.03
I0215 15:09:07.863234 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.460.91.03
I0215 15:09:07.863273 2695077 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.460.91.03
W0215 15:09:07.863284 2695077 nvc_info.c:398] missing library libnvidia-cfg.so
W0215 15:09:07.863287 2695077 nvc_info.c:398] missing library libnvidia-nscq.so
W0215 15:09:07.863290 2695077 nvc_info.c:398] missing library libnvidia-opencl.so
W0215 15:09:07.863292 2695077 nvc_info.c:398] missing library libnvidia-fatbinaryloader.so
W0215 15:09:07.863294 2695077 nvc_info.c:398] missing library libnvidia-allocator.so
W0215 15:09:07.863296 2695077 nvc_info.c:398] missing library libnvidia-compiler.so
W0215 15:09:07.863298 2695077 nvc_info.c:398] missing library libnvidia-pkcs11.so
W0215 15:09:07.863300 2695077 nvc_info.c:398] missing library libnvidia-ngx.so
W0215 15:09:07.863302 2695077 nvc_info.c:398] missing library libvdpau_nvidia.so
W0215 15:09:07.863304 2695077 nvc_info.c:398] missing library libnvidia-encode.so
W0215 15:09:07.863306 2695077 nvc_info.c:398] missing library libnvidia-opticalflow.so
W0215 15:09:07.863308 2695077 nvc_info.c:398] missing library libnvcuvid.so
W0215 15:09:07.863311 2695077 nvc_info.c:398] missing library libnvidia-fbc.so
W0215 15:09:07.863313 2695077 nvc_info.c:398] missing library libnvidia-ifr.so
W0215 15:09:07.863315 2695077 nvc_info.c:398] missing library libnvidia-rtcore.so
W0215 15:09:07.863317 2695077 nvc_info.c:398] missing library libnvoptix.so
W0215 15:09:07.863319 2695077 nvc_info.c:398] missing library libGLESv2_nvidia.so
W0215 15:09:07.863321 2695077 nvc_info.c:398] missing library libGLESv1_CM_nvidia.so
W0215 15:09:07.863323 2695077 nvc_info.c:398] missing library libnvidia-glvkspirv.so
W0215 15:09:07.863327 2695077 nvc_info.c:398] missing library libnvidia-cbl.so
W0215 15:09:07.863329 2695077 nvc_info.c:402] missing compat32 library libnvidia-ml.so
W0215 15:09:07.863333 2695077 nvc_info.c:402] missing compat32 library libnvidia-cfg.so
W0215 15:09:07.863336 2695077 nvc_info.c:402] missing compat32 library libnvidia-nscq.so
W0215 15:09:07.863340 2695077 nvc_info.c:402] missing compat32 library libcuda.so
W0215 15:09:07.863343 2695077 nvc_info.c:402] missing compat32 library libnvidia-opencl.so
W0215 15:09:07.863347 2695077 nvc_info.c:402] missing compat32 library libnvidia-ptxjitcompiler.so
W0215 15:09:07.863350 2695077 nvc_info.c:402] missing compat32 library libnvidia-fatbinaryloader.so
W0215 15:09:07.863354 2695077 nvc_info.c:402] missing compat32 library libnvidia-allocator.so
W0215 15:09:07.863357 2695077 nvc_info.c:402] missing compat32 library libnvidia-compiler.so
W0215 15:09:07.863360 2695077 nvc_info.c:402] missing compat32 library libnvidia-pkcs11.so
W0215 15:09:07.863363 2695077 nvc_info.c:402] missing compat32 library libnvidia-ngx.so
W0215 15:09:07.863367 2695077 nvc_info.c:402] missing compat32 library libvdpau_nvidia.so
W0215 15:09:07.863370 2695077 nvc_info.c:402] missing compat32 library libnvidia-encode.so
W0215 15:09:07.863374 2695077 nvc_info.c:402] missing compat32 library libnvidia-opticalflow.so
W0215 15:09:07.863378 2695077 nvc_info.c:402] missing compat32 library libnvcuvid.so
W0215 15:09:07.863381 2695077 nvc_info.c:402] missing compat32 library libnvidia-eglcore.so
W0215 15:09:07.863384 2695077 nvc_info.c:402] missing compat32 library libnvidia-glcore.so
W0215 15:09:07.863388 2695077 nvc_info.c:402] missing compat32 library libnvidia-tls.so
W0215 15:09:07.863392 2695077 nvc_info.c:402] missing compat32 library libnvidia-glsi.so
W0215 15:09:07.863396 2695077 nvc_info.c:402] missing compat32 library libnvidia-fbc.so
W0215 15:09:07.863399 2695077 nvc_info.c:402] missing compat32 library libnvidia-ifr.so
W0215 15:09:07.863402 2695077 nvc_info.c:402] missing compat32 library libnvidia-rtcore.so
W0215 15:09:07.863404 2695077 nvc_info.c:402] missing compat32 library libnvoptix.so
W0215 15:09:07.863408 2695077 nvc_info.c:402] missing compat32 library libGLX_nvidia.so
W0215 15:09:07.863412 2695077 nvc_info.c:402] missing compat32 library libEGL_nvidia.so
W0215 15:09:07.863415 2695077 nvc_info.c:402] missing compat32 library libGLESv2_nvidia.so
W0215 15:09:07.863418 2695077 nvc_info.c:402] missing compat32 library libGLESv1_CM_nvidia.so
W0215 15:09:07.863421 2695077 nvc_info.c:402] missing compat32 library libnvidia-glvkspirv.so
W0215 15:09:07.863425 2695077 nvc_info.c:402] missing compat32 library libnvidia-cbl.so
I0215 15:09:07.863506 2695077 nvc_info.c:298] selecting /usr/lib/nvidia/current/nvidia-smi
I0215 15:09:07.863532 2695077 nvc_info.c:298] selecting /usr/lib/nvidia/current/nvidia-debugdump
W0215 15:09:07.863622 2695077 nvc_info.c:424] missing binary nvidia-persistenced
W0215 15:09:07.863626 2695077 nvc_info.c:424] missing binary nv-fabricmanager
W0215 15:09:07.863629 2695077 nvc_info.c:424] missing binary nvidia-cuda-mps-control
W0215 15:09:07.863632 2695077 nvc_info.c:424] missing binary nvidia-cuda-mps-server
W0215 15:09:07.863647 2695077 nvc_info.c:348] missing firmware path /lib/firmware/nvidia/460.91.03/gsp.bin
I0215 15:09:07.863665 2695077 nvc_info.c:522] listing device /dev/nvidiactl
I0215 15:09:07.863668 2695077 nvc_info.c:522] listing device /dev/nvidia-uvm
I0215 15:09:07.863670 2695077 nvc_info.c:522] listing device /dev/nvidia-uvm-tools
I0215 15:09:07.863672 2695077 nvc_info.c:522] listing device /dev/nvidia-modeset
W0215 15:09:07.863687 2695077 nvc_info.c:348] missing ipc path /var/run/nvidia-persistenced/socket
W0215 15:09:07.863701 2695077 nvc_info.c:348] missing ipc path /var/run/nvidia-fabricmanager/socket
W0215 15:09:07.863712 2695077 nvc_info.c:348] missing ipc path /tmp/nvidia-mps
I0215 15:09:07.863716 2695077 nvc_info.c:815] requesting device information with ''
I0215 15:09:07.869784 2695077 nvc_info.c:706] listing device /dev/nvidia0 (GPU-80fc26fb-9db1-5b79-2372-23dfaf7cc99c at 00000000:01:00.0)
I0215 15:09:07.869800 2695077 nvc.c:430] shutting down library context
I0215 15:09:07.869859 2695086 rpc.c:95] terminating nvcgo rpc service
I0215 15:09:07.870174 2695077 rpc.c:135] nvcgo rpc service terminated successfully
I0215 15:09:07.891932 2695080 rpc.c:95] terminating driver rpc service
I0215 15:09:07.892048 2695077 rpc.c:135] driver rpc service terminated successfully
NVRM version: 460.91.03
CUDA version: 11.2
Device Index: 0
Device Minor: 0
Model: GeForce GT 720
Brand: GeForce
GPU UUID: GPU-80fc26fb-9db1-5b79-2372-23dfaf7cc99c
Bus Location: 00000000:01:00.0
Architecture: 3.5
$ sudo which ldconfig
/usr/sbin/ldconfig
$ l /sbin
lrwxrwxrwx 1 root root 8 Sep 28 16:08 /sbin -> usr/sbin
$ sudo apt list --installed | grep libc-bin
libc-bin/stable,now 2.31-13+deb11u2 amd64 [installed]
$ dpkg -S /sbin/ldconfig
libc-bin: /sbin/ldconfig
The workaround mentioned everywhere of editing
I don't have a
Is there a clear status on this issue? Does a fix already exist? Thanks for taking the time, 👌
Edit, add refs: NVIDIA/nvidia-container-toolkit#299 , #1163 , #1537 |
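For context, the workaround most often cited in the linked threads is to edit the ldconfig entry in /etc/nvidia-container-runtime/config.toml. On Ubuntu it is usually pointed at /sbin/ldconfig.real, which (as noted above) does not exist on Debian; the variant reported by Debian users is to drop the leading '@' so that the container's own ldconfig is run instead of the host's. A hedged sketch, not an official recommendation:
[nvidia-container-cli]
ldconfig = "/sbin/ldconfig"   # was: ldconfig = "@/sbin/ldconfig"; the '@' marks the path as host-relative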
This issue is reported often, but as I mentioned here #1399 (comment), I have never been able to reproduce this bug myself. Until I am able to reproduce it, I will not be able to provide a fix. Even just now I spun up two fresh VMs on AWS -- one with Debian 10 and one with Debian 11 -- and was not able to reproduce the error. I followed the same procedure outlined in #1399 (comment) (with the only difference being that I pulled down a newer driver). Is there some hint you can give me on how to get a Debian 10 or Debian 11 system up and running (in a supported state) that exhibits this bug? |
Hello, I don't have the possibility to spawn a GPU VM; unfortunately I only have the bare-metal Debian 11 machine on which I'm testing... Here is more info from my debugging:
$ sudo execsnoop-bpfcc
docker 34242 21523 0 /usr/bin/docker run -it --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
systemd-sysctl 34260 34252 0 /lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/vethea26104 --prefix=/net/ipv4/neigh/vethea26104 --prefix=/net/ipv6/conf/vethea26104 --prefix=/net/ipv6/neigh/vethea26104
systemd-sysctl 34261 34253 0 /lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/vethaf1566b --prefix=/net/ipv4/neigh/vethaf1566b --prefix=/net/ipv6/conf/vethaf1566b --prefix=/net/ipv6/neigh/vethaf1566b
containerd-shim 34264 1102 0 /usr/bin/containerd-shim-runc-v2 -namespace moby -address /run/containerd/containerd.sock -publish-binary /usr/bin/containerd -id ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98 start
containerd-shim 34272 34264 0 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98 -address /run/containerd/containerd.sock
runc 34281 34272 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98/log.json --log-format json --systemd-cgroup create --bundle /run/containerd/io.containerd.runtime.v2.task/moby/ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98 --pid-file /run/containerd/io.containerd.runtime.v2.task/moby/ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98/init.pid --console-socket /tmp/pty885547821/pty.sock ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98
exe 34289 34281 0 /proc/self/exe init
nvidia-containe 34298 34281 0 /usr/bin/nvidia-container-runtime-hook prestart
nvidia-containe 34298 34281 0 /usr/bin/nvidia-container-cli --load-kmods --debug=/var/log/nvidia-container-toolkit.log configure --ldconfig=@/usr/sbin/ldconfig --device=all --compute --utility --require=cuda>=11.0 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 --pid=34291 /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged
exe 34317 34281 0 /proc/29870/exe -exec-root=/var/run/docker ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98 15b15f95d697
exe 34324 29870 0 /proc/self/exe /var/run/docker/netns/b7dfbae12bcc all false
runc 34335 34272 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98/log.json --log-format json --systemd-cgroup start ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98
nvidia-smi 34291 34272 0 /usr/bin/nvidia-smi
runc 34341 34272 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98/log.json --log-format json --systemd-cgroup delete ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98
ifupdown-hotplu 34349 34253 0 /lib/udev/ifupdown-hotplug
ifquery 34351 34350 0 /sbin/ifquery --allow hotplug -l vethea26104
systemd-sysctl 34352 34253 0 /lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/vethea26104 --prefix=/net/ipv4/neigh/vethea26104 --prefix=/net/ipv6/conf/vethea26104 --prefix=/net/ipv6/neigh/vethea26104
$ sudo opensnoop-bpfcc | grep -i ldconfig
34082 nvidia-containe 8 0 /usr/sbin/ldconfig
34100 nvc:[ldconfig] 9 0 /proc/34075/ns/mnt
34100 nvc:[ldconfig] 9 0 /proc/sys/kernel/cap_last_cap
34100 nvc:[ldconfig] 9 0 /
34100 nvc:[ldconfig] 10 0 /mnt/data/docker/overlay2/808b3ab7d292be666290befdb0622aecd54b74e25f5350ba83c394107e1ef822/merged
34100 nvc:[ldconfig] 11 0 /proc/self/setgroups
$ less /var/log/nvidia-container-toolkit.log
-- WARNING, the following logs are for debugging purposes only --
I0216 16:10:49.804494 34298 nvc.c:376] initializing library context (version=1.8.1, build=abd4e14d8cb923e2a70b7dcfee55fbc16bffa353)
I0216 16:10:49.804640 34298 nvc.c:350] using root /
I0216 16:10:49.804663 34298 nvc.c:351] using ldcache /etc/ld.so.cache
I0216 16:10:49.804681 34298 nvc.c:352] using unprivileged user 65534:65534
I0216 16:10:49.804730 34298 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0216 16:10:49.805036 34298 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
I0216 16:10:49.806385 34304 nvc.c:278] loading kernel module nvidia
I0216 16:10:49.806920 34304 nvc.c:282] running mknod for /dev/nvidiactl
I0216 16:10:49.807019 34304 nvc.c:286] running mknod for /dev/nvidia0
I0216 16:10:49.807091 34304 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0216 16:10:49.813686 34304 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0216 16:10:49.813739 34304 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0216 16:10:49.814850 34304 nvc.c:296] loading kernel module nvidia_uvm
I0216 16:10:49.814877 34304 nvc.c:300] running mknod for /dev/nvidia-uvm
I0216 16:10:49.814908 34304 nvc.c:305] loading kernel module nvidia_modeset
I0216 16:10:49.814976 34304 nvc.c:309] running mknod for /dev/nvidia-modeset
I0216 16:10:49.815155 34305 rpc.c:71] starting driver rpc service
I0216 16:10:49.967757 34309 rpc.c:71] starting nvcgo rpc service
I0216 16:10:49.968383 34298 nvc_container.c:240] configuring container with 'compute utility supervised'
I0216 16:10:49.968567 34298 nvc_container.c:88] selecting /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/local/cuda-11.0/compat/libcuda.so.450.51.06
I0216 16:10:49.968621 34298 nvc_container.c:88] selecting /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/local/cuda-11.0/compat/libnvidia-ptxjitcompiler.so.450.51.06
I0216 16:10:49.969680 34298 nvc_container.c:262] setting pid to 34291
I0216 16:10:49.969694 34298 nvc_container.c:263] setting rootfs to /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged
I0216 16:10:49.969700 34298 nvc_container.c:264] setting owner to 0:0
I0216 16:10:49.969705 34298 nvc_container.c:265] setting bins directory to /usr/bin
I0216 16:10:49.969710 34298 nvc_container.c:266] setting libs directory to /usr/lib/x86_64-linux-gnu
I0216 16:10:49.969715 34298 nvc_container.c:267] setting libs32 directory to /usr/lib/i386-linux-gnu
I0216 16:10:49.969720 34298 nvc_container.c:268] setting cudart directory to /usr/local/cuda
I0216 16:10:49.969724 34298 nvc_container.c:269] setting ldconfig to @/usr/sbin/ldconfig (host relative)
I0216 16:10:49.969729 34298 nvc_container.c:270] setting mount namespace to /proc/34291/ns/mnt
I0216 16:10:49.969742 34298 nvc_container.c:272] detected cgroupv2
I0216 16:10:49.969747 34298 nvc_container.c:273] setting devices cgroup to /sys/fs/cgroup/system.slice/docker-ac03866ef7f4d6e96a0a9d29ed400eaf92558c3fe72af82c44aad7fbfe331f98.scope
I0216 16:10:49.969755 34298 nvc_info.c:759] requesting driver information with ''
I0216 16:10:49.970765 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
I0216 16:10:49.970835 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.91.03
I0216 16:10:49.970894 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.91.03
I0216 16:10:49.970962 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
I0216 16:10:49.970993 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
I0216 16:10:49.971025 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
I0216 16:10:49.971155 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.91.03
I0216 16:10:49.971268 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.460.91.03
I0216 16:10:49.971323 34298 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/nvidia/current/libEGL_nvidia.so.460.91.03
W0216 16:10:49.971355 34298 nvc_info.c:398] missing library libnvidia-cfg.so
W0216 16:10:49.971361 34298 nvc_info.c:398] missing library libnvidia-nscq.so
W0216 16:10:49.971367 34298 nvc_info.c:398] missing library libnvidia-opencl.so
W0216 16:10:49.971372 34298 nvc_info.c:398] missing library libnvidia-fatbinaryloader.so
W0216 16:10:49.971377 34298 nvc_info.c:398] missing library libnvidia-allocator.so
W0216 16:10:49.971383 34298 nvc_info.c:398] missing library libnvidia-compiler.so
W0216 16:10:49.971388 34298 nvc_info.c:398] missing library libnvidia-pkcs11.so
W0216 16:10:49.971393 34298 nvc_info.c:398] missing library libnvidia-ngx.so
W0216 16:10:49.971398 34298 nvc_info.c:398] missing library libvdpau_nvidia.so
W0216 16:10:49.971404 34298 nvc_info.c:398] missing library libnvidia-encode.so
W0216 16:10:49.971409 34298 nvc_info.c:398] missing library libnvidia-opticalflow.so
W0216 16:10:49.971414 34298 nvc_info.c:398] missing library libnvcuvid.so
W0216 16:10:49.971420 34298 nvc_info.c:398] missing library libnvidia-fbc.so
W0216 16:10:49.971425 34298 nvc_info.c:398] missing library libnvidia-ifr.so
W0216 16:10:49.971430 34298 nvc_info.c:398] missing library libnvidia-rtcore.so
W0216 16:10:49.971436 34298 nvc_info.c:398] missing library libnvoptix.so
W0216 16:10:49.971441 34298 nvc_info.c:398] missing library libGLESv2_nvidia.so
W0216 16:10:49.971446 34298 nvc_info.c:398] missing library libGLESv1_CM_nvidia.so
W0216 16:10:49.971452 34298 nvc_info.c:398] missing library libnvidia-glvkspirv.so
W0216 16:10:49.971457 34298 nvc_info.c:398] missing library libnvidia-cbl.so
W0216 16:10:49.971462 34298 nvc_info.c:402] missing compat32 library libnvidia-ml.so
W0216 16:10:49.971468 34298 nvc_info.c:402] missing compat32 library libnvidia-cfg.so
W0216 16:10:49.971473 34298 nvc_info.c:402] missing compat32 library libnvidia-nscq.so
W0216 16:10:49.971478 34298 nvc_info.c:402] missing compat32 library libcuda.so
W0216 16:10:49.971483 34298 nvc_info.c:402] missing compat32 library libnvidia-opencl.so
W0216 16:10:49.971489 34298 nvc_info.c:402] missing compat32 library libnvidia-ptxjitcompiler.so
W0216 16:10:49.971494 34298 nvc_info.c:402] missing compat32 library libnvidia-fatbinaryloader.so
W0216 16:10:49.971499 34298 nvc_info.c:402] missing compat32 library libnvidia-allocator.so
W0216 16:10:49.971505 34298 nvc_info.c:402] missing compat32 library libnvidia-compiler.so
W0216 16:10:49.971510 34298 nvc_info.c:402] missing compat32 library libnvidia-pkcs11.so
W0216 16:10:49.971515 34298 nvc_info.c:402] missing compat32 library libnvidia-ngx.so
W0216 16:10:49.971521 34298 nvc_info.c:402] missing compat32 library libvdpau_nvidia.so
W0216 16:10:49.971526 34298 nvc_info.c:402] missing compat32 library libnvidia-encode.so
W0216 16:10:49.971536 34298 nvc_info.c:402] missing compat32 library libnvidia-opticalflow.so
W0216 16:10:49.971542 34298 nvc_info.c:402] missing compat32 library libnvcuvid.so
W0216 16:10:49.971547 34298 nvc_info.c:402] missing compat32 library libnvidia-eglcore.so
W0216 16:10:49.971553 34298 nvc_info.c:402] missing compat32 library libnvidia-glcore.so
W0216 16:10:49.971558 34298 nvc_info.c:402] missing compat32 library libnvidia-tls.so
W0216 16:10:49.971563 34298 nvc_info.c:402] missing compat32 library libnvidia-glsi.so
W0216 16:10:49.971569 34298 nvc_info.c:402] missing compat32 library libnvidia-fbc.so
W0216 16:10:49.971574 34298 nvc_info.c:402] missing compat32 library libnvidia-ifr.so
W0216 16:10:49.971579 34298 nvc_info.c:402] missing compat32 library libnvidia-rtcore.so
W0216 16:10:49.971585 34298 nvc_info.c:402] missing compat32 library libnvoptix.so
W0216 16:10:49.971590 34298 nvc_info.c:402] missing compat32 library libGLX_nvidia.so
W0216 16:10:49.971595 34298 nvc_info.c:402] missing compat32 library libEGL_nvidia.so
W0216 16:10:49.971600 34298 nvc_info.c:402] missing compat32 library libGLESv2_nvidia.so
W0216 16:10:49.971606 34298 nvc_info.c:402] missing compat32 library libGLESv1_CM_nvidia.so
W0216 16:10:49.971611 34298 nvc_info.c:402] missing compat32 library libnvidia-glvkspirv.so
W0216 16:10:49.971616 34298 nvc_info.c:402] missing compat32 library libnvidia-cbl.so
I0216 16:10:49.971895 34298 nvc_info.c:298] selecting /usr/lib/nvidia/current/nvidia-smi
I0216 16:10:49.971937 34298 nvc_info.c:298] selecting /usr/lib/nvidia/current/nvidia-debugdump
W0216 16:10:49.972378 34298 nvc_info.c:424] missing binary nvidia-persistenced
W0216 16:10:49.972384 34298 nvc_info.c:424] missing binary nv-fabricmanager
W0216 16:10:49.972390 34298 nvc_info.c:424] missing binary nvidia-cuda-mps-control
W0216 16:10:49.972395 34298 nvc_info.c:424] missing binary nvidia-cuda-mps-server
W0216 16:10:49.972418 34298 nvc_info.c:348] missing firmware path /lib/firmware/nvidia/460.91.03/gsp.bin
I0216 16:10:49.972445 34298 nvc_info.c:522] listing device /dev/nvidiactl
I0216 16:10:49.972452 34298 nvc_info.c:522] listing device /dev/nvidia-uvm
I0216 16:10:49.972457 34298 nvc_info.c:522] listing device /dev/nvidia-uvm-tools
I0216 16:10:49.972462 34298 nvc_info.c:522] listing device /dev/nvidia-modeset
W0216 16:10:49.972486 34298 nvc_info.c:348] missing ipc path /var/run/nvidia-persistenced/socket
W0216 16:10:49.972509 34298 nvc_info.c:348] missing ipc path /var/run/nvidia-fabricmanager/socket
W0216 16:10:49.972525 34298 nvc_info.c:348] missing ipc path /tmp/nvidia-mps
I0216 16:10:49.972531 34298 nvc_info.c:815] requesting device information with ''
I0216 16:10:49.978491 34298 nvc_info.c:706] listing device /dev/nvidia0 (GPU-80fc26fb-9db1-5b79-2372-23dfaf7cc99c at 00000000:01:00.0)
I0216 16:10:49.978566 34298 nvc_mount.c:359] mounting tmpfs at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/proc/driver/nvidia
I0216 16:10:49.978918 34298 nvc_mount.c:127] mounting /usr/lib/nvidia/current/nvidia-smi at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/bin/nvidia-smi
I0216 16:10:49.978980 34298 nvc_mount.c:127] mounting /usr/lib/nvidia/current/nvidia-debugdump at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/bin/nvidia-debugdump
I0216 16:10:49.979148 34298 nvc_mount.c:127] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.460.91.03 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
I0216 16:10:49.979208 34298 nvc_mount.c:127] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.460.91.03 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
I0216 16:10:49.979265 34298 nvc_mount.c:127] mounting /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.460.91.03 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03
I0216 16:10:49.979295 34298 nvc_mount.c:520] creating symlink /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0216 16:10:49.979412 34298 nvc_mount.c:127] mounting /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/local/cuda-11.0/compat/libcuda.so.450.51.06 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libcuda.so.450.51.06
I0216 16:10:49.979472 34298 nvc_mount.c:127] mounting /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/local/cuda-11.0/compat/libnvidia-ptxjitcompiler.so.450.51.06 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.51.06
I0216 16:10:49.979524 34298 nvc_mount.c:223] mounting /dev/nvidiactl at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/dev/nvidiactl
I0216 16:10:49.980163 34298 nvc_mount.c:223] mounting /dev/nvidia-uvm at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/dev/nvidia-uvm
I0216 16:10:49.980642 34298 nvc_mount.c:223] mounting /dev/nvidia-uvm-tools at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/dev/nvidia-uvm-tools
I0216 16:10:49.981110 34298 nvc_mount.c:223] mounting /dev/nvidia0 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/dev/nvidia0
I0216 16:10:49.981205 34298 nvc_mount.c:433] mounting /proc/driver/nvidia/gpus/0000:01:00.0 at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged/proc/driver/nvidia/gpus/0000:01:00.0
I0216 16:10:49.981773 34298 nvc_ldcache.c:362] executing /usr/sbin/ldconfig from host at /mnt/data/docker/overlay2/4f6f7535fff0b84c73cff3f7ee2cb8f581dcb69c871b23266e78559ccf16a8ae/merged
E0216 16:10:49.982844 1 nvc_ldcache.c:393] could not start /usr/sbin/ldconfig: process execution failed: no such file or directory
I0216 16:10:49.983136 34298 nvc.c:430] shutting down library context
I0216 16:10:49.983235 34309 rpc.c:95] terminating nvcgo rpc service
I0216 16:10:49.983646 34298 rpc.c:135] nvcgo rpc service terminated successfully
I0216 16:10:50.005337 34305 rpc.c:95] terminating driver rpc service
I0216 16:10:50.005550 34298 rpc.c:135] driver rpc service terminated successfully
Probably one of these, given my opensnoop output:
if (adjust_privileges(&ctx->err, cnt->uid, cnt->gid, drop_groups) < 0)
        goto fail;
if (limit_syscalls(&ctx->err) < 0)
        goto fail;
Maybe you have some more insight? Mika |
Can you try running the following script as
|
@klueska I'm not sure this was the expected result, but I'm getting:
$ docker run -it --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr:: unknown.
nvidia-container-cli.strace.txt
Does this give us more info? |
I would have expected the error to be the same, not something different. |
My bad, I had my setup in a weird state after changing to other kernels... line 4913:
|
Just to be sure -- you have an |
Likewise, on my Debian 11 system I have an |
Yes, I do have:
$ l /sbin
lrwxrwxrwx 1 root root 8 Sep 28 16:08 /sbin -> usr/sbin
$ l /usr/sbin/ldconfig
-rwxr-xr-x 1 root root 928K Oct 2 14:47 /usr/sbin/ldconfig
$ md5sum /usr/sbin/ldconfig /sbin/ldconfig
634a4cf316a25d01a21fba9baadcbb8c /usr/sbin/ldconfig
634a4cf316a25d01a21fba9baadcbb8c /sbin/ldconfig
$ dpkg -S /sbin/ldconfig
libc-bin: /sbin/ldconfig
$ sudo apt list --installed | grep libc-bin
libc-bin/stable,now 2.31-13+deb11u2 amd64 [installed]
Do you know from which package your
Edit: Debian doesn't know where this file comes from: https://packages.debian.org/search?suite=bullseye&arch=any&searchon=contents&keywords=%2Fsbin%2Fldconfig.real |
Oops! I've been running my latest set of commands on an Ubuntu 21.10 system (not Debian 11). |
For reproducibility: I've installed Debian from the official ISO. Installed the nvidia-driver and CUDA from the Debian repo:
Installed Docker with their repo:
Installed nvidia-docker from your repo: |
I was finally able to reproduce the issue myself and track down the cause to produce a proper fix: I will report back here (and to the multitude of other bugs that exist for this issue) once this is reviewed / merged / released. |
@klueska Is there an estimated timeline for when the changes for this bug will be released? |
We had planned to release it 2 weeks ago, but were (and still are) blocked by some internal processes preventing us from pushing out the release. I will update here once we are unblocked. |
The newest version of Specifically this change in The latest release packages for the full
|
Hello @klueska, I confirm this is working with the versions you quoted! Nice work, thanks. 🎉 |
@brycelelbach would you be able to confirm that the new version of the NVIDIA Container Toolkit (v1.9.0) addresses this issue for you? |
@brycelelbach I'm closing this issue as it should be resolved by 1.9.0. Please reopen if this does not address the issue. |
1. Issue or feature description
On Debian 10 and Debian unstable, nvidia-docker fails to run programs that use CUDA inside of containers UNLESS ldconfig is run first in the container to rebuild the ldconfig cache.
Example failure:
If I run ldconfig within the container to rebuild ld.so.cache first, everything works:
This seems related to:
2. Steps to reproduce the issue
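The reproduction commands from the original report were not preserved in this copy; a minimal command consistent with the failure described throughout the thread is:
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.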
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
uname -a
nvidia-smi -a
docker version
dpkg -l '*nvidia*' or rpm -qa '*nvidia*' (my installation of the display driver and CUDA is a local debug build from source and is roughly CUDA 11.0 / R455)
nvidia-container-cli -V