SUSE Tumbleweed & nvidia-container-toolkit & could not select device driver "" #1377
Comments
This is an error from Docker itself, raised before it ever tries to invoke the NVIDIA stack.
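To make this failure mode concrete, here is a simplified Python model of how the Docker daemon resolves a `--gpus` request. It reflects our reading of the Moby (Docker daemon) source rather than anything stated in this thread, and details may differ between Docker versions. The assumptions are: an `nvidia` device driver is registered at daemon startup only if a binary named `nvidia-container-runtime-hook` is found on `PATH`, and `--gpus` sends a device request with an empty driver name and the capability `gpu`. If no registered driver advertises that capability, the daemon returns the error from this issue's title without touching any NVIDIA component. All class and function names below are hypothetical, and the capability set is abbreviated.

```python
import shutil
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Assumed name of the helper binary whose presence makes the daemon register the driver.
NVIDIA_HOOK = "nvidia-container-runtime-hook"


@dataclass
class DeviceRequest:
    driver: str = ""  # --gpus leaves the driver name empty
    # OR of AND-lists: the request is satisfied if any inner list is fully supported.
    capabilities: List[List[str]] = field(default_factory=lambda: [["gpu"]])


class IncompatibleDeviceRequest(Exception):
    """Raised when no registered driver can serve a device request."""


def register_drivers(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Set[str]]:
    """Build the driver table the daemon would have after startup (simplified)."""
    drivers: Dict[str, Set[str]] = {}
    if which(NVIDIA_HOOK) is not None:
        # Abbreviated capability set; the real one also lists more NVIDIA driver capabilities.
        drivers["nvidia"] = {"gpu", "nvidia", "compute", "utility"}
    return drivers


def match(capset: Set[str], capabilities: List[List[str]]) -> Optional[List[str]]:
    """Return the first inner capability list fully covered by capset, else None."""
    for caps in capabilities:
        if all(c in capset for c in caps):
            return caps
    return None


def select_driver(drivers: Dict[str, Set[str]], req: DeviceRequest) -> str:
    """Pick a driver for the request or raise the same style of error Docker reports."""
    if req.driver == "":
        candidates = list(drivers.items())  # empty name: try every registered driver
    else:
        candidates = [(req.driver, drivers.get(req.driver, set()))]
    for name, capset in candidates:
        if match(capset, req.capabilities) is not None:
            return name
    caps = "[" + " ".join("[" + " ".join(c) + "]" for c in req.capabilities) + "]"
    raise IncompatibleDeviceRequest(
        f'could not select device driver "{req.driver}" with capabilities: {caps}'
    )
```

In this model, `select_driver(register_drivers(), DeviceRequest())` succeeds only when the hook binary is visible on the daemon's `PATH`; otherwise the driver table is empty and the request fails before any NVIDIA code runs.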
To check that the nvidia stack is actually working, you can attempt to use the environment variable API instead of the
This is how the stack fits together:
Thanks for the guidelines. IMO the comment above should be part of README.md as is; it's just worth it. TL;DR: the SLES 15.1 repo works on Tumbleweed and fixes my problem. On Tumbleweed the NVIDIA container tooling comes from the main repo, but there is no nvidia-container-runtime. It works. I still feel like my original question remains unresolved: how does Docker "know" that the NVIDIA tooling is installed? At this point it's a purely academic problem. For any SUSE newbie encountering the same problem, below is what I did.
After that:
This step is not necessary, as it only adds the nvidia runtime. It can also be done by modifying /etc/docker/daemon.json, but I did it just for fun.
Then:
After that:
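For reference, registering the runtime through `/etc/docker/daemon.json` (the alternative mentioned above) amounts to adding an entry under the `runtimes` key. Below is a minimal Python sketch that merges such an entry into existing daemon settings while keeping everything else. It assumes the runtime binary is called `nvidia-container-runtime` and is on `PATH`; the helper name `add_nvidia_runtime` is hypothetical.

```python
import json


def add_nvidia_runtime(config_text: str, runtime_path: str = "nvidia-container-runtime") -> str:
    """Return daemon.json text with an 'nvidia' runtime added, preserving other settings.

    runtime_path is assumed to be the name (or absolute path) of the runtime binary.
    """
    # An empty or missing daemon.json is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    runtimes = config.setdefault("runtimes", {})
    # Each runtime entry needs a "path"; "runtimeArgs" is optional and left empty here.
    runtimes["nvidia"] = {"path": runtime_path, "runtimeArgs": []}
    return json.dumps(config, indent=2)
```

The result would be written back to `/etc/docker/daemon.json` as root, followed by a daemon restart (for example `systemctl restart docker`) so the new runtime is picked up. As noted above, this step is optional.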
@s4s0l I can NOT thank you enough. I have been delving down this rabbit hole for 2 days. Thank you.
1. Issue or feature description
On Tumbleweed (I know it's not supported) I'm unable to run:
Generally I'm not expecting a solution; rather, I would like to understand how all of this is supposed to work together. My current finding is that my Docker is not using `/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json` at all: I can put any nonsense in that file and nothing complains about it. I would like to understand why. As there is very little documentation on the Docker side about how it uses OCI hooks, I do not know where to look for an explanation. What should pick the file up, and under what circumstances? Is it Docker itself, runc, or something else? I see that packages on other distros place hooks in different paths, like `/usr/share/containers/docker/...` or `/etc/containers/...`. I tried different versions of runc and some other random things, but after reading the documentation for Docker, the NVIDIA repos, and the OCI specs, I still cannot figure out how it is supposed to work. I would appreciate it if someone could find a moment to write down how the NVIDIA tools are integrated with Docker. How does Docker pick the GPU "driver"? What makes the NVIDIA hook trigger only for containers started with `--gpus`? Etc.

The driver seems to be running fine as far as I can tell (games, CUDA-based ML, Blender). The issues I could find relate to Docker not being restarted after installing the toolkit, or to Docker installed via snap; neither is my case.
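On the `hooks.d` question: as far as we understand, Docker does not read OCI hook directories at all (they are consumed by engines such as Podman and CRI-O), which would explain why a nonsense file there is silently ignored. With `--gpus`, Docker instead writes the hook into the container's OCI spec itself, and only after the `nvidia` device driver has matched the request, so containers started without `--gpus` never get it. The Python sketch below models that step on a plain dict shaped like an OCI `config.json`. The hook name, its `prestart` argument, and the `NVIDIA_VISIBLE_DEVICES` values reflect our reading of the Moby source and are assumptions; the capabilities variable is omitted for brevity, and `inject_nvidia_hook` is a hypothetical name.

```python
import copy
from typing import List, Optional

# Assumed name of the hook binary Docker places in the spec.
NVIDIA_HOOK = "nvidia-container-runtime-hook"


def inject_nvidia_hook(
    spec: dict,
    hook_path: str,
    count: int = -1,
    device_ids: Optional[List[str]] = None,
) -> dict:
    """Return a copy of an OCI spec with GPU env vars and a prestart hook added.

    Models `--gpus all` (count=-1), `--gpus N` (count=N) and
    `--gpus device=...` (count=0 with device_ids).
    """
    device_ids = device_ids or []
    if count != 0 and device_ids:
        raise ValueError("cannot set both a GPU count and device IDs")
    new = copy.deepcopy(spec)  # leave the caller's spec untouched
    env = new.setdefault("process", {}).setdefault("env", [])
    if device_ids:
        env.append("NVIDIA_VISIBLE_DEVICES=" + ",".join(device_ids))
    elif count > 0:
        env.append("NVIDIA_VISIBLE_DEVICES=" + ",".join(str(i) for i in range(count)))
    elif count < 0:
        env.append("NVIDIA_VISIBLE_DEVICES=all")
    # The hook runs before the container process starts and reads the env
    # vars above from the spec to decide which GPUs to expose.
    prestart = new.setdefault("hooks", {}).setdefault("prestart", [])
    prestart.append({"path": hook_path, "args": [NVIDIA_HOOK, "prestart"]})
    return new
```

For example, `inject_nvidia_hook({"process": {"args": ["nvidia-smi"]}}, "/usr/bin/nvidia-container-runtime-hook")` models `--gpus all`: the returned spec gains `NVIDIA_VISIBLE_DEVICES=all` and one prestart hook, while the input dict is left unchanged.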
2. Steps to reproduce the issue
Install Docker, the NVIDIA drivers, and nvidia-container-toolkit, then run a container with `--gpus`.
3. Information
- `nvidia-container-cli -k -d /dev/tty info`: nvidia-container-cli.txt
- `uname -a`
- `dmesg`
- `nvidia-smi -a`: nvidia-smi.txt
- `docker version`: docker-version.txt
- `dpkg -l '*nvidia*'` or `rpm -qa '*nvidia*'`: zypper-packages.txt
- `nvidia-container-cli -V`
- No logs are created when running any container with `--gpus`.