rootless nvidia runtime not working as expected #3659
Comments
We had a bug about this before - I'll try and dig it up on Monday. However, I believe the conclusion was that

That doesn't look right...

@mheon which part do you think is wrong?

No path for the log file - that doesn't seem correct
I'm able to run with a few changes to a config and a custom hook. Of course, since I'm non-root the system hooks won't be used by default, so I have to add the --hook-dir option as well. Add/update these two sections of /etc/nvidia-container-runtime/config.toml
My quick-and-dirty hook; I put it in /usr/share/containers/oci/hooks.d/01-nvhook.json
Once that is in place, and as long as I don't have any mysterious bits left in /run/user/1000/vfs-layers from previously using sudo podman ..., it works.
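The pieces described above might look roughly like the following. These are illustrative reconstructions, not the poster's actual files: the section and key names come from the nvidia-container-runtime and podman OCI hook documentation, but the paths and values are guesses.

```toml
# /etc/nvidia-container-runtime/config.toml (illustrative)
[nvidia-container-cli]
# Rootless users cannot edit device cgroups, so disable that step:
no-cgroups = true
# Log somewhere user-writable instead of /var/log:
debug = "/tmp/nvidia-container-cli.log"

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
```

A plausible shape for the hook JSON, following podman's 1.0.0 OCI hooks schema (the hook binary path is a guess):

```json
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"]
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}
```

With the hook file in place, rootless podman can then be pointed at the hook directory explicitly, e.g. `podman run --hook-dir /usr/share/containers/oci/hooks.d ... nvidia/cuda nvidia-smi`.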
Usual failures I see are when non-root is trying to open files in /var/log/ for writing, and the cgroups thing which was mentioned in the report at NVIDIA/nvidia-container-runtime#85. The above is only a workaround. My goals for fully resolving this issue would be:

- Could the nvidia hook allow use of a --logfile param?
- Could it support a --syslog option to redirect messages, so that they end up going to journal/syslog rather than being written to a file?

What does the --gpus flag do?
In Docker, we added a --gpus flag which does a few things to trigger the nvidia runtime hooks. The parameter lets the user select which GPU(s) to expose to the container, or all of them (e.g. --gpus 0,2,3). Low-level detail: these settings are passed down as environment variables to the hooks and in-container libraries; not elegant, but working OK in production.

I'm working on a prototype and proposal to add a similar option to podman, because I feel that having similar command-line options on both podman and docker is easier on users (next week I will send a PR proposing changes to cmd/podman/common.go, completions/bash/podman, etc...). I also opened an issue for nvidia-container-runtime to improve their logging support: https://gitlab.com/nvidia/container-toolkit/nvidia-container-runtime/issues/5
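For readers unfamiliar with the flag, the Docker usage being described looks roughly like this (illustrative only; the exact device-selection syntax changed as the feature matured, and these commands require Docker with the NVIDIA runtime and a GPU):

```
# Expose all GPUs to the container:
docker run --gpus all nvidia/cuda nvidia-smi

# Under the hood the hooks read an environment variable, so setting it
# directly has a similar effect for a subset of devices:
docker run -e NVIDIA_VISIBLE_DEVICES=0,2 nvidia/cuda nvidia-smi
```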
@rhatdan Would it be acceptable if we had
Sure; the question then would be what the default is, since this is not something shipped/controlled by the distributions. Another option would be for your plugin to detect that it is not running as root and fall back to syslog logging.
I would prefer to have everything controlled by the config file and command-line options rather than establishing some drastically different behavior based on uid.
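Both suggestions can coexist; a minimal Go sketch of the decision being debated, assuming a hypothetical pickLogTarget helper (the function, its signature, and the default log path are all illustrative, not podman or nvidia-container-runtime code):

```go
package main

import (
	"fmt"
	"os"
)

// pickLogTarget sketches the fallback discussed above: an explicit
// config-file or command-line value always wins; with nothing configured,
// root keeps a traditional file under /var/log, while rootless users fall
// back to syslog so the hook never tries to open a file it cannot write.
func pickLogTarget(euid int, configured string) string {
	if configured != "" {
		return configured // explicit --logfile / config.toml setting wins
	}
	if euid == 0 {
		return "/var/log/nvidia-container-runtime.log" // hypothetical root default
	}
	return "syslog" // rootless fallback: journal/syslog instead of a file
}

func main() {
	fmt.Println(pickLogTarget(os.Geteuid(), ""))
}
```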
I am fine with either; my only goal is that rootless podman does not suddenly blow up because it cannot write to a system log file.
This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.
Are there any updates on this? Is the recommended way to do this to set no-cgroups = true in /etc/nvidia-container-runtime/config.toml?
Hi @dagrayvid - I successfully built a container with nvidia drivers and CUDA within it; this comment helped me: #3155 (comment)

Additionally I spotted weird behavior. I was using opencv from within the container, and simple code like the below was failing until I had executed it from the host first:

```python
import cv2
cv2.cuda.getCudaEnabledDeviceCount()
```
I had the same problem on my deep learning server; can someone help me?
Please open new issues; do not just keep adding to existing closed issues. If you run podman as root, does it work?
Hi @rhatdan, no, same error when I do:
Is there an nvidia hook that is attempting to launch runc?
Actually I don't think podman looks for the runtime executable in $PATH:

```shell
sudo podman run --runtime=/usr/bin/nvidia --privileged nvidia/cuda nvidia-smi
```
@rhatdan
I wonder if, when we exec the OCI runtime, we remove the $PATH setting.
(Disclaimer: I might be barking up the wrong tree here, I have no idea if this is even supposed to work yet.)
Problem
The nvidia runtime does not work in rootless mode (see debug log: /sys/fs permission error).
Expected result
Rainbows.
Debug log
Config