This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

How to run the nvidia-smi test with rootless Docker? #1155

Closed
huyu398 opened this issue Dec 12, 2019 · 6 comments

Comments

@huyu398

huyu398 commented Dec 12, 2019

My system is set up with Docker in rootless mode.
I installed the NVIDIA Container Toolkit following the Quick Start, and the installation appeared to succeed.
But the following error occurred when running the test nvidia-smi with docker run --gpus all nvidia/cuda:10.0-base nvidia-smi.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: open failed: /sys/fs/cgroup/devices/user.slice/devices.allow: permission denied\\\\n\\\"\"": unknown. ERRO[0000] error waiting for container: context canceled

It seems Docker doesn't have permission to write to devices.allow because of rootless mode.
Is there a solution?
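
For reference, the file from the error message can be inspected directly (a diagnostic sketch, assuming cgroup v1 and the path reported above):

# Check who owns the devices cgroup file the prestart hook tried to open;
# under rootless Docker it is typically owned by root, so the hook cannot write to it.
ls -l /sys/fs/cgroup/devices/user.slice/devices.allow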

Notice

My system is running behind a proxy environment,
so rootless Docker needs some extra settings, but this is probably not relevant.

@krishnadasari610

Method 1
command: docker run -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:9.0-base nvidia-smi
output: docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown.
ERRO[0003] error waiting for container: context canceled

Method 2
command: nvidia-docker run nvidia/cuda:9.0-base nvidia-smi
output:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown.
ERRO[0002] error waiting for container: context canceled
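
Side note: "no cuda-capable device is detected" is usually a driver or device visibility problem rather than the cgroup permission issue above. A basic sanity check (a sketch, not a confirmed fix for this report) is to confirm the host itself sees the GPU before involving Docker:

# Verify the NVIDIA driver and GPUs are visible on the host
nvidia-smi
# Then check what the NVIDIA container library detects
nvidia-container-cli info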

Followed links:
https://github.com/NVIDIA/nvidia-docker
https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#runcont

Please provide the resolution for this issue.

@krishnadasari610

krishnadasari610 commented Dec 19, 2019

When I use Dockerfile.ubuntu on my PC I see the issue below. Can someone provide a fix for this?

Step 20/21 : RUN sed -i "s;@Version@;${REVISION};" debian/changelog && sed -i "s;@Version@;${PKG_VERS};" $DIST_DIR/nvidia-docker && if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then echo "$(dpkg-parsechangelog --show-field=Version)" && exit 1; fi
---> Running in f3fc7672310d
dpkg-parsechangelog: warning: debian/changelog(l1): version '-' is invalid: upstream version cannot be empty
LINE: nvidia-docker2 (-) UNRELEASED; urgency=medium
dpkg-parsechangelog: warning: debian/changelog(l1): version '-' is invalid: upstream version cannot be empty
LINE: nvidia-docker2 (-) UNRELEASED; urgency=medium
unknown
The command '/bin/sh -c sed -i "s;@Version@;${REVISION};" debian/changelog && sed -i "s;@Version@;${PKG_VERS};" $DIST_DIR/nvidia-docker && if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then echo "$(dpkg-parsechangelog --show-field=Version)" && exit 1; fi' returned a non-zero code: 1
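The changelog version ends up as '-', which suggests REVISION and PKG_VERS were empty at build time. A possible workaround (an assumption based on the variable names in the failing RUN step; the actual build-arg names in Dockerfile.ubuntu may differ) is to pass them explicitly:

# Hypothetical versions; substitute the release you are actually packaging
docker build -f Dockerfile.ubuntu \
  --build-arg PKG_VERS=2.2.2 \
  --build-arg REVISION=2.2.2-1 \
  -t nvidia-docker2-build .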

@huyu398
Author

huyu398 commented Dec 19, 2019

I resolved this problem myself.
The cause of this problem is that rootless mode can't use cgroups, as referred to in the documentation.
The way to avoid using cgroups for the container (i.e. Docker) is here.

@krishnadasari610 Is the above patch helpful for you?

@huyu398 huyu398 closed this as completed Dec 19, 2019
@djhshih

djhshih commented Jul 21, 2021

We encountered the same error. To summarize, we had to edit /etc/nvidia-container-runtime/config.toml as follows, in order to disable the use of cgroups by the NVIDIA container runtime, as noted above.

[nvidia-container-cli]
no-cgroups = true
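
The same edit can also be scripted; this is a sketch that assumes the stock config.toml ships the option as a commented-out "#no-cgroups = false" line and that rootless Docker runs as a user systemd service:

# Flip no-cgroups to true in the [nvidia-container-cli] section
sudo sed -i 's/^#\?no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
# Restart the rootless Docker daemon so the runtime re-reads the config
systemctl --user restart docker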

@chm123

chm123 commented Oct 26, 2021

We encountered the same error. To summarize, we had to edit /etc/nvidia-container-runtime/config.toml as follows, in order to disable the use of cgroups by the NVIDIA container runtime, as noted above.

[nvidia-container-cli]
no-cgroups = true

Thank you! I can use --gpus '"device=3"' for docker run after this change.
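
For completeness, the full command after the change looked roughly like this (the image tag is just the one used earlier in this thread):

# Run nvidia-smi on a single GPU (index 3) once no-cgroups = true is set
docker run --rm --gpus '"device=3"' nvidia/cuda:10.0-base nvidia-smi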

@leoribeiro

We encountered the same error. To summarize, we had to edit /etc/nvidia-container-runtime/config.toml as follows, in order to disable the use of cgroups by the NVIDIA container runtime, as noted above.

[nvidia-container-cli]
no-cgroups = true

We still need to have access to root privileges in order to edit the file (/etc/nvidia-container-runtime/config.toml), right? Is there a way to avoid using cgroups without root access?
