Given how Issue #85 is diverging in different directions and becoming a catch-all for all things podman, I thought I'd break the issue described in this comment out into its own issue. In certain situations (e.g. podman issue 8580), it is not practical to set up subuid / subgid for each user, so we'd like to get GPU acceleration working without doing so, which Singularity is capable of.
Test System (using the container-tools:3.0 appstream):
$ cat /etc/redhat-release
CentOS Linux release 8.4.2105
$ uname -r
4.18.0-305.7.1.el8_4.x86_64
$ nvidia-smi -L
GPU 0: Tesla V100-PCIE-32GB
$ nvidia-smi | grep Version | awk '{print $3}'
470.42.01
$ nvidia-container-cli --version | head -1
version: 1.4.0
$ crun --version | grep version
crun version 0.18
$ runc --version | grep version
runc version spec: 1.0.2-dev
$ podman --version
podman version 3.0.2-dev
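The steps for enabling the container-tools:3.0 appstream aren't shown above; on CentOS 8 that is normally done with dnf module commands along these lines (a sketch, not the exact commands used on this system):

```
$ sudo dnf module reset -y container-tools
$ sudo dnf module enable -y container-tools:3.0
$ sudo dnf install -y podman crun
```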
nvidia-container-runtime config (note that no-cgroups is now true and debug files are going to /tmp, per Issue #85) and podman storage config (per Issue #85 and the rootless podman guide); sketches of both are below.
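Neither config file's contents survived in the text here; as a rough sketch, assuming the stock file locations and only the settings described above, they would look something like this:

```toml
# /etc/nvidia-container-runtime/config.toml (sketch; not the exact file from this report)
disable-require = false

[nvidia-container-cli]
environment = []
load-kmods = true
no-cgroups = true                            # required for rootless podman
ldconfig = "@/sbin/ldconfig"
debug = "/tmp/nvidia-container-cli.log"      # debug log redirected to /tmp

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"  # debug log redirected to /tmp
```

```toml
# ~/.config/containers/storage.conf (sketch, per the rootless podman guide)
[storage]
driver = "overlay"

[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
```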
With subuid / subgid set, things work fine; logs posted as nct_works_log.txt.

Without subuid / subgid set, GPU acceleration fails, but non-GPU acceleration works. Logs posted as nct_fails_log.txt:
$ grep ${USER}: /etc/subuid | wc -l
0
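For contrast, in the working nct_works_log.txt case the user does have subordinate ID ranges; a typical entry in /etc/subuid and /etc/subgid looks like the following (username and range are placeholders, the actual values from the test system aren't shown here):

```
someuser:100000:65536
```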
$ podman run --rm docker.io/centos:8 cat /etc/redhat-release
CentOS Linux release 8.3.2011
$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
Error: OCI runtime error: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request
Per suggestions online, I added the account without subuid / subgid to the video group; that did not help. I'm also not clear on the implications of adding a user to the video group, so I asked over on the NVIDIA forums.
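The exact command used for the group change isn't in the post; it would typically be something like the following (someuser being a placeholder):

```
$ sudo usermod -aG video someuser
```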
The above used runc; I retried with crun, which works fine without GPU acceleration but still fails with it unless subuid is set. Logs attached as nct_fails_crun_log.txt.
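The crun invocation isn't shown verbatim; assuming crun is installed at /usr/bin/crun, one way to force it for a single run is podman's global --runtime flag:

```
$ podman --runtime /usr/bin/crun run --rm --security-opt=label=disable \
    --hooks-dir=/usr/share/containers/oci/hooks.d/ \
    docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
```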