This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

could not select device driver "" with capabilities: [[gpu]]. [Gentoo] #1088

Closed
7 of 10 tasks
gronastech opened this issue Sep 25, 2019 · 3 comments

Comments

@gronastech

1. Issue or feature description

Docker doesn't see the NVIDIA GPU, even though the nvidia-container-cli info output looks fine:
nvidia-container-cli info
NVRM version: 435.21
CUDA version: 10.1

Device Index: 0
Device Minor: 0
Model: GeForce GTX 1060
Brand: GeForce
GPU UUID: GPU-11e9a5b0-be0f-0466-4daf-34093d62645f
Bus Location: 00000000:01:00.0
Architecture: 6.1

2. Steps to reproduce the issue

Launch a container with: docker run --gpus all nvidia/cuda nvidia-smi

3. Information to attach (optional if deemed irrelevant)

nvidia-container-cli -V:
version: 1.0.6
build date: 2019-09-25T12:30+00:00
build revision: 8510a4b460fc9d4cb9ab759c0f9afcfe1979a2fb
build compiler: x86_64-pc-linux-gnu-gcc 8.3.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

nvidia-container-runtime -v
runc version 1.0.0-rc8
spec: 1.0.1-dev

nvidia-smi
Wed Sep 25 18:14:28 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 56C P0 24W / N/A | 0MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0925 15:07:12.146664 649192 nvc.c:281] initializing library context (version=1.0.6, build=8510a4b460fc9d4cb9ab759c0f9afcfe1979a2fb)
I0925 15:07:12.146732 649192 nvc.c:255] using root /
I0925 15:07:12.146735 649192 nvc.c:256] using ldcache /etc/ld.so.cache
I0925 15:07:12.146760 649192 nvc.c:257] using unprivileged user 1000:1000
W0925 15:07:12.147866 649193 nvc.c:186] failed to set inheritable capabilities
W0925 15:07:12.147891 649193 nvc.c:187] skipping kernel modules load due to failure
I0925 15:07:12.148300 649194 driver.c:133] starting driver service
I0925 15:07:12.405188 649192 nvc_info.c:437] requesting driver information with ''
I0925 15:07:12.406176 649192 nvc_info.c:151] selecting /usr/lib64/libvdpau_nvidia.so.435.21
I0925 15:07:12.406765 649192 nvc_info.c:151] selecting /usr/lib64/libnvoptix.so.435.21
I0925 15:07:12.407252 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-tls.so.435.21
I0925 15:07:12.407679 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-rtcore.so.435.21
I0925 15:07:12.408019 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.435.21
I0925 15:07:12.408602 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-opencl.so.435.21
I0925 15:07:12.409030 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-ml.so.435.21
I0925 15:07:12.409267 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-ifr.so.435.21
I0925 15:07:12.409508 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-glvkspirv.so.435.21
I0925 15:07:12.409780 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-glsi.so.435.21
I0925 15:07:12.410034 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-glcore.so.435.21
I0925 15:07:12.410358 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-fbc.so.435.21
I0925 15:07:12.410636 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-fatbinaryloader.so.435.21
I0925 15:07:12.410908 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-encode.so.435.21
I0925 15:07:12.411176 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-eglcore.so.435.21
I0925 15:07:12.411560 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-compiler.so.435.21
I0925 15:07:12.411844 649192 nvc_info.c:151] selecting /usr/lib64/libnvidia-cfg.so.435.21
I0925 15:07:12.412114 649192 nvc_info.c:151] selecting /usr/lib64/libnvcuvid.so.435.21
I0925 15:07:12.412609 649192 nvc_info.c:151] selecting /usr/lib64/libcuda.so.435.21
I0925 15:07:12.413294 649192 nvc_info.c:151] selecting /usr/lib32/libvdpau_nvidia.so.435.21
I0925 15:07:12.413612 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-tls.so.435.21
I0925 15:07:12.414025 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-ptxjitcompiler.so.435.21
I0925 15:07:12.414274 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-opencl.so.435.21
I0925 15:07:12.414518 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-ml.so.435.21
I0925 15:07:12.414757 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-ifr.so.435.21
I0925 15:07:12.415007 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-glvkspirv.so.435.21
I0925 15:07:12.415235 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-glsi.so.435.21
I0925 15:07:12.415501 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-glcore.so.435.21
I0925 15:07:12.415756 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-fbc.so.435.21
I0925 15:07:12.416003 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-fatbinaryloader.so.435.21
I0925 15:07:12.416248 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-encode.so.435.21
I0925 15:07:12.416498 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-eglcore.so.435.21
I0925 15:07:12.416749 649192 nvc_info.c:151] selecting /usr/lib32/libnvidia-compiler.so.435.21
I0925 15:07:12.417058 649192 nvc_info.c:151] selecting /usr/lib32/libnvcuvid.so.435.21
I0925 15:07:12.417447 649192 nvc_info.c:151] selecting /usr/lib32/libcuda.so.435.21
W0925 15:07:12.417878 649192 nvc_info.c:302] missing library libnvidia-opticalflow.so
W0925 15:07:12.417885 649192 nvc_info.c:302] missing library libGLX_nvidia.so
W0925 15:07:12.417888 649192 nvc_info.c:302] missing library libEGL_nvidia.so
W0925 15:07:12.417890 649192 nvc_info.c:302] missing library libGLESv2_nvidia.so
W0925 15:07:12.417909 649192 nvc_info.c:302] missing library libGLESv1_CM_nvidia.so
W0925 15:07:12.417913 649192 nvc_info.c:306] missing compat32 library libnvidia-cfg.so
W0925 15:07:12.417919 649192 nvc_info.c:306] missing compat32 library libnvidia-opticalflow.so
W0925 15:07:12.417923 649192 nvc_info.c:306] missing compat32 library libnvidia-rtcore.so
W0925 15:07:12.417942 649192 nvc_info.c:306] missing compat32 library libnvoptix.so
W0925 15:07:12.417946 649192 nvc_info.c:306] missing compat32 library libGLX_nvidia.so
W0925 15:07:12.417964 649192 nvc_info.c:306] missing compat32 library libEGL_nvidia.so
W0925 15:07:12.417969 649192 nvc_info.c:306] missing compat32 library libGLESv2_nvidia.so
W0925 15:07:12.417989 649192 nvc_info.c:306] missing compat32 library libGLESv1_CM_nvidia.so
I0925 15:07:12.421268 649192 nvc_info.c:232] selecting /opt/bin/nvidia-smi
I0925 15:07:12.421668 649192 nvc_info.c:232] selecting /opt/bin/nvidia-debugdump
I0925 15:07:12.421819 649192 nvc_info.c:232] selecting /opt/bin/nvidia-persistenced
I0925 15:07:12.421966 649192 nvc_info.c:232] selecting /opt/bin/nvidia-cuda-mps-control
I0925 15:07:12.422153 649192 nvc_info.c:232] selecting /opt/bin/nvidia-cuda-mps-server
I0925 15:07:12.422218 649192 nvc_info.c:369] listing device /dev/nvidiactl
I0925 15:07:12.422226 649192 nvc_info.c:369] listing device /dev/nvidia-uvm
I0925 15:07:12.422234 649192 nvc_info.c:369] listing device /dev/nvidia-uvm-tools
I0925 15:07:12.422242 649192 nvc_info.c:369] listing device /dev/nvidia-modeset
W0925 15:07:12.422307 649192 nvc_info.c:277] missing ipc /var/run/nvidia-persistenced/socket
W0925 15:07:12.422398 649192 nvc_info.c:277] missing ipc /tmp/nvidia-mps
I0925 15:07:12.422433 649192 nvc_info.c:493] requesting device information with ''
I0925 15:07:12.435585 649192 nvc_info.c:523] listing device /dev/nvidia0 (GPU-11e9a5b0-be0f-0466-4daf-34093d62645f at 00000000:01:00.0)
NVRM version: 435.21
CUDA version: 10.1

Device Index: 0
Device Minor: 0
Model: GeForce GTX 1060
Brand: GeForce
GPU UUID: GPU-11e9a5b0-be0f-0466-4daf-34093d62645f
Bus Location: 00000000:01:00.0
Architecture: 6.1
I0925 15:07:12.435737 649192 nvc.c:318] shutting down library context
I0925 15:07:12.437669 649194 driver.c:192] terminating driver service
I0925 15:07:12.472042 649192 driver.c:233] driver service terminated successfully
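
The debug log above uses a glog-style prefix (severity letter, MMDD date, time, PID, source file and line). Below is a minimal Python sketch for pulling out only the warning lines, assuming every line follows that prefix layout. The function name warnings_from_log is hypothetical and is not part of libnvidia-container.

```python
import re

# Assumed layout, inferred from the log above:
#   <severity><MMDD> <HH:MM:SS.micro> <pid> <file>:<line>] <message>
LOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+) "
    r"(?P<pid>\d+) (?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)


def warnings_from_log(text):
    """Return the message part of every W-severity line in `text`."""
    messages = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("severity") == "W":
            messages.append(match.group("message"))
    return messages


# Three lines copied from the log above.
sample = """\
I0925 15:07:12.146732 649192 nvc.c:255] using root /
W0925 15:07:12.147866 649193 nvc.c:186] failed to set inheritable capabilities
W0925 15:07:12.417878 649192 nvc_info.c:302] missing library libnvidia-opticalflow.so
"""
for message in warnings_from_log(sample):
    print(message)
```

Run over the full log, this would list the capability warning, the missing-library notices, and the missing IPC paths. Whether any of them is related to the Docker error is not established in this thread.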

  • Kernel version from uname -a
    Linux xpuakyiw0034 5.2.13-gentoo #2 SMP Tue Sep 17 17:58:48 EEST 2019 x86_64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz GenuineIntel GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a
    ==============NVSMI LOG==============

Timestamp : Wed Sep 25 18:09:28 2019
Driver Version : 435.21
CUDA Version : 10.1

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1060
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-11e9a5b0-be0f-0466-4daf-34093d62645f
Minor Number : 0
VBIOS Version : 86.06.18.00.16
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1C2010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x11AC1462
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 1000 KB/s
Rx Throughput : 2000 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6078 MiB
Used : 0 MiB
Free : 6078 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 2 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 54 C
GPU Shutdown Temp : 104 C
GPU Slowdown Temp : 99 C
GPU Max Operating Temp : 96 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : 24.18 W
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1012 MHz
SM : 1227 MHz
Memory : 4006 MHz
Video : 1252 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 4004 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None

  • Docker version from docker version
    Client:
     Version: 19.03.2
     API version: 1.40
     Go version: go1.12.9
     Git commit: 6a30dfc
     Built: Wed Sep 25 07:40:00 2019
     OS/Arch: linux/amd64
     Experimental: false

    Server:
     Engine:
      Version: 19.03.2
      API version: 1.40 (minimum version 1.12)
      Go version: go1.12.9
      Git commit: 6a30dfc
      Built: Wed Sep 25 07:39:14 2019
      OS/Arch: linux/amd64
      Experimental: false
     containerd:
      Version: 1.2.6
      GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
     runc:
      Version: 1.0.0-rc8
      GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
     docker-init:
      Version: 0.18.0
      GitCommit: fec3683b971d9c3ef73f284f176672c44b448662

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'

  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.0.6
    build date: 2019-09-25T12:30+00:00
    build revision: 8510a4b460fc9d4cb9ab759c0f9afcfe1979a2fb
    build compiler: x86_64-pc-linux-gnu-gcc 8.3.0
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

  • NVIDIA container library logs (see troubleshooting)

  • Docker command, image and tag used
    docker run --gpus all nvidia/cuda nvidia-smi

  • Docker Info
    docker info
    Client:
     Debug Mode: false

    Server:
     Containers: 137
      Running: 0
      Paused: 0
      Stopped: 137
     Images: 204
     Server Version: 19.03.2
     Storage Driver: overlay2
      Backing Filesystem: extfs
      Supports d_type: true
      Native Overlay Diff: true
     Logging Driver: json-file
     Cgroup Driver: cgroupfs
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: nvidia runc
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
     runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
     init version: fec3683b971d9c3ef73f284f176672c44b448662
     Security Options:
      seccomp
       Profile: default
     Kernel Version: 5.2.13-gentoo
     Operating System: Gentoo/Linux
     OSType: linux
     Architecture: x86_64
     CPUs: 8
     Total Memory: 15.55GiB
     Name: xpuakyiw0034
     ID: I6MN:IQIN:MYNT:JMAH:L3HZ:FX7Q:E4HQ:K2DA:VPAZ:PGOR:4AA6:IAIT
     Docker Root Dir: /var/lib/docker
     Debug Mode: false
     Username: lordserenity
     Registry: https://index.docker.io/v1/
     Labels:
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Live Restore Enabled: false
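
The docker info output above lists nvidia among the registered runtimes, while the default runtime is runc. Below is a minimal Python sketch for reading those two fields out of pasted docker info text, assuming the "Key: value" line format shown above. The function name is hypothetical.

```python
def runtimes_from_docker_info(text):
    """Extract registered runtimes and the default runtime from `docker info` text."""
    runtimes, default = [], None
    for line in text.splitlines():
        # Split only on the first colon so values containing ':' stay intact.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return runtimes, default


# Lines copied from the output above.
sample = """\
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Registry: https://index.docker.io/v1/
"""
runtimes, default = runtimes_from_docker_info(sample)
print("registered:", runtimes)
print("default:", default)
```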

@ehegnes

ehegnes commented Sep 27, 2019

I have ebuilds for the NVIDIA Container Runtime and have successfully configured Docker 19.03.2 with GPU acceleration, if you or anybody else needs assistance on Gentoo.

@skobkin

skobkin commented Apr 1, 2020

@ehegnes I need them.

I have the same error. I already installed app-admin/nvidia-docker from the ssnb overlay, but it didn't help.

I'm not sure, but it looks like that ebuild just installs /etc/docker/daemon.conf with an added nvidia runtime section (which I couldn't actually find on disk) and /usr/bin/nvidia-docker, which is just a shell script that adds the --runtime parameter to the docker command.
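
For reference, a runtime section of the kind described above is a JSON object under the runtimes key of the Docker daemon configuration. Docker's documented default location for that file is /etc/docker/daemon.json; the daemon.conf name is quoted here as written in the comment. Below is a minimal Python sketch that checks whether a config text declares an nvidia runtime. The function name is hypothetical, and the runtime path in the sample is the one mentioned in this thread, not a verified install location.

```python
import json


def has_nvidia_runtime(config_text):
    """Return True if the daemon config text declares a runtime named 'nvidia'."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes", {})
    return isinstance(runtimes, dict) and "nvidia" in runtimes


# Illustrative config; the path is taken from this thread, not verified.
sample = """
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
"""
print(has_nvidia_runtime(sample))
print(has_nvidia_runtime("{}"))
```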

I think that ebuild depends on the package that provides /usr/bin/nvidia-container-runtime, but that dependency isn't declared in it.
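
One quick way to see which of the executables named in this thread are actually installed is to look them up on PATH. Below is a minimal Python sketch using the standard library's shutil.which. The last entry, nvidia-container-runtime-hook, is an assumption: it is not mentioned anywhere in this thread, and whether Docker's --gpus option depends on it should be checked against the Docker and NVIDIA documentation. The function name locate is hypothetical.

```python
import shutil

# Names taken from this thread, plus one ASSUMED name (see the note above).
CANDIDATES = [
    "nvidia-container-cli",
    "nvidia-container-runtime",
    "nvidia-docker",
    "nvidia-container-runtime-hook",  # assumption: not mentioned in the thread
]


def locate(names, path=None):
    """Map each executable name to its resolved path, or None if not found."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, found in locate(CANDIDATES).items():
        print(f"{name}: {found or 'not found on PATH'}")
```

Note that this checks the PATH of the current shell, which may differ from the PATH the Docker daemon runs with.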

@jasonbrent

jasonbrent commented Apr 20, 2020

@ehegnes

I have ebuilds for the NVIDIA Container Runtime and have successfully configured Docker 19.03.2 with GPU acceleration, if you or anybody else needs assistance on Gentoo.

Mind sharing those ebuilds?
