docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error #1017
Can anyone help me? Thanks.
We need more information. Can you do this:
@Ethyling thanks for your help.
Can you do this too, and give us the output:
Thank you!
I'm getting EXACTLY the same error as the reporter, on all my dev, staging, and production systems, as soon as ElasticBeanstalk tries to create new servers. Code / config hasn't been touched in 15 days, and everything has been stable for the last 2 weeks, so something referenced during EC2 startup is failing (I hardcode my nvidia driver install etc. into my baked AMI). The only thing it could really be is the nvidia-docker install, as that is installed on server startup.
I am also seeing this same issue @mikecouk. We use nvidia-docker on some AWS Beanstalk environments. Starting last Friday, deploys stopped working due to this issue. It looks like some automatic update triggers and updates nvidia-docker, and now it no longer works.
We are currently working on this, but first we need more information:
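Judging from the reply that follows, the requested check was presumably along these lines:

```sh
# Locate the OCI runtime binaries
whereis runc
whereis docker-runc
```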
@Ethyling :

```
[ec2-user]$ whereis runc
runc:
[ec2-user]$ whereis docker-runc
docker-runc: /usr/bin/docker-runc
```
@mikeobr Thank you, I will come back to you asap.
We're currently testing a fix by aliasing docker-runc to runc, which as you've already worked out doesn't exist anymore :) Will get back to you asap.
Hi, I am facing the same problem.
Thanks.
Working with @mikecouk on this one. While ...
The "brett hack", to give it its official name, has fixed our problem; we're currently rolling it out to all servers before they have a chance to die and respawn (we use spot instances!). The line you'll want for the moment in your scripts, AFTER the nvidia-docker2 yum install, is shown in the sketch below.
Phew, panic over for the moment, but we could do with an explanation for this (even if it's "blame AWS"!)
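For reference, a minimal sketch of that line (the same conditional symlink quoted later in this thread), assuming the stock paths `/usr/bin/docker-runc` and `/usr/bin/runc`:

```sh
# Create /usr/bin/runc only when it is missing and docker-runc is present
if [ ! -f /usr/bin/runc -a -f /usr/bin/docker-runc ]; then
  ln -s /usr/bin/docker-runc /usr/bin/runc
else
  echo "DID NOT CREATE RUNC SYMLINK"
fi
```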
Please use the 'brett hack' for the time being. This is a combination of two underlying non-issues that make an issue because of incompatibility:
What next: a new release is pending for nvidia-docker that will look for docker-runc if runc is not found. It is still better to chase the binary's new name than to chase its version, so point '1' from above is here to stay.
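An illustration only (not the actual nvidia-docker code) of the fallback behaviour described above, i.e. prefer `runc` and fall back to `docker-runc` when it is missing:

```sh
# Resolve the OCI runtime: runc first, docker-runc as the fallback
RUNTIME="$(command -v runc || command -v docker-runc)" || {
  echo "no OCI runtime found" >&2
  exit 1
}
echo "using OCI runtime: $RUNTIME"
```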
I am getting the same runc error, but I am using Ubuntu. Can someone tell me what is the "brett hack" version for this? |
Create a runc symlink pointing to docker-runc, the same way @mikecouk does it (see the sketch below).
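A minimal sketch of the Ubuntu equivalent, assuming docker-runc lives at `/usr/bin/docker-runc` as on the Amazon Linux hosts above:

```sh
# Same workaround on Ubuntu: point runc at docker-runc
sudo ln -s /usr/bin/docker-runc /usr/bin/runc
```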
The brett hack works, thanks for the quick workaround.
I'm pretty sure AWS ElasticBeanstalk runs something like a yum update during install; I've had a similar issue before where stuff started breaking with no code changes. If you find a good way to lock versions or prevent this, I'd be happy to know 😃
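Not something settled in this thread, but one common way to stop background updates from moving these packages on a yum-based host is the versionlock plugin (a sketch, assuming `yum-plugin-versionlock` is available in your repos):

```sh
# Pin the nvidia packages so a background "yum update" cannot change them
sudo yum install -y yum-plugin-versionlock
sudo yum versionlock add nvidia-docker2 nvidia-container-runtime
# Undo later with: sudo yum versionlock delete nvidia-docker2 nvidia-container-runtime
```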
Is anyone else having issues again today? I put the Brett hack into place yesterday and it resolved the issue. Today I am seeing the same problem again (unable to retrieve OCI runtime error) with the workaround in place. This is on AWS Beanstalk, so I'm not sure if it pulled a newer version. EDIT: @ramab1988's alias works for resolving the issue. Did something change that caused the initial workaround to no longer work?
Hi @mikeobr, can you do this please:
It will help us a lot, thank you!
@Ethyling Here are the results from before I added the nvidia-container-toolkit symlink: Log file
Whereis results (this is with the Brett hack)
Can you try to update nvidia-docker2 and nvidia-container-runtime? This problem should now be fixed.
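A sketch of that upgrade on a yum-based host (the package manager and service names are assumptions based on the AWS/CentOS setups mentioned above; use the apt equivalents on Ubuntu):

```sh
# Refresh repo metadata, pull the fixed packages, and restart Docker
sudo yum clean expire-cache
sudo yum update -y nvidia-docker2 nvidia-container-runtime
sudo systemctl restart docker
```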
Hey @Ethyling, I'm not as familiar with CentOS (where my stuff is hosted), but yum update and upgrade did not pull any new versions.
@mikeobr can you comment with the version of the packages?
$ rpm -qa '*nvidia*'
I'm seeing a similar problem in CentOS 7; my system info / error is posted in my reply to this (possibly related) bug report: NVIDIA/nvidia-container-runtime#68. UPDATE: Appears to be working now after a few re-arrangements of the packages deployed and changes to how ...
We are working on this; expect it to be fixed by end of day.
Hello! We released new packages yesterday; you should be good to upgrade now. Thanks for reporting the issue, closing for now!
Hi! How do I use this workaround? `if [ ! -f /usr/bin/runc -a -f /usr/bin/docker-runc ]; then ln -s /usr/bin/docker-runc /usr/bin/runc; else echo "DID NOT CREATE RUNC SYMLINK"; fi`
Solved the problem for me, thanks.
Thank you Sir! This worked for me.
Thanks @whillas, that worked for me.
@whillas holy !@#$ thank you :D That command seems really relevant and I'm surprised I didn't encounter it in the installation process.
`sudo apt install nvidia-container-runtime` also worked for me.
I had the same problem on my deep learning server, can someone help me?
@andrewssobral Rootless podman with GPU support is a separate beast; it is being discussed in nvidia-container-runtime issue 85.
@qhaas Thanks!
This really works, with containerd + k8s running on our servers.
I don't know why, but sometimes the link is missing. `rpm -qf /usr/bin/nvidia-container-runtime-hook` shows which package owns it, and reinstalling that package solves this issue too (i.e. it creates the link): `yum reinstall nvidia-container-toolkit -y`
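A quick check, assuming the toolkit 1.13 layout shown further down in this thread (where `/usr/bin/nvidia-container-toolkit` is a symlink to `/usr/bin/nvidia-container-runtime-hook`):

```sh
# Confirm the hook binary and the toolkit symlink are both present
ls -l /usr/bin/nvidia-container-toolkit /usr/bin/nvidia-container-runtime-hook
rpm -qf /usr/bin/nvidia-container-runtime-hook   # which package owns the hook
# If either is missing, reinstalling recreates it
yum reinstall -y nvidia-container-toolkit
```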
It is really sad, but I have tried all the methods above and it still does not work.

```
$ whereis runc
runc: /usr/local/bin/runc
$ whereis docker-runc
docker-runc:
$ ll /usr/bin/nvidia-container-toolkit
lrwxrwxrwx 1 root root 38 Jun 18 22:52 /usr/bin/nvidia-container-toolkit -> /usr/bin/nvidia-container-runtime-hook
```

I have installed the following packages:

```
$ rpm -qa | grep nvidia
nvidia-container-toolkit-1.13.1-1.x86_64
nvidia-container-toolkit-base-1.13.1-1.x86_64
libnvidia-container-tools-1.13.1-1.x86_64
nvidia-container-runtime-3.13.0-1.noarch
libnvidia-container1-1.13.1-1.x86_64
```

Can anyone help?

```
$ containerd -v
containerd github.com/containerd/containerd v1.6.6 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
```
I enabled the debug option and ran the command:

```
$ nerdctl run --runtime=nvidia --rm nvidia/cuda:12.0.1-cudnn8-runtime-centos7 nvidia-smi
FATA[0000] failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/default/71217aa3e35ca15212f54d1f7ca6111d8faa6f7b8910d9ccfec60507ee0e633f/log.json: no such file or directory): exec: "nvidia": executable file not found in $PATH: unknown
```

But there is no debug log. Here is the config:

```
# grep -v '^#' /etc/nvidia-container-runtime/config.toml
disable-require = false

[nvidia-container-cli]
environment = []
load-kmods = true
ldconfig = "@/sbin/ldconfig"

[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
runtimes = [
    "docker-runc",
    "runc",
]
mode = "auto"

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
```
The issue is that `--runtime=nvidia` resolves to a bare executable name (`nvidia`), which is not in `$PATH`. Please try with a properly specified runtime; see the sketch below.
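A sketch of two possible fixes; the binary path and the `--gpus` alternative are assumptions based on the packages listed above (nvidia-container-runtime 3.13 / libnvidia-container-tools 1.13), not something confirmed in this thread:

```sh
# Option 1: point --runtime at the actual runtime binary instead of the bare name "nvidia"
nerdctl run --runtime=/usr/bin/nvidia-container-runtime --rm \
  nvidia/cuda:12.0.1-cudnn8-runtime-centos7 nvidia-smi

# Option 2: skip the custom runtime and use nerdctl's built-in --gpus flag
nerdctl run --gpus all --rm \
  nvidia/cuda:12.0.1-cudnn8-runtime-centos7 nvidia-smi
```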
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Also, before reporting a new issue, please make sure that:
1. Issue or feature description
I'm trying to install nvidia-docker v2 and followed the steps.
At the last step:
`docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi`
it fails with the error message:
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/56ca0b73c5720021671123b7f44c885bb1e7b42957c9b18e7b509be26760b993/log.json: no such file or directory): nvidia-container-runtime did not terminate sucessfully: unknown.
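Not part of the original report, but a short diagnostic sketch in line with the checks discussed above (binary names and paths are assumptions based on the default package layout):

```sh
# Which OCI runtime binaries exist, and how is the nvidia runtime registered?
whereis runc docker-runc nvidia-container-runtime
cat /etc/docker/daemon.json
# Recent daemon logs usually show why nvidia-container-runtime did not terminate successfully
sudo journalctl -u docker --since "10 minutes ago"
```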
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
```
e3380-8b5c-cbf2-8bb2-1bcced59103d at 00000000:01:00.0)
NVRM version: 418.56
CUDA version: 10.1
Device Index: 0
Device Minor: 0
Model: GeForce GTX 1080 Ti
Brand: GeForce
GPU UUID: GPU-e17e3380-8b5c-cbf2-8bb2-1bcced59103d
Bus Location: 00000000:01:00.0
Architecture: 6.1
I0720 05:40:32.999897 19015 nvc.c:318] shutting down library context
I0720 05:40:33.000865 19017 driver.c:192] terminating driver service
I0720 05:40:33.010816 19015 driver.c:233] driver service terminated successfully
```
Kernel version from
uname -a
Linux cp 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Any relevant kernel output lines from
dmesg
Driver information from
nvidia-smi -a
```
Timestamp : Sat Jul 20 13:42:03 2019
Driver Version : 418.56
CUDA Version : 10.1
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1080 Ti
```
docker version
```
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:24:56 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:21 2018
OS/Arch: linux/amd64
Experimental: false
```
NVIDIA packages version from
dpkg -l '*nvidia*'
or rpm -qa '*nvidia*'
```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  libnvidia-cont 1.0.2-1      amd64        NVIDIA container runtime library
ii  libnvidia-cont 1.0.2-1      amd64        NVIDIA container runtime library
un  nvidia-304     <none>       <none>       (no description available)
un  nvidia-340     <none>       <none>       (no description available)
un  nvidia-384     <none>       <none>       (no description available)
un  nvidia-common  <none>       <none>       (no description available)
ii  nvidia-contain 3.0.0-1      amd64        NVIDIA container runtime
ii  nvidia-contain 1.4.0-1      amd64        NVIDIA container runtime hook
un  nvidia-docker  <none>       <none>       (no description available)
ii  nvidia-docker2 2.1.0-1      all          nvidia-docker CLI wrapper
un  nvidia-libopen <none>       <none>       (no description available)
un  nvidia-prime   <none>       <none>       (no description available)
```
NVIDIA container library version from
nvidia-container-cli -V
version: 1.0.2
NVIDIA container library logs (see troubleshooting)
Docker command, image and tag used
tensorflow/tensorflow:nightly-gpu-py3-jupyter