Docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]] #1470
Comments
Hi @btalb

This should hopefully help to clear up some of the confusion about how the stack is organized:

Regarding the issue you linked: everything seems normal except that he is running docker version

Can you have him try either:
Also -- to be clear -- the exact error you are seeing is coming from Docker 19.03 added code to call directly out to our container stack at the
Thanks @klueska, that linked post is wonderful, and helps clear up a lot of confusion! I don't know if it's feasible, but it would be nice to have that sort of information at least in the README of this repo. When I google "github nvidia-container-toolkit", this repo is the first result.

As to the actual issue, we've got a number of machines in the lab running

When you say "call directly out to our container stack at the

I've done some digging into Docker's source and think this part points to this as the command being run:
Is that correct? When I run that manually on my machine it sits there blocking with no output. Is that to be expected? Any further tips would be greatly appreciated. I am just trying to get to a point where we get some more meaningful output about what is going on.
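For anyone else trying to get more meaningful output, here is a minimal sketch of a first check: whether the NVIDIA helper binaries are visible on PATH at all. It assumes, from my reading of the Docker 19.03 daemon source, that the daemon only registers the gpu device driver when it can find nvidia-container-runtime-hook on its own PATH. The list of binary names is an assumption too, so adjust it to whatever your packages install. The daemon's PATH can also differ from your shell's, so a hit here is a hint, not proof.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional

# Assumed binary names shipped by the NVIDIA container packages; adjust as needed.
CANDIDATE_BINARIES = (
    "nvidia-container-runtime-hook",
    "nvidia-container-toolkit",
    "nvidia-container-cli",
)


def locate_binaries(
    names: Iterable[str] = CANDIDATE_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def report(found: Dict[str, Optional[str]]) -> str:
    """Render one line per binary, flagging the ones that could not be found."""
    lines = []
    for name, path in found.items():
        lines.append(f"{name}: {path if path else 'NOT FOUND on PATH'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Uses the PATH of the current shell, which may not match the Docker daemon's.
    print(report(locate_binaries()))
```

If the hook is reported as missing, that would be consistent with the "could not select device driver" error, but it does not rule out other causes.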
One thing to note -- the snap installation of docker on Ubuntu is known to have this issue.
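A rough way to check for that case is to look at where the docker client on PATH lives. The sketch below assumes a snap-installed docker resolves to a path with a "snap" component (for example /snap/bin/docker). That is a heuristic I am assuming, not something documented in this thread, and the function names are made up for illustration.

```python
import shutil
from pathlib import PurePosixPath
from typing import Optional


def looks_like_snap(path: Optional[str]) -> bool:
    """Heuristic: treat any path with a 'snap' directory component as a snap install."""
    if not path:
        return False
    return "snap" in PurePosixPath(path).parts


def docker_install_hint() -> str:
    """Describe where the docker client on PATH appears to come from."""
    # The path is deliberately not resolved: /snap/bin entries are symlinks to the
    # snap launcher, so resolving them would hide the /snap prefix.
    path = shutil.which("docker")
    if path is None:
        return "docker not found on PATH"
    if looks_like_snap(path):
        return f"{path}: looks like a snap install"
    return f"{path}: does not look like a snap install"


if __name__ == "__main__":
    print(docker_install_hint())
```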
1. Issue or feature description
Docker containers fail to start with any of the --gpus options, reporting the following error:
Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
2. Steps to reproduce the issue
Cannot be reproduced on any of our machines, but happens consistently for a user of our software (which relies on GPU access within Docker containers).
Users with the issue are: @gmuraleekrishna, @hagianga21
We have tried all of the common suggestions like restarting Docker services, rebooting, fiddling with GPU arguments, reverting to base CUDA images, etc.
3. Information to attach (optional if deemed irrelevant)
Please see this issue where we've gathered whatever information I thought may be relevant. We have tried a number of things, but can't successfully start a GPU-enabled Docker container under any circumstances.
We also can't seem to get any extra relevant information beyond that nondescript Docker daemon error response.
Notes
To be fair, I'm confused by all of the conflicting information I'm finding about how the toolkit works (or even what it is):
... daemon.json file or runtime, and all the debugging tips have suggestions that work with the runtime
... nvidia-docker2), an entire set of packages, a package within that set. What exactly is it? And should it work by itself?
... nvidia-container-runtime when the apt package tree suggests it isn't used / installed?
TL;DR: I've spent a week digging through this trying to understand what's causing this bug, and I'm still no clearer on how this system works, what's legacy & what isn't, how to get debugging information, or where the issue could be coming from. Any help / tips would be greatly appreciated.