0/2 nodes are available: 2 Insufficient nvidia.com/gpu #159
I am facing the same issue. Going through the container logs, it is throwing an error, which I assume means something is wrong with the image itself.
I think you need to use the NVIDIA one as the base image in your Dockerfile:
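A minimal sketch of what that might look like; the specific CUDA image tag is an assumption, not something given in this thread:

```dockerfile
# Hypothetical Dockerfile: build on an NVIDIA CUDA base image so the
# CUDA user-space libraries are available inside the container.
# The tag is an assumption; choose one compatible with your driver.
FROM nvidia/cuda:10.2-base

# Application layers would go here.
CMD ["nvidia-smi"]
```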
You mean in the pod spec file? Even after I use the above image, I am seeing the error.
So I skipped that example pod and instead tried this deployment with a smaller number of replicas, and it worked fine: https://github.com/NVIDIA/k8s-device-plugin/blob/examples/workloads/deployment.yml
Hello! Sorry for the lag. Can you fill in the default issue template? It is usually super helpful and makes it easier to help :)
@RenaudWasTaken I think the issue is that the Docker default runtime cannot be set to "nvidia": as of Docker 19.03, `runtime: nvidia` has been deprecated, so we need a fix for that.
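For context, the commonly suggested workaround is to make "nvidia" the default runtime in /etc/docker/daemon.json. This is a sketch based on the standard nvidia-container-runtime setup; the binary path is the usual default, not something confirmed in this thread:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

After editing the file, Docker has to be restarted (e.g. `sudo systemctl restart docker`) for the change to take effect.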
Removed my previous comment with a link to this one so that there is one canonical place with a response to this issue.
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity. |
Facing this old issue. I have gone through all the relevant workarounds, but the issue still persists.
Kubernetes version: 1.14
Docker version on GPU node: 19.03.6
GPU node: 4 x GTX1080Ti
I am trying to deploy this example:
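(The spec itself did not survive here; as an illustration, a GPU pod request typically looks something like the following, where the pod name and image are placeholders and the key part is the `nvidia.com/gpu` resource limit:)

```yaml
# Hypothetical example pod, for illustration only; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:10.2-base   # assumption: any CUDA-enabled image
      resources:
        limits:
          nvidia.com/gpu: 1          # ask the device plugin for one GPU
```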
And I am getting the following error:
0/2 nodes are available: 2 Insufficient nvidia.com/gpu
When I specify the GPU node explicitly in the deployment YAML, I get the following error:
Update plugin resources failed due to requested number of devices unavailable for nvidia.com/gpu. Requested: 1, Available: 0, which is unexpected.
/etc/docker/daemon.json on GPU node:
I have restarted Docker and the kubelet.
I am using this NVIDIA device plugin DaemonSet:
https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
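(For reference, a manifest like this is normally applied with plain kubectl; the command below is standard usage rather than something stated in the thread:)

```console
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
```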
Should I:
- label the GPU node that has the NVIDIA GPUs somehow?
- restart the master node?
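Before either of those, one thing worth checking (a standard kubectl diagnostic, not from the original thread) is whether the device plugin has actually registered the nvidia.com/gpu resource on the node:

```console
# Shows the node's capacity/allocatable; a working plugin reports nvidia.com/gpu.
# "gpu-node" is a placeholder for the actual node name.
kubectl describe node gpu-node | grep -i nvidia.com/gpu
```

If nothing shows up, the device plugin pod's logs on that node are the next place to look.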
Any help here is more than welcome!