We have enabled the GPU plugin on our Nomad client node, but GPU jobs are not being allocated to this node. Furthermore, running `nomad node status` on this node gives us no information about the number of GPUs available.
I can see the plugin installed under the plugins directory, I am able to run an nvidia/cuda Docker container on this node directly, and the `nvidia-smi` command reports the correct number of GPUs.
I suspect this is a plugin issue. How can I verify that the plugin has been installed correctly?
Or could the issue be something else?
Thanks!
Do your Nomad client logs say that your GPU is failing to be fingerprinted? If so, that was the same issue I was having. The only fix I found was to recompile the nomad-device-nvidia plugin against an updated version of the go-nvml library (specifically, v0.12.0-2).
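Before recompiling, it may also be worth double-checking that the client is actually configured to load the plugin at all. A minimal sketch of the relevant client configuration follows; the `plugin_dir` path here is an assumption, so adjust it to wherever your plugin binary actually lives, and the plugin block name must match the binary's filename:

```hcl
# Sketch of a Nomad client config enabling the NVIDIA device plugin.
# Assumption: the nomad-device-nvidia binary sits in /opt/nomad/plugins.
plugin_dir = "/opt/nomad/plugins"

plugin "nomad-device-nvidia" {
  config {
    enabled            = true
    fingerprint_period = "1m"
  }
}
```

With that in place, `nomad node status -verbose <node-id>` should list the detected `nvidia/gpu` devices, and the client logs (e.g. via `journalctl -u nomad`) should show a fingerprint error if the plugin fails to initialize.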