nomad agent dev fails to start with 'undefined symbol: nvmlDeviceGetPciInfo_v3' #8303
Comments
Sorry for the slow response. It looks like Nomad requires a more recent driver than the one bundled with the Linux distribution. Can you try upgrading your driver and let us know if that works? It seems the distribution installs legacy drivers by default, while Nomad currently requires more recent versions, like the ones bundled with CUDA 9 or 10. I've tested Nomad against driver 484.11 (bundled with CUDA 9) and that worked:
The way I installed the drivers was:
FWIW, these drivers are not that old: when I search the official website, I get this: https://www.nvidia.com/object/product_quadro_nvs_295_us.html
I will try to follow the wizard here: https://developer.nvidia.com/cuda-downloads to get the latest driver.
Update: looks like this won't happen anytime soon... way too much to download (~2 GiB) 😢 EDIT: I cancelled this operation and tried installing from a PPA.
It's strange: when I looked up 340.108, I noticed it was marked legacy even though it was released in 2019, e.g. https://forums.developer.nvidia.com/t/linux-solaris-and-freebsd-driver-340-108-legacy-for-geforce-8-and-9-series/109520#5414137. Stepping back a bit, let me understand the use case. Are you actually planning to use this GPU with Nomad for machine learning/CUDA-like workloads? Or are you trying to start Nomad on a machine that just happens to have a GPU, where the GPU isn't critical to your Nomad use case? Also, would you mind trying the Nomad agent found in https://79969-36653430-gh.circle-artifacts.com/0/builds/nomad_linux_amd64.zip?
This is not a critical system. I just happened to have an old display card and decided to set it up on an old desktop (which was already running Elementary Linux). I wouldn't really be using this for any real-world CUDA workloads, though it would be nice to have, as I could run trivial CUDA things on my desktop. That said, if this doesn't fit on the roadmap due to its "non-real" use case, I am fine with that. (1) In that case, though, what would be the proper way to disable NVIDIA detection altogether during Nomad startup?
Update: I tried adding the NVIDIA PPA and manually installing the "latest" available driver. This broke the NVIDIA driver altogether; I am down to VESA mode, but the agent starts now 🙄.
I'm very sorry that I got your system borked :(. I fully agree that the Nomad agent should function with legacy NVIDIA drivers: the agent should start normally, just without NVIDIA support. We'll follow up. One odd thing: in my testing, I noticed that Ubuntu 18.04 offers nvidia-driver-440 (and other versions as well):
This brings up a new question in my mind.
That's OK, I'm always ready for some trial and error to get Nomad working! 😄 🛩️ BTW, did you try with the … For me, this is not "Ubuntu" Ubuntu; it is Elementary Linux (a desktop-oriented distro), hence I would prefer having the display driver working, higher resolution, etc. OK, after refining my apt search...
I will try with 440 now.
This doesn't work with my currently working display driver (v340). I will install v440 and try again.
Newer drivers are not working. That said, I wish there were a clean fix for this! :)
@shantanugadgil We just merged an option to disable the NVIDIA device plugin; it should be out in 0.12.1. Thanks for raising the issue.
Waiting eagerly for 0.12.1 to test on my machine 😁
For basic testing, you can try the binaries found in https://app.circleci.com/pipelines/github/hashicorp/nomad/10642/workflows/1ff98cc1-e847-434f-aff4-05acfbb6f993/jobs/84842/artifacts along with the config from the PR:

  plugin "nvidia-gpu" {
    config {
      enabled = false
    }
  }

Please try it and let me know how it goes!
The Nomad agent starts with the config mentioned above.
Perfect, thanks for letting us know!
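To make the behavior discussed in this thread concrete (the agent should start normally but without NVIDIA support, and `enabled = false` should skip detection entirely), here is a minimal Python sketch. It is not Nomad's implementation (Nomad and its nvidia-gpu plugin are written in Go); `fingerprint_gpus` and the `probe` callable are hypothetical names, and the sketch assumes detection failures surface as catchable errors rather than aborting the process at load time.

```python
import logging
from typing import Callable, List

log = logging.getLogger("nvidia-gpu-sketch")


def fingerprint_gpus(enabled: bool, probe: Callable[[], List[str]]) -> List[str]:
    """Return detected GPU IDs, or an empty list if GPUs should be skipped.

    `probe` is a hypothetical stand-in for NVML-based detection; it is
    assumed to raise OSError or AttributeError when the driver library is
    missing or too old (e.g. lacks nvmlDeviceGetPciInfo_v3).
    """
    if not enabled:
        # Mirrors `enabled = false` in the plugin config: never touch NVML.
        log.info("nvidia-gpu detection disabled by config")
        return []
    try:
        return probe()
    except (OSError, AttributeError) as err:
        # Degrade gracefully: the agent keeps running, just without GPUs.
        log.warning("NVIDIA detection failed, continuing without GPUs: %s", err)
        return []


if __name__ == "__main__":
    def legacy_driver_probe() -> List[str]:
        # Simulates a legacy driver whose NVML lacks the required symbol.
        raise AttributeError("undefined symbol: nvmlDeviceGetPciInfo_v3")

    print(fingerprint_gpus(True, legacy_driver_probe))  # []
    print(fingerprint_gpus(True, lambda: ["GPU-0"]))     # ['GPU-0']
    print(fingerprint_gpus(False, lambda: ["GPU-0"]))    # []
```

Returning an empty device list instead of propagating the error is what keeps a single broken or outdated driver from taking the whole agent down.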
Any chance of getting older drivers to work with Nomad in the foreseeable future?
I suspect we're unlikely to support older drivers without strong demand; we'd be happy to link to community drivers if any exist ;-).
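The error in the issue title suggests the installed NVML library (`libnvidia-ml`) does not export `nvmlDeviceGetPciInfo_v3`, which would be consistent with a legacy driver. One quick way to check a given driver, sketched here with Python's `ctypes`, is to load the library and look the symbol up. The helper name `has_symbol` is hypothetical, and the soname `libnvidia-ml.so.1` is an assumption based on typical Linux installs.

```python
import ctypes
from ctypes.util import find_library


def has_symbol(library, symbol: str) -> bool:
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # Library not found or not loadable (e.g. no NVIDIA driver installed).
        return False
    # Attribute access on a CDLL looks the name up via dlsym(); a missing
    # symbol raises AttributeError, which hasattr() reports as False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumption: NVML's soname is libnvidia-ml.so.1; find_library() is
    # tried first in case the system reports a different name.
    nvml = find_library("nvidia-ml") or "libnvidia-ml.so.1"
    symbol = "nvmlDeviceGetPciInfo_v3"  # the symbol from the error in the title
    status = "present" if has_symbol(nvml, symbol) else "missing"
    print(f"{nvml}: {symbol} is {status}")
```

A "missing" result on a machine where the library loads fine would point at the driver version rather than at Nomad's configuration.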
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.12.0-beta2 (5b80d4e)
Same with Nomad 0.11.3 GA
Operating system and Environment details
Elementary Linux 5.x (based off Ubuntu 18.04)
Issue
Nomad agent fails to start with the following error:
Reproduction steps
Run "nomad agent -dev".
Job file (if appropriate)
n/a
Nomad Client logs (if appropriate)
Additional information that might be useful:
Steps used to install the drivers:
Output of nvidia-smi