Nomad 1.0.3 nvml issues. #10012
Hi @remus-corneliu!
This is not a supported environment for running Nomad clients. It's technically possible but it's fraught with very low-level details that you'll need to get right, including running the container with
This looks like Nomad somehow managed to fingerprint an Nvidia GPU... not sure what PCI pass-thru is going to look like in the "turducken" environment you have there of nested virtualization layers. But it also looks like the container doesn't meet the installation requirements. This would have worked in pre-0.9.0 versions of Nomad that didn't have the device drivers. You can disable the plugin via the following config:

plugin "nvidia-gpu" {
  config {
    enabled = false
  }
}
@remus-corneliu
Looks like this one has been answered and we haven't heard back, so I'm going to close this issue.
@tgross this is required for Alpine containers if you are running the nomad binary in an Alpine container. I'm not sure why the consul, terraform, and consul-template binaries work fine there while nomad doesn't.
Because Consul, Terraform, and consul-template don't run GPU workloads, so they don't have dependencies on the nvidia driver (that's what nvml is), and nvidia is dependent on glibc, which doesn't really belong in Alpine. (Take that up with Alpine or nvidia 😁 ) #10796 externalizes the driver, but that PR hasn't been merged. Also, just a heads up that I'm not at HashiCorp anymore and this issue has been closed since February. You almost certainly shouldn't be running a Nomad client in a container anyway, but if you want to, you'll want to open a new issue and bring it to the attention of the maintainers.
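To see the glibc dependency described above for a given binary, one option is to check which dynamic loader the ELF file requests. The following is only a sketch: `elf_interpreter` is a hypothetical helper name, not part of Nomad or any library, and it reads the `PT_INTERP` program header directly. A glibc-linked x86-64 binary typically asks for something like `/lib64/ld-linux-x86-64.so.2`, Alpine ships musl's `/lib/ld-musl-x86_64.so.1`, and a fully static binary has no `PT_INTERP` entry at all.

```python
import struct

PT_INTERP = 3  # program header type that holds the dynamic loader path


def elf_interpreter(data):
    """Return the dynamic loader path an ELF image requests, or None if static.

    `data` is the raw bytes of the file. Hypothetical helper for illustration.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little, 2 = big

    # Locate the program header table (offsets differ between ELF32 and ELF64).
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is_64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
        # The segment contents are a NUL-terminated path string.
        return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

For example, `elf_interpreter(open("/bin/nomad", "rb").read())` would report the loader that binary expects; if that path does not exist in the container image, the binary cannot start there as built.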
I believe running a nomad client in a container is a very valid use case when you consider GitOps and CI, where you may need to run a job in nomad during a CI run. It's the same reason you may want to run vault, terraform, packer, consul, or any of the other HashiCorp tools. No, you are not going to run the server or agents, but the client is still a valid use case. Anyway, this is where I'm at: trying to get all of the HashiCorp tools we use, or may want to use, into a docker container that is used as the CI run container. Since it is written in Go like all (most?) HashiCorp tools, it seems it should just work as a single statically linked binary. What I'm getting is:
Apparently there is a community version, and you can just do an apk add nomad. It is a community package though, not straight from HashiCorp.
I'm trying to run nomad in a docker container. Version 0.8.7 starts up correctly, while the latest version, 1.0.3, results in the following:
Error relocating /bin/nomad: nvmlDeviceGetGraphicsRunningProcesses: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlEventSetWait: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPerformanceState: symbol not found
nomad-server | Error relocating /bin/nomad: __snprintf_chk: symbol not found
nomad-server | Error relocating /bin/nomad: __vfprintf_chk: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlSystemGetDriverVersion: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPowerManagementLimit: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMemoryErrorCounter: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetUUID: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlInit_v2: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMaxClockInfo: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetUtilizationRates: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMinorNumber: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlEventSetCreate: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetHandleByIndex_v2: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMemoryInfo: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetAccountingMode: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPciInfo_v3: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlSystemGetProcessName: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPersistenceMode: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetClockInfo: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetComputeRunningProcesses: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMaxPcieLinkWidth: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetAccountingBufferSize: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetCount_v2: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceRegisterEvents: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPowerUsage: symbol not found
nomad-server | Error relocating /bin/nomad: __vsnprintf_chk: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetDisplayActive: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetEncoderUtilization: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetTemperature: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlEventSetFree: symbol not found
nomad-server | Error relocating /bin/nomad: __longjmp_chk: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetDisplayMode: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetBAR1MemoryInfo: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlShutdown: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetCurrentClocksThrottleReasons: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlErrorString: symbol not found
nomad-server | Error relocating /bin/nomad: __dprintf_chk: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetDecoderUtilization: symbol not found
nomad-server | Error relocating /bin/nomad: __fprintf_chk: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetMaxPcieLinkGeneration: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetPcieThroughput: symbol not found
nomad-server | Error relocating /bin/nomad: nvmlDeviceGetName: symbol not found
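The log above mixes two families of missing symbols: the `nvml*` functions from the Nvidia management library, and `__*_chk` functions, which are glibc's fortified wrappers that a musl-based image such as Alpine does not provide. A minimal sketch for sorting such a log into those groups follows; the function names and the "other" bucket are assumptions made for illustration, and the regular expression only targets the message format shown in this log.

```python
import re
from collections import defaultdict

# Matches lines of the form shown above:
#   Error relocating /bin/nomad: nvmlInit_v2: symbol not found
PATTERN = re.compile(r"Error relocating (\S+): (\w+): symbol not found")


def classify(symbol):
    """Bucket a missing symbol by its likely origin (heuristic, by name only)."""
    if symbol.startswith("nvml"):
        return "nvml"
    if symbol.startswith("__") and symbol.endswith("_chk"):
        return "glibc-fortify"
    return "other"


def group_missing_symbols(log_text):
    """Return {bucket: sorted list of unique missing symbols} for a log."""
    groups = defaultdict(set)
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            symbol = match.group(2)
            groups[classify(symbol)].add(symbol)
    return {bucket: sorted(symbols) for bucket, symbols in groups.items()}


if __name__ == "__main__":
    sample = (
        "Error relocating /bin/nomad: nvmlInit_v2: symbol not found\n"
        "nomad-server | Error relocating /bin/nomad: __snprintf_chk: symbol not found\n"
    )
    for bucket, symbols in group_missing_symbols(sample).items():
        print(bucket, len(symbols), symbols)
```

Seeing any entries in the glibc bucket suggests that disabling the GPU plugin alone would not be enough for this build, since the binary also expects glibc itself.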
Link to the repository: https://github.com/remus-corneliu/nomad
Local environment setup:
Docker for Windows v20.10.2 with WSL2 support, running an Ubuntu 18.04 distro