Symbol not found in build #1

Closed
ex0ns opened this issue Sep 17, 2021 · 5 comments

@ex0ns

ex0ns commented Sep 17, 2021

Hello,
I wanted to try this Docker image to run Nomad, but I ran into an issue: when the container starts, I get the following log:

```
Error relocating /bin/nomad: nvmlDeviceGetGraphicsRunningProcesses: symbol not found
Error relocating /bin/nomad: nvmlEventSetWait: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPerformanceState: symbol not found
Error relocating /bin/nomad: nvmlSystemGetDriverVersion: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPowerManagementLimit: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMemoryErrorCounter: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetUUID: symbol not found
Error relocating /bin/nomad: nvmlInit_v2: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMaxClockInfo: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetUtilizationRates: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMinorNumber: symbol not found
Error relocating /bin/nomad: nvmlEventSetCreate: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetHandleByIndex_v2: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMemoryInfo: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetAccountingMode: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPciInfo_v3: symbol not found
Error relocating /bin/nomad: nvmlSystemGetProcessName: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPersistenceMode: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetClockInfo: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetComputeRunningProcesses: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMaxPcieLinkWidth: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetAccountingBufferSize: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetCount_v2: symbol not found
Error relocating /bin/nomad: nvmlDeviceRegisterEvents: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPowerUsage: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetDisplayActive: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetEncoderUtilization: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetTemperature: symbol not found
Error relocating /bin/nomad: nvmlEventSetFree: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetDisplayMode: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetBAR1MemoryInfo: symbol not found
Error relocating /bin/nomad: nvmlShutdown: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetCurrentClocksThrottleReasons: symbol not found
Error relocating /bin/nomad: nvmlErrorString: symbol not found
Error relocating /bin/nomad: __dprintf_chk: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetDecoderUtilization: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetMaxPcieLinkGeneration: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetPcieThroughput: symbol not found
Error relocating /bin/nomad: nvmlDeviceGetName: symbol not found
```

I think it has to do with the NVIDIA dependencies.
Are you still using this image? Do you have any idea when this was introduced?

Thanks
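The guess about NVIDIA dependencies can be checked directly by listing the NVML symbols the binary leaves undefined. The sketch below is a hypothetical helper (`undefined_nvml_symbols` is not part of this image or of Nomad); it assumes binutils' `objdump` is available and that the symbol name is the last column of each `objdump -T` line. Undefined (`*UND*`) `nvml*` entries are the ones that would normally be provided at run time by NVIDIA's NVML library (`libnvidia-ml.so`).

```shell
# Hypothetical helper: reads `objdump -T` output on stdin and prints each
# undefined ("*UND*") dynamic symbol whose name starts with "nvml", once, sorted.
# These are the symbols some shared library must provide at run time.
undefined_nvml_symbols() {
  awk '/\*UND\*/ && $NF ~ /^nvml/ { print $NF }' | sort -u
}

# Usage (binary path as in the log above):
#   objdump -T /bin/nomad | undefined_nvml_symbols
```

If the names it prints match the ones in the log, nothing in the image defines those symbols, which is consistent with the NVIDIA-dependency theory.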

@mgiaccone

I'm experiencing the same issue. It looks like it was introduced in the latest build, as the Docker images were updated about 18 hours ago.

@ex0ns
Author

ex0ns commented Sep 17, 2021

I found a (really ugly) workaround in these threads: hashicorp/nomad#10012 and hashicorp/nomad#5535:

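```dockerfile
# Build stage: generate a stub shared library that defines every nvml* symbol
# the nomad binary references; each stub simply returns 1.
RUN objdump -T nomad | grep -o "nvml.*" | sort -u | sed 's/\(nvml.*\)/extern int \1(){ return 1; }/g' > nvml.c
RUN gcc -c nvml.c -fpic
RUN gcc -shared -o nvml.so nvml.o
RUN rm nvml.o nvml.c

... REST OF THE BUILDER

# Start of the second step

# Preload the stub library so the loader can resolve the nvml* symbols
ENV LD_PRELOAD /bin/nvml.so
```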

It works fine with this workaround. If the problem was introduced in the latest build, it might be worth raising an issue with HashiCorp; however, since this repo is built daily, I did not want to test every single version.
I think adding a

```dockerfile
RUN /bin/nomad version
```

at the end of the Dockerfile could help catch this kind of issue.
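To see what the first `RUN` line of the workaround generates, the sketch below runs its `sort -u | sed ...` stage on a few sample symbol names instead of real `objdump -T nomad | grep -o "nvml.*"` output. The helper name `nvml_stubs` is hypothetical. Every unique name becomes a C function that returns 1; since NVML's `nvmlReturn_t` uses 0 for `NVML_SUCCESS`, every stub reports an error, so presumably Nomad treats NVML as unavailable instead of probing for GPUs.

```shell
# Hypothetical helper reproducing the stub-generation stage of the workaround:
# deduplicate the symbol names on stdin and turn each one into a C function
# definition that always returns 1.
nvml_stubs() {
  sort -u | sed 's/\(nvml.*\)/extern int \1(){ return 1; }/g'
}

# Sample input with a duplicate, standing in for the real objdump/grep output.
printf '%s\n' nvmlShutdown nvmlInit_v2 nvmlShutdown | nvml_stubs
# extern int nvmlInit_v2(){ return 1; }
# extern int nvmlShutdown(){ return 1; }
```

The generated `nvml.c` is then compiled with `-fpic` and linked with `-shared`, exactly as in the remaining `RUN` lines.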

@hendrikmaus hendrikmaus self-assigned this Sep 19, 2021
@hendrikmaus
Owner

Hey, I will look into it.

I recently moved away from Travis and noticed that builds had been failing for a long time. I also simplified the Dockerfile, and I reckon that is what introduced this issue.

"I wanted to try this docker to run nomad (...)" @ex0ns

Please note that this image is not intended to run Nomad agents; it is only meant to be used in CI/CD pipelines as a containerized CLI client.

@ex0ns
Author

ex0ns commented Sep 19, 2021

Yes, I'm aware of that, thanks for making it clear. I wanted to use this image as a CLI client, not a node client 👌

@hendrikmaus
Owner

hendrikmaus commented Sep 19, 2021

Fixed using:

```dockerfile
# per https://github.com/hashicorp/nomad/issues/5535#issuecomment-651888183
&& export -n LD_BIND_NOW
```

and

```dockerfile
# per https://github.com/sgerrand/alpine-pkg-glibc/issues/51#issuecomment-302530493
&& apk del libc6-compat
```

I also added a call to `nomad version` as the last statement (thanks @ex0ns).

The builder pipeline has already run and rebuilt all images >= 1.0.0.
