The first client of the cluster can't run a Docker task #3159
I have a server and three clients; configurations below. Server configuration:
Client configuration:
Docker version:
Hey, do all Docker allocations on that node fail? Can you run a Docker container on that node outside of Nomad, binding to the same address? Also, are the other clients using IPv6 as well?
Hi @dadgar, after some tests everything works fine. I deleted the Nomad and Consul data from the nodes and restarted the whole cluster; now allocations are running correctly on all nodes. Maybe some data became out of sync or corrupted during the configuration changes, I don't know. Thank you.
@rodrigozc I am experiencing the same error. I tried completely reinstalling Nomad and Consul, and I still see the errors. Maybe I am not clearing out Consul completely?
@dadgar Had the same issue today. Nomad is trying to bind every container on the host to an IPv6 link-local address; this address does not exist on the host, and the container isn't running IPv6.
(Host)
(Sample from another container running the same job)
Some allocations on the host:
Task "nginx-exporter" is "dead"
10/10/17 08:10:06 UTC Driver Failure failed to start task "nginx-exporter" for alloc "e70c5e05-59ed-fe32-197c-ba30a509adbf": Failed to start container 6395902fbb771a70e52f26dd944c1d2767e519276aafa239f6c8620f70f1ad19: API error (500): {"message":"driver failed programming external connectivity on endpoint nginx-exporter-e70c5e05-59ed-fe32-197c-ba30a509adbf (34281a156b777a7c8b1146e2a422e346d54e24ed1317eca5bd90c66c4fb944b1): Error starting userland proxy: listen tcp [fe80::f014:7aff:fee6:11cc]:23272: bind: invalid argument"}
The issue was resolved after restarting Nomad on the host. Hope this helps.
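To confirm what kind of address the failing bind targeted, the host:port from the Docker error can be split and classified. This is a minimal Go sketch using only the standard `net` package; the helper name `bindTarget` is hypothetical and not part of Nomad or Docker. It shows that `fe80::f014:7aff:fee6:11cc` falls in the IPv6 link-local range (fe80::/10). Such an address is only meaningful together with an interface zone (e.g. `%eth0`), and on Linux binding to one without a scope typically fails with EINVAL, which would be consistent with the "bind: invalid argument" message above.

```go
package main

import (
	"fmt"
	"net"
)

// bindTarget splits a "[host]:port" listen address into host and port and
// reports whether the host is an IPv6 link-local unicast address (fe80::/10).
// Hypothetical helper for illustration only.
func bindTarget(addr string) (host, port string, linkLocal bool, err error) {
	host, port, err = net.SplitHostPort(addr)
	if err != nil {
		return "", "", false, err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", "", false, fmt.Errorf("not an IP literal: %q", host)
	}
	// To4() == nil restricts the check to IPv6; IsLinkLocalUnicast would also
	// be true for IPv4 169.254.0.0/16 addresses.
	return host, port, ip.To4() == nil && ip.IsLinkLocalUnicast(), nil
}

func main() {
	// Address copied from the Docker error in the comment above.
	host, port, ll, err := bindTarget("[fe80::f014:7aff:fee6:11cc]:23272")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("host=%s port=%s linkLocal=%v\n", host, port, ll)
}
```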
@lovwal Did the IPs/interface change on that machine since Nomad started? I can't see how Nomad would pick a non-existent address. As mentioned originally, when there are many routable interfaces Nomad will just pick one. You should specify the interface to use if this is the case. |
There's only one routable interface on the machine, and its address did not change.
I just ran into this problem as well. My Nomad configuration specifies
and, as you might guess, I have multiple such bridges. They have IP addresses, and the mistaken IP address does in fact come from that interface. In my case, though, this stays true even after reinstalling the entire operating system:
It turns out this can be triggered by link-local addresses, which several interfaces in my setup happened to share. Applying https://www.nomadproject.io/docs/configuration/client/#fingerprint-network-disallow_link_local got me past this error (and on to the next one).
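The effect of disallowing link-local addresses during network fingerprinting can be pictured as a filter over candidate addresses. The Go sketch below is hypothetical: `firstUsableAddr` and the sample addresses are illustrative assumptions, not Nomad's actual fingerprinting code. It picks the first non-loopback address and, when the flag is set, skips link-local ones, so a bridge whose first address is `fe80::…` no longer wins.

```go
package main

import (
	"fmt"
	"net"
)

// firstUsableAddr returns the first candidate that parses as an IP and is not
// loopback and, when disallowLinkLocal is true, not link-local unicast.
// Hypothetical helper mimicking the idea behind disallow_link_local; it is
// not Nomad's implementation.
func firstUsableAddr(candidates []string, disallowLinkLocal bool) (net.IP, bool) {
	for _, c := range candidates {
		ip := net.ParseIP(c)
		if ip == nil || ip.IsLoopback() {
			continue
		}
		if disallowLinkLocal && ip.IsLinkLocalUnicast() {
			continue
		}
		return ip, true
	}
	return nil, false
}

func main() {
	// Hypothetical addresses on one bridge interface, link-local listed first.
	addrs := []string{"fe80::f014:7aff:fee6:11cc", "10.0.0.5"}

	if ip, ok := firstUsableAddr(addrs, false); ok {
		fmt.Println("without filter:", ip)
	}
	if ip, ok := firstUsableAddr(addrs, true); ok {
		fmt.Println("with filter:   ", ip)
	}
}
```

The filter only changes which address is chosen; when several routable interfaces exist, pinning the interface in the client configuration, as suggested earlier in the thread, is still the more explicit fix.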
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.6.2
Operating system and Environment details
Issue
The first client of the cluster, which also runs a server, can't run a Docker task; the other clients work fine.
The allocation log shows the output below:
Reproduction steps
Run the job and observe that the first node can't run the task.
Nomad Client/Server logs
Job file (if appropriate)