bind: address already in use - 0.8.0 #4360

Closed

djenriquez opened this issue May 31, 2018 · 7 comments

Comments
@djenriquez

djenriquez commented May 31, 2018

Referencing #2402: our deployments only rarely run into the bind: address already in use issue, but they do. Copying and pasting from my comment in #2402:

Not sure if this is fixed. Running Nomad v0.8.0, I saw this in a deployment of a Docker task with a count of 70:

nomad -v
Nomad v0.8.0 (751b42aaa60f1bc679e5062e16db7d15f8ff9654)

docker -v
Docker version 17.12.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
Failed to start container 25d554b3682d850580bb41d223cb82db891932f5511e191bad822b11c11379c9: API error (500): {"message":"driver failed programming external connectivity on endpoint <REDACTED>-9b51d619-f7ea-105d-c6a0-40fec309c38e (4c0681fe8b3a36ae62bdf225a2d2531bf6b494268852763534585860e78e0544): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use

I can't get this thing out of the pending state =/. All future deployments for this job fail because of the allocation that is stuck pending. I had to drain the node it was on, then disable the drain.

Any ideas?

Reproduction steps

This issue is very hard to reproduce; I have only seen it 2-3 times in the past month in our environment, which performs deploys a few times a day.

Nomad Client logs (if appropriate)

May 31 20:36:57 ip-10-7-10-154 nomad[3582]: driver.docker: created container 150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9
May 31 20:36:58 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 1): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (b0d70c8b056571c67dadd538e841f4ce7e54d69be5dab339ce5d2e967befbac8): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:36:59 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 2): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (d7febb401927db5f291bd759cf095c98de09aa8e76e2def1d9f16c1f5587d159): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:00 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 3): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (69d932c819959b3d948d8dda5e9651e6786b0975a0a5dadbe684b442f9e0ae71): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:01 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 4): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (0f803d9a4ea0a6000ab611df27c8b8fc5e76361eff2a2d7a0a483876c1a40c5d): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:02 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 5): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (185adaf485f7536f6e86cf6d3a86f7476137b84072ee574cb8efd89e2d34d039): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:03 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container "150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9" (attempt 6): API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (ae99cb80720f3f881e51bf8771e37156eceb1b634ad943f274e52408cf90bd9a): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:03 ip-10-7-10-154 nomad[3582]: driver.docker: failed to start container 150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9: API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (ae99cb80720f3f881e51bf8771e37156eceb1b634ad943f274e52408cf90bd9a): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}
May 31 20:37:03 ip-10-7-10-154 nomad[3582]: client: failed to start task "core-api" for alloc "9b51d619-f7ea-105d-c6a0-40fec309c38e": Failed to start container 150809332ccbe5de92ee870c68228f4f701dfef3a61d0a42ba6765aa096de1c9: API error (500): {"message":"driver failed programming external connectivity on endpoint core-api-9b51d619-f7ea-105d-c6a0-40fec309c38e (ae99cb80720f3f881e51bf8771e37156eceb1b634ad943f274e52408cf90bd9a): Error starting userland proxy: listen tcp 10.7.10.154:28268: bind: address already in use"}

Maybe a quick solution: if the address was not released by the OS or Docker for some reason, have Nomad try a different port?

@djenriquez
Author

Also, all 69 other allocations deployed healthy; it was just this one.

@djenriquez
Author

djenriquez commented May 31, 2018

Nope, the node drain --> drain disable did not work; I had to completely terminate the box to get Nomad to place the allocation elsewhere...
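
(For reference, the drain --> disable sequence attempted above, as a minimal sketch assuming the Nomad 0.8 CLI; <node-id> is a placeholder:)

# Find the node ID of the affected client:
nomad node status

# Enable, then disable, a drain on that node:
nomad node drain -enable <node-id>
nomad node drain -disable <node-id>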

@shantanugadgil
Contributor

Hi, on a side note, do you have a reschedule block? (This could help by rescheduling the allocation elsewhere; a minimal sketch follows below.)

For me, restarting the Docker service is usually adequate to make Nomad reschedule the jobs.
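
Something like this, as a minimal sketch assuming a Nomad 0.8.3+ job spec (the job/group names and all values here are illustrative, not from the original job):

job "core-api" {
  group "api" {
    # Illustrative values; tune attempts/interval to your deploy cadence.
    reschedule {
      attempts       = 3       # reschedule up to 3 times...
      interval       = "30m"   # ...within a 30-minute window
      delay          = "30s"   # wait before the first attempt
      delay_function = "exponential"
      max_delay      = "10m"
      unlimited      = false
    }
    # ... task definitions ...
  }
}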

@djenriquez
Author

Ah, I do not have a reschedule stanza! I think that will suffice for my needs. This happens rarely enough that I think it's a good enough resolution for me, though I don't know if it's still worth spending time to figure out this rare occurrence. I'm good w/ the reschedule stanza solution.

@dadgar
Contributor

dadgar commented Jun 1, 2018

@djenriquez I am going to close this since there isn't enough information, but we can definitely reopen once you gather more the next time you hit this. A few suggestions:

  1. Ensure your OS's ephemeral port range doesn't overlap with Nomad's (see the shell sketch after this list): https://www.nomadproject.io/guides/cluster/requirements.html#ports-used
  2. Are you sure your jobs aren't binding to ports Nomad hasn't given them?
  3. Next time this happens, determine what is bound to that port, then check whether Nomad double-allocated the port (that would be a real bug, but I don't believe so), and see if one of your jobs is binding to ports it shouldn't, etc.
  4. Nomad 0.8.4 will make this better, since rescheduling will work during deployments: you can just let that alloc fail and it will get rescheduled.
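
A hedged sketch of the checks in items 1 and 3 on a Linux host (port 28268 is taken from the logs above; the sysctl values are examples only):

# Item 1: check the OS ephemeral port range. Nomad's default dynamic
# port range is 20000-32000, so the two should not overlap.
cat /proc/sys/net/ipv4/ip_local_port_range

# If they overlap, move the ephemeral range out of the way
# (example values; persist via /etc/sysctl.conf if this helps):
sudo sysctl -w net.ipv4.ip_local_port_range="49152 65535"

# Item 3: next time the failure occurs, see what is holding the port
# (for a Docker port mapping, expect a docker-proxy process here):
sudo ss -tlpn 'sport = :28268'
# or, on older hosts:
sudo netstat -tlpn | grep 28268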

@dadgar dadgar closed this as completed Jun 1, 2018
@djenriquez
Author

Thanks @dadgar! Yeah, I just realized 0.8.3 doesn't run reschedules during deploys. Thanks for the info.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 30, 2022