Scheduling Jobs with Docker Fails with Error "bind: address already in use" #2402
Comments
+1 to this, happening from time to time in my test cluster too. I have not yet been able to catch what goes wrong, but I suspect that Nomad is trying to allocate a TCP port that is in the TIME_WAIT state. I see this issue on v0.5.2.
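(For anyone trying to confirm that suspicion: a minimal check, assuming `ss` is available on the host; the port number is a placeholder, not one taken from the logs.)

```sh
# Look for sockets in TIME_WAIT on the port Nomad failed to bind
# (replace 23456 with the port from the "bind: address already in use" error).
ss -tan state time-wait '( sport = :23456 or dport = :23456 )'
```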
@xytis Can you show the output of |
@xytis It might be too late to check now, but is anything bound to that port?
I cannot be certain, but the host is running both Docker and Nomad. I suspected they could have caused the collision on their own due to some race condition. At the time of reporting, the port was free.
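(A hedged sketch of how to check, at failure time, whether anything still holds the port, for example a leftover docker-proxy from the previous allocation; the port number is a placeholder.)

```sh
# Show any listener on the port together with the owning process (run as root for -p).
ss -tlnp 'sport = :23456'
# Alternative view that includes the process name (e.g. docker-proxy):
lsof -nP -iTCP:23456 -sTCP:LISTEN
```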
Happened to me again. This time an allocation failed on the first attempt for some other reason and was retried; the second attempt ran into this issue.
@gokhansengun can you share the output of |
@dadgar I am running a test cluster where I test lots of things. Sorry, I cleaned everything up this time, but it is happening nearly daily; I will update here with the info.
@dadgar I did not need to wait until tomorrow; it happened again, logs below.
|
@gokhansengun If the allocations are all still there, or the next time it happens, can you run:
|
@dadgar I added the logs to a secret Gist and shared it with you on Gitter.
We see this issue about once a week in our production environment. I would like to see Nomad intercept this error and attempt to restart the container on another random port. Is this possible?
@gokhansengun Did changing the available port range fix this for you? We talked about that on Gitter.
@dadgar Sorry Alex, I did not update you on this. I waited for 0.5.6 before making the change you suggested, and since then the issue has not occurred again. I am not sure whether the cause was being on a pre-0.5.6 version or the refactoring I did across the job dispatching.
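(For readers landing here: the exact change suggested on Gitter is not shown in the thread. A plausible sketch, assuming the idea was to keep other port users away from the range Nomad assigns dynamically, is to move the kernel's ephemeral range via the `net.ipv4.ip_local_port_range` sysctl, or to tell the Nomad client not to hand out a conflicting range at all using the standard `reserved` stanza below; the concrete port range is illustrative only.)

```hcl
# Nomad client configuration: reserve ports so the scheduler never assigns them.
# The range here is an example, not the one discussed on Gitter.
client {
  reserved {
    reserved_ports = "8000-8999"
  }
}
```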
@gokhansengun Thanks for the update. @grobinson-blockchain What version of Nomad are you on?
We are on Nomad v0.5.6.
We are facing the same issue on Nomad v0.5.6, with a job of 120 task groups with one task each. One of the tasks did not start for this same reason.
Not sure if this is fixed. Running Nomad v0.8.0, I saw this in a deployment of a Docker task with a count of 70:
I can't get this thing out of the pending state =/. I had to drain the node it was on and then disable the drain. Any ideas?
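(For anyone else stuck in the same state, the workaround described above corresponds roughly to the commands below; the node ID is a placeholder, and on Nomad 0.8 the older `nomad node-drain` spelling also works.)

```sh
nomad node status                     # find the ID of the node holding the stuck allocation
nomad node drain -enable <node-id>    # drain: move allocations off the node
nomad node drain -disable <node-id>   # re-enable scheduling on the node
```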
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Nomad v0.5.4
Operating system and Environment details
Issue
Reproduced #1814
Reproduction steps
Happens in production. Not reproduced in a test environment; I suspect that Nomad performs an allocation on a port which was just released from a previous allocation.
Logs
Note that the Nomad client is not responsive during the deploy attempts (that might be a separate bug).