
Scheduling Jobs with Docker Fails with Error "bind: address already in use" #1814

Closed
vrenjith opened this issue Oct 14, 2016 · 8 comments

@vrenjith
Contributor

Nomad version

0.3.2

Operating system and Environment details

Ubuntu 14.04 LTS

Issue

A job run using Nomad hits the following error from the Docker driver:

network docker_gwbridge: Error starting userland proxy: listen tcp 10.9.45.103:40246: bind: address already in use\n"

Reproduction steps

We do continuous deployment in the clusters (every hour) and we see this error once every 1-2 days.

This raises the question of how Nomad allocates a free port and supplies it to the Docker driver. It appears that Nomad sometimes allocates a port that is already in use by one of the previously deployed containers.
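
The failure mode described above can be reproduced in miniature without Docker or Nomad: the "bind: address already in use" message is the kernel's EADDRINUSE error surfacing through Docker's userland proxy when the host port it tries to publish is already held by another listener. A minimal sketch:

```python
import errno
import socket

# Bind a listener to an OS-assigned free port, standing in for a port
# already held by a previously deployed container.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

# A second bind to the same ip:port fails with EADDRINUSE -- the same
# condition Docker's userland proxy reports as "address already in use".
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    reused = True
except OSError as e:
    assert e.errno == errno.EADDRINUSE
    reused = False
finally:
    second.close()
    first.close()

print(reused)  # False: the port was still held by the first listener
```

If Nomad hands the Docker driver a host port that some existing container (or any other process) is already bound to, Docker's port-publishing step hits exactly this error.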

@vrenjith vrenjith changed the title Scheduling Jobs with Docker Fails with Error "network docker_gwbridge: Error starting userland proxy: listen tcp 10.9.45.103:40246: bind: address already in use" Scheduling Jobs with Docker Fails with Error "bind: address already in use" Oct 14, 2016
@dadgar
Contributor

dadgar commented Oct 14, 2016

Hey,

Can you check if any other allocation was assigned that port?

@vrenjith
Contributor Author

@dadgar
All the processes are managed by Nomad; isn't that something already known to Nomad?

@dadgar
Contributor

dadgar commented Oct 20, 2016

It is something known to Nomad. I was asking whether you could see, by listing the allocs, if multiple were assigned that port. It would help determine whether there is a bug.

@vrenjith
Contributor Author

Okay, let me see if I can check this when it occurs again. Is there an easy
way to look at the ports allocated by Nomad, or do we need to list
all allocations, find the corresponding Docker containers, and then do a docker
inspect to find the port?

Regards,
Renjith
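
The check being discussed above, once the per-allocation host ports have been collected (e.g. from alloc-status or docker inspect output), boils down to looking for an ip:port pair claimed by more than one allocation on the same node. A minimal sketch; the allocation IDs and ports below are made up for illustration:

```python
from collections import Counter

# Hypothetical data: (alloc_id, host_ip, host_port) tuples gathered from
# the scheduler's allocation listing. These values are illustrative only.
reserved = [
    ("0536201a", "10.9.45.103", 40246),
    ("0dccafae", "10.9.45.103", 40246),  # same ip:port -> double allocation
    ("1ff7a960", "10.9.45.103", 23041),
]

# Count how many allocations claim each ip:port pair; any count > 1 means
# the same host port was handed to two allocations on one node.
counts = Counter((ip, port) for _, ip, port in reserved)
conflicts = {pair: n for pair, n in counts.items() if n > 1}
print(conflicts)  # {('10.9.45.103', 40246): 2}
```

An empty `conflicts` dict would point away from a scheduler bug and toward the port being held by something outside the scheduler's bookkeeping (a stale container or an unrelated host process).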

@dadgar
Contributor

dadgar commented Nov 4, 2016

PR #1938 fixes an issue around this. Given the detail available, it is hard to tell whether this fixes the particular bug reported, but I think it may. If it comes up post-0.5, we will re-open.

@camerondavison
Contributor

FYI, I am running 0.5.2 and saw:

Driver Failure  failed to start task '----' for alloc '51f0c2e1-dc72-a767-cb7d-3766c4730482': Failed to start container e30a46ccefe8ca4280a15bad9f11e3604a2d4377ee912802b9413bfb184bd331: API error (500): {"message":"driver failed programming external connectivity on endpoint -------51f0c2e1-dc72-a767-cb7d-3766c4730482 (a60b5e26389e61f06f3c1693fb66cd0fb9c00b029fba3e5f5427884b1d0cefe4): Error starting userland proxy: listen tcp 172.31.25.182:49918: bind: address already in use"}

docker version 1.12.3

@sevenT

sevenT commented Aug 16, 2017

Still seeing this with Nomad v0.6.0, when I schedule 5 jobs on the same IP.

Nomad Version

Nomad v0.6.0

Docker

Docker version 1.9.1

Job Summary

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
echo        0       0         3        4       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
0536201a  c4d676aa  echo        4        run      failed   08/16/17 20:42:02 CST
0dccafae  68cd8383  echo        4        run      running  08/16/17 20:42:02 CST
1ff7a960  68cd8383  echo        4        run      running  08/16/17 20:42:02 CST
5662b866  c4d676aa  echo        4        run      failed   08/16/17 20:42:02 CST
db1634eb  c4d676aa  echo        4        run      running  08/16/17 20:38:52 CST

alloc-status

ID                  = 0536201a
Eval ID             = bc64482e
Name                = simple.echo[4]
Node ID             = c4d676aa
Job ID              = simple
Job Version         = 4
Client Status       = failed
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created At          = 08/16/17 20:42:02 CST
Deployment ID       = 6f3d6638
Deployment Health   = unhealthy

Task "server" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
500 MHz  256 MiB  300 MiB  0     simple: 127.0.0.1:23041

Task Events:
Started At     = N/A
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                   Type            Description
08/16/17 20:42:11 CST  Not Restarting  Error was unrecoverable
08/16/17 20:42:11 CST  Driver Failure  failed to start task "server" for alloc "0536201a-e3b8-6066-54a4-8ff0546ac6bb": Failed to start container f9fbef82ca395a7f02943fa3268e812fb97664b6d6e5fb20143ed29e4ce37c0e: API error (500): Cannot start container f9fbef82ca395a7f02943fa3268e812fb97664b6d6e5fb20143ed29e4ce37c0e: failed to create endpoint server-0536201a-e3b8-6066-54a4-8ff0546ac6bb on network bridge: Error starting userland proxy: listen tcp 127.0.0.1:23041: bind: address already in use
08/16/17 20:42:02 CST  Task Setup      Building Task Directory
08/16/17 20:42:02 CST  Received        Task received by client

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2022