
Scheduling Jobs with Docker Fails with Error "bind: address already in use" #1814

Closed
vrenjith opened this issue Oct 14, 2016 · 8 comments

@vrenjith
Contributor

Nomad version

0.3.2

Operating system and Environment details

Ubuntu 14.04 LTS

Issue

A job run using Nomad hits the following error from the Docker driver:

network docker_gwbridge: Error starting userland proxy: listen tcp 10.9.45.103:40246: bind: address already in use\n"

Reproduction steps

We do continuous deployment in the clusters (every hour) and we see this error once every 1-2 days.

This raises the question of how Nomad allocates a free port and supplies it to the Docker driver. It appears that Nomad sometimes allocates a port that is already in use by one of the previously deployed containers.
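
The failure mode described above can be reproduced in miniature without Docker or Nomad: the "bind: address already in use" message is the kernel's EADDRINUSE error surfacing through Docker's userland proxy when the host port it tries to publish is already held by another listener. A minimal sketch:

```python
import errno
import socket

# Bind a listener to an OS-assigned free port, standing in for a port
# already held by a previously deployed container.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

# A second bind to the same ip:port fails with EADDRINUSE -- the same
# condition Docker's userland proxy reports as "address already in use".
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    reused = True
except OSError as e:
    assert e.errno == errno.EADDRINUSE
    reused = False
finally:
    second.close()
    first.close()

print(reused)  # False: the port was still held by the first listener
```

If Nomad hands the Docker driver a host port that some existing container (or any other process) is already bound to, Docker's port-publishing step hits exactly this error.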

@vrenjith vrenjith changed the title Scheduling Jobs with Docker Fails with Error "network docker_gwbridge: Error starting userland proxy: listen tcp 10.9.45.103:40246: bind: address already in use" Scheduling Jobs with Docker Fails with Error "bind: address already in use" Oct 14, 2016
@dadgar
Contributor

dadgar commented Oct 14, 2016

Hey,

Can you check if any other allocation was assigned that port?

@vrenjith
Contributor Author

@dadgar
All the processes are managed by Nomad; isn't that something already known to Nomad?

@dadgar
Contributor

dadgar commented Oct 20, 2016

It is something known to Nomad. I was asking whether you could see, by listing the allocs, if multiple were assigned that port. It would help determine whether there is a bug.

@vrenjith
Contributor Author

Okay, let me see if I can check this when it occurs again. Is there an easy
way to look at the ports allocated by Nomad, or do we need to list
all allocations, find the corresponding Docker containers, and then do a docker
inspect to find the port?

Regards,
Renjith
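
The check being discussed above, once the per-allocation host ports have been collected (e.g. from alloc-status or docker inspect output), boils down to looking for an ip:port pair claimed by more than one allocation on the same node. A minimal sketch; the allocation IDs and ports below are made up for illustration:

```python
from collections import Counter

# Hypothetical data: (alloc_id, host_ip, host_port) tuples gathered from
# the scheduler's allocation listing. These values are illustrative only.
reserved = [
    ("0536201a", "10.9.45.103", 40246),
    ("0dccafae", "10.9.45.103", 40246),  # same ip:port -> double allocation
    ("1ff7a960", "10.9.45.103", 23041),
]

# Count how many allocations claim each ip:port pair; any count > 1 means
# the same host port was handed to two allocations on one node.
counts = Counter((ip, port) for _, ip, port in reserved)
conflicts = {pair: n for pair, n in counts.items() if n > 1}
print(conflicts)  # {('10.9.45.103', 40246): 2}
```

An empty `conflicts` dict would point away from a scheduler bug and toward the port being held by something outside the scheduler's bookkeeping (a stale container or an unrelated host process).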

@dadgar
Contributor

dadgar commented Nov 4, 2016

PR #1938 fixes an issue around this. Given the detail available, it is hard to tell whether this fixes the particular bug reported, but I think it may. If it comes up post-0.5, we will re-open.

@camerondavison
Contributor

FYI, I am running 0.5.2 and saw:

Driver Failure  failed to start task '----' for alloc '51f0c2e1-dc72-a767-cb7d-3766c4730482': Failed to start container e30a46ccefe8ca4280a15bad9f11e3604a2d4377ee912802b9413bfb184bd331: API error (500): {"message":"driver failed programming external connectivity on endpoint -------51f0c2e1-dc72-a767-cb7d-3766c4730482 (a60b5e26389e61f06f3c1693fb66cd0fb9c00b029fba3e5f5427884b1d0cefe4): Error starting userland proxy: listen tcp 172.31.25.182:49918: bind: address already in use"}

docker version 1.12.3

@sevenT

sevenT commented Aug 16, 2017

Still seeing this with Nomad v0.6.0, when I schedule 5 jobs on the same IP.

Nomad Version

Nomad v0.6.0

Docker

Docker version 1.9.1

Job Summary

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
echo        0       0         3        4       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
0536201a  c4d676aa  echo        4        run      failed   08/16/17 20:42:02 CST
0dccafae  68cd8383  echo        4        run      running  08/16/17 20:42:02 CST
1ff7a960  68cd8383  echo        4        run      running  08/16/17 20:42:02 CST
5662b866  c4d676aa  echo        4        run      failed   08/16/17 20:42:02 CST
db1634eb  c4d676aa  echo        4        run      running  08/16/17 20:38:52 CST

alloc-status

ID                  = 0536201a
Eval ID             = bc64482e
Name                = simple.echo[4]
Node ID             = c4d676aa
Job ID              = simple
Job Version         = 4
Client Status       = failed
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created At          = 08/16/17 20:42:02 CST
Deployment ID       = 6f3d6638
Deployment Health   = unhealthy

Task "server" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
500 MHz  256 MiB  300 MiB  0     simple: 127.0.0.1:23041

Task Events:
Started At     = N/A
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                   Type            Description
08/16/17 20:42:11 CST  Not Restarting  Error was unrecoverable
08/16/17 20:42:11 CST  Driver Failure  failed to start task "server" for alloc "0536201a-e3b8-6066-54a4-8ff0546ac6bb": Failed to start container f9fbef82ca395a7f02943fa3268e812fb97664b6d6e5fb20143ed29e4ce37c0e: API error (500): Cannot start container f9fbef82ca395a7f02943fa3268e812fb97664b6d6e5fb20143ed29e4ce37c0e: failed to create endpoint server-0536201a-e3b8-6066-54a4-8ff0546ac6bb on network bridge: Error starting userland proxy: listen tcp 127.0.0.1:23041: bind: address already in use
08/16/17 20:42:02 CST  Task Setup      Building Task Directory
08/16/17 20:42:02 CST  Received        Task received by client

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2022