
The first client of the cluster doesn't get to run a docker task #3159

Closed
rzcastilho opened this issue Sep 4, 2017 · 10 comments


rzcastilho commented Sep 4, 2017

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Nomad v0.6.2

Operating system and Environment details

$ nomad node-status -verbose a60cdc6f
ID      = a60cdc6f-b10c-d9d7-c0d6-792721b83183
Name    = hlgvalsvhmg1.pitagoras.apollo.br
Class   = <none>
DC      = dc1
Drain   = false
Status  = ready
Drivers = docker,exec,java
Uptime  = 113h7m53s

Allocated Resources
CPU          Memory      Disk        IOPS
0/28800 MHz  0 B/31 GiB  0 B/22 GiB  0/0

Allocation Resource Utilization
CPU          Memory
0/28800 MHz  0 B/31 GiB

Host Resource Utilization
CPU            Memory         Disk
895/28800 MHz  28 GiB/31 GiB  106 GiB/134 GiB

Allocations
ID                                    Eval ID                               Node ID                               Task Group  Version  Desired  Status    Created At
156091ca-adaf-8078-548a-a8d4ef8bd15a  85161110-c280-e081-3f62-dca6c79b5e2a  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info01      10       run      pending   09/04/17 16:48:29 BRT
86b5f480-4c20-3f5b-c2c6-4262799b1917  5aae8489-f129-2ba1-0aaa-808dedf8ffac  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info01      6        stop     complete  09/04/17 15:38:56 BRT
4db45b44-1558-ce67-135d-c841c10507ce  5aae8489-f129-2ba1-0aaa-808dedf8ffac  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info01      6        stop     complete  09/04/17 15:34:02 BRT
884d8089-7e69-6399-f600-3da2fdfc7354  21378496-5523-e13d-df17-3f8b10db4a00  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info02      7        stop     complete  09/04/17 15:34:02 BRT
6739907e-0039-93a0-01ea-fe31944fe548  5aae8489-f129-2ba1-0aaa-808dedf8ffac  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info02      6        stop     complete  09/04/17 15:32:59 BRT
15e5cfad-b88b-5673-670c-b90882c1b2c2  5aae8489-f129-2ba1-0aaa-808dedf8ffac  a60cdc6f-b10c-d9d7-c0d6-792721b83183  info02      6        stop     complete  09/04/17 15:30:39 BRT

Attributes
consul.datacenter             = dc1
consul.revision               = 75ca2ca
consul.server                 = true
consul.version                = 0.9.2
cpu.arch                      = amd64
cpu.frequency                 = 2400
cpu.modelname                 = Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
cpu.numcores                  = 12
cpu.totalcompute              = 28800
driver.docker                 = 1
driver.docker.bridge_ip       = 172.17.0.1
driver.docker.version         = 17.05.0-ce
driver.docker.volumes.enabled = 1
driver.exec                   = 1
driver.java                   = 1
driver.java.runtime           = Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
driver.java.version           = 1.8.0_73
driver.java.vm                = Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
kernel.name                   = linux
kernel.version                = 4.1.12-61.1.6.el6uek.x86_64
memory.totalbytes             = 33720918016
nomad.revision                = <none>
nomad.version                 = 0.6.2
os.name                       = oracle
os.signals                    = SIGTRAP,SIGTTOU,SIGUSR1,SIGUSR2,SIGIOT,SIGHUP,SIGQUIT,SIGSYS,SIGTSTP,SIGCONT,SIGURG,SIGCHLD,SIGALRM,SIGPROF,SIGABRT,SIGILL,SIGINT,SIGIO,SIGKILL,SIGPIPE,SIGTERM,SIGBUS,SIGSTOP,SIGXFSZ,SIGSEGV,SIGXCPU,SIGTTIN,SIGFPE
os.version                    = 6.8
unique.cgroup.mountpoint      = /cgroup
unique.consul.name            = consul-server
unique.hostname               = hlgvalsvhmg1
unique.network.ip-address     = fe80::4896:bff:fe01:516f
unique.storage.bytesfree      = 23487995904
unique.storage.bytestotal     = 143595053056
unique.storage.volume         = /dev/mapper/rootvg-varlv

Issue

The first client of the cluster, which also runs a server, doesn't manage to run a Docker task; the other clients work fine.
The allocation log shows the output below:

$ nomad alloc-status 156091ca
ID                  = 156091ca
Eval ID             = 85161110
Name                = infojob.info01[1]
Node ID             = a60cdc6f
Job ID              = infojob
Job Version         = 10
Client Status       = pending
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created At          = 09/04/17 16:48:29 BRT
Deployment ID       = 9b16e76d
Deployment Health   = unhealthy

Task "infoapi01" is "pending"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
500 MHz  1.0 GiB  300 MiB  0     http: fe80::4896:bff:fe01:516f:26713

Task Events:
Started At     = N/A
Finished At    = N/A
Total Restarts = 8
Last Restart   = 09/04/17 16:53:21 BRT

Recent Events:
Time                   Type             Description
09/04/17 16:53:21 BRT  Restarting       Task restarting in 30.581775086s
09/04/17 16:53:21 BRT  Driver Failure   failed to start task "infoapi01" for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a": Failed to start container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677: API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (900c37a625dbe348015aae46c7ccfd53edc0e29d19185f532c6e96ce1db9ea12): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
09/04/17 16:53:14 BRT  Driver           Downloading image rodrigozc/jdk8-application:latest
09/04/17 16:52:43 BRT  Restarting       Task restarting in 30.953542808s
09/04/17 16:52:43 BRT  Driver Failure   failed to start task "infoapi01" for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a": Failed to start container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677: API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (58bbdf9a5ced8ab8375f97613bd6b2b6d95bd03d6163fb16dc1f57cc9dd85c75): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
09/04/17 16:52:35 BRT  Driver           Downloading image rodrigozc/jdk8-application:latest
09/04/17 16:52:04 BRT  Restarting       Task restarting in 31.791189517s
09/04/17 16:52:04 BRT  Driver Failure   failed to start task "infoapi01" for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a": Failed to start container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677: API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (741e46147a46d2d329d579a9aa1d360e0022fd39ff6622cb5ee13c7acafd7f94): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
09/04/17 16:51:56 BRT  Driver           Downloading image rodrigozc/jdk8-application:latest
09/04/17 16:51:29 BRT  Alloc Unhealthy  Task not running by deadline

Reproduction steps

Run the job and observe that the first node can't run the task; see the commands sketched below.
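
In command terms (a sketch; the job file below is assumed saved as infojob.nomad, and the IDs come from the output above):

$ nomad run infojob.nomad
$ nomad node-status -verbose a60cdc6f   # the first node stays at 0 allocated resources
$ nomad alloc-status 156091ca           # the task loops between Restarting and Driver Failure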

Nomad Client/Server logs

    2017/09/04 16:56:32.306952 [DEBUG] http: Request /v1/status/peers (113.481µs)
    2017/09/04 16:56:33.179874 [DEBUG] http: Request /v1/agent/servers (46.107µs)
    2017/09/04 16:56:35 [DEBUG] memberlist: TCP connection from=127.0.0.1:52997
    2017/09/04 16:56:39.917300 [DEBUG] client: driver event for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a": Downloading image rodrigozc/jdk8-application:latest
    2017/09/04 16:56:40.007250 [DEBUG] client: updated allocations at index 5667 (total 6) (pulled 4) (filtered 2)
    2017/09/04 16:56:40.007320 [DEBUG] client: allocs: (added 0) (removed 0) (updated 4) (ignore 2)
    2017/09/04 16:56:40.007338 [DEBUG] client: dropping update to terminal alloc '86b5f480-4c20-3f5b-c2c6-4262799b1917'
    2017/09/04 16:56:40.007348 [DEBUG] client: dropping update to terminal alloc '15e5cfad-b88b-5673-670c-b90882c1b2c2'
    2017/09/04 16:56:40.007357 [DEBUG] client: dropping update to terminal alloc '4db45b44-1558-ce67-135d-c841c10507ce'
    2017/09/04 16:56:40.007367 [DEBUG] client: dropping update to terminal alloc '6739907e-0039-93a0-01ea-fe31944fe548'
    2017/09/04 16:56:42.015864 [DEBUG] driver.docker: docker pull rodrigozc/jdk8-application:latest succeeded
    2017/09/04 16:56:42.039829 [DEBUG] driver.docker: Setting default logging options to syslog and unix:///tmp/plugin747295477
    2017/09/04 16:56:42.039894 [DEBUG] driver.docker: Using config for logging: {Type:syslog ConfigRaw:[] Config:map[syslog-address:unix:///tmp/plugin747295477]}
    2017/09/04 16:56:42.039911 [DEBUG] driver.docker: using 1073741824 bytes memory for infoapi01
    2017/09/04 16:56:42.039919 [DEBUG] driver.docker: using 500 cpu shares for infoapi01
    2017/09/04 16:56:42.039953 [DEBUG] driver.docker: binding directories []string{"/var/lib/nomad/alloc/156091ca-adaf-8078-548a-a8d4ef8bd15a/alloc:/alloc", "/var/lib/nomad/alloc/156091ca-adaf-8078-548a-a8d4ef8bd15a/infoapi01/local:/local", "/var/lib/nomad/alloc/156091ca-adaf-8078-548a-a8d4ef8bd15a/infoapi01/secrets:/secrets", "/var/lib/nomad/alloc/156091ca-adaf-8078-548a-a8d4ef8bd15a/infoapi01/local/application.yml:/app/application.yml"} for infoapi01
    2017/09/04 16:56:42.039967 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge
    2017/09/04 16:56:42.039979 [DEBUG] driver.docker: allocated port fe80::4896:bff:fe01:516f:26713 -> 8080 (mapped)
    2017/09/04 16:56:42.039998 [DEBUG] driver.docker: exposed port 8080
    2017/09/04 16:56:42.040021 [DEBUG] driver.docker: setting container name to: infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a
    2017/09/04 16:56:42.042587 [DEBUG] driver.docker: failed to create container "infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a" from image "rodrigozc/jdk8-application:latest" (ID: "sha256:2aab75cc2a58422a70a12a8f7389b8ddeca8d90092ac5d7670d49fbd79f29ece") (attempt 1): container already exists
    2017/09/04 16:56:42.047753 [DEBUG] driver.docker: searching for container name "/infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a" to purge
    2017/09/04 16:56:42.047789 [DEBUG] driver.docker: listed container <nil>
    2017/09/04 16:56:42.048913 [INFO] driver.docker: created container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677
    2017/09/04 16:56:42.112515 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 1): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (02e409203fe349ac5ebf54f276e5fec95a70b79fded54e60ccc66f42b3c084a9): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:42.308663 [DEBUG] http: Request /v1/status/peers (434.038µs)
    2017/09/04 16:56:43.167370 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 2): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (eb1421e45b8d02f6b3c5a197ad4859ac7448f982f8b5cf30ac1f911b08cf5fe6): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:43.181462 [DEBUG] http: Request /v1/agent/servers (366.706µs)
    2017/09/04 16:56:44.245544 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 3): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (0089bebd1ca279154b96f935a351c0ed52c18a563919e829ccf2444455ac24ce): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:45.309688 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 4): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (10cfaab7724871bb05d91f5ad0e417825fdaefdef7d7f23f833db631d99f632f): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:45 [DEBUG] memberlist: TCP connection from=127.0.0.1:53045
    2017/09/04 16:56:46.385395 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 5): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (ff79c7161bd7c5d884ae1b29869745eaca5c60dfbc1eef89552d482ba8193921): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:47.450411 [DEBUG] driver.docker: failed to start container "8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677" (attempt 6): API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (40d4806e078b4b5fdab020bff00cd440b02098a3e00356632e7a3c48cd910d43): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:47.450444 [ERR] driver.docker: failed to start container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677: API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (40d4806e078b4b5fdab020bff00cd440b02098a3e00356632e7a3c48cd910d43): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:47.452999 [WARN] client: failed to start task "infoapi01" for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a": Failed to start container 8f68ce9336520401dc1f93f7babe2c6009e09812ceb8bb4c8aa7974158b07677: API error (500): {"message":"driver failed programming external connectivity on endpoint infoapi01-156091ca-adaf-8078-548a-a8d4ef8bd15a (40d4806e078b4b5fdab020bff00cd440b02098a3e00356632e7a3c48cd910d43): Error starting userland proxy: listen tcp [fe80::4896:bff:fe01:516f]:26713: bind: invalid argument"}
    2017/09/04 16:56:47.453067 [INFO] client: Restarting task "infoapi01" for alloc "156091ca-adaf-8078-548a-a8d4ef8bd15a" in 36.989089767s
    2017/09/04 16:56:47.454197 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks
    2017/09/04 16:56:47.608041 [DEBUG] client: updated allocations at index 5668 (total 6) (pulled 4) (filtered 2)
    2017/09/04 16:56:47.608112 [DEBUG] client: allocs: (added 0) (removed 0) (updated 4) (ignore 2)
    2017/09/04 16:56:47.608141 [DEBUG] client: dropping update to terminal alloc '86b5f480-4c20-3f5b-c2c6-4262799b1917'
    2017/09/04 16:56:47.608152 [DEBUG] client: dropping update to terminal alloc '15e5cfad-b88b-5673-670c-b90882c1b2c2'
    2017/09/04 16:56:47.608163 [DEBUG] client: dropping update to terminal alloc '4db45b44-1558-ce67-135d-c841c10507ce'
    2017/09/04 16:56:47.608172 [DEBUG] client: dropping update to terminal alloc '6739907e-0039-93a0-01ea-fe31944fe548'
    2017/09/04 16:56:52.310241 [DEBUG] http: Request /v1/status/peers (363.034µs)
    2017/09/04 16:56:53.182542 [DEBUG] http: Request /v1/agent/servers (30.76µs)

Job file (if appropriate)

job "infojob" {
  datacenters = ["dc1"]
  type = "service"
  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert = false
    canary = 0
  }

  group "info01" {
    count = 2
    restart {
      attempts = 10
      interval = "5m"
      delay = "30s"
      mode = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "infoapi01" {
      driver = "docker"
      config {
        image = "rodrigozc/jdk8-application:latest"
        volumes = ["local/application.yml:/app/application.yml"]
        port_map {
          http = 8080
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 1024 # 1GB
        network {
          mbits = 1
          port "http" {}
        }
      }

      service {
        name = "infoapi01"
        tags = ["sample","jdk8-application", "info01"]
        port = "http"
        check {
          type     = "http"
          port     = "http"
          path     = "/info"
          interval = "10s"
          timeout  = "2s"
        }
      }

      artifact {
        source = "https://raw.githubusercontent.com/rodrigozc/jdk8-application/master/application.ctmpl"
        destination = "local"
      }

      template {
        source        = "local/application.ctmpl"
        destination   = "local/application.yml"
        change_mode   = "restart"
      }

    }

  }

  group "info02" {
    count = 3
    restart {
      attempts = 10
      interval = "5m"
      delay = "25s"
      mode = "fail"
    }

    ephemeral_disk {
      size = 300
    }

    task "infoapi02" {
      driver = "docker"
      config {
        image = "rodrigozc/jdk8-application:latest"
        volumes = ["local/application.yml:/app/application.yml"]
        port_map {
          http = 8080
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 1024 # 1GB
        network {
          mbits = 1
          port "http" {}
        }
      }

      service {
        name = "infoapi02"
        tags = ["sample","jdk8-application", "info02"]
        port = "http"
        check {
          type     = "http"
          port     = "http"
          path     = "/info"
          interval = "10s"
          timeout  = "2s"
        }
      }

      artifact {
        source = "https://raw.githubusercontent.com/rodrigozc/jdk8-application/master/application.ctmpl"
        destination = "local"
      }

      template {
        source        = "local/application.ctmpl"
        destination   = "local/application.yml"
        change_mode   = "restart"
      }

    }
  }
}

rzcastilho commented Sep 4, 2017

I have one server and three clients; configurations below.

Server Configuration:

bind_addr  = "0.0.0.0" # the default
data_dir   = "/var/lib/nomad"
log_level  = "DEBUG"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled       = true
  servers = ["10.100.2.53:4647"]
  options {
    "docker.auth.config"     = "/root/.docker/config.json"
    "docker.cleanup.image"   = "0"
  }
}

consul {
  address = "127.0.0.1:8500"
}

Clients Configuration:

bind_addr  = "0.0.0.0" # the default
data_dir   = "/var/lib/nomad"
log_level  = "DEBUG"

client {
  enabled       = true
  servers = ["10.100.2.53:4647"]
  options {
    "docker.auth.config"     = "/root/.docker/config.json"
    "docker.cleanup.image"   = "0"
  }
}

consul {
  address = "127.0.0.1:8500"
}

Docker Version:

$ docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:09:44 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:09:44 2017
 OS/Arch:      linux/amd64
 Experimental: true


dadgar commented Sep 7, 2017

Hey,

Do all docker allocations on that node fail? Can you run a docker container on that node outside of Nomad binding to the same address?

Also, are the other clients using IPv6 as well?

rzcastilho commented:

Hi @dadgar,

After some tests, everything works fine.

I just deleted the Nomad and Consul data from the nodes and restarted the whole cluster; now allocations are running perfectly on all nodes.

Maybe some data became out of sync or corrupted during the configuration changes, I don't know.

Thank you.
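
For anyone repeating this recovery, the steps amount to roughly the following sketch (the service commands assume init scripts; /var/lib/nomad matches the data_dir in the configurations above, and the Consul data path is a placeholder for your own Consul data_dir):

$ sudo service nomad stop && sudo service consul stop
$ sudo rm -rf /var/lib/nomad/*           # Nomad state, per the data_dir above
$ sudo rm -rf /path/to/consul/data/*     # placeholder: your Consul data_dir
$ sudo service consul start && sudo service nomad start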

lgfausak commented:

@rodrigozc I am experiencing this same error. I did try completely reinstalling nomad and consul and I still see the errors. Maybe I am not clearing out consul completely?


lovwal commented Oct 10, 2017

@dadgar Had the same issue today. Nomad is trying to bind every container on the host to an IPv6 link-local address; this address does not exist on the host, and the container isn't running IPv6.

(Host)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fe80::5054:ff:fe3f:bf9a/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 state DOWN
    inet6 fe80::42:27ff:fea4:7ae0/64 scope link
       valid_lft forever preferred_lft forever
(State down since no container was running on the host)

(Sample from another container running the same job)
/ # ip -6 addr
/ #

Some allocations on the host:
Task "cmsapi" is "dead"
Task Resources
CPU Memory Disk IOPS Addresses
500 MHz 1.8 GiB 300 MiB 0 http: fe80::f014:7aff:fee6:11cc:31479
....
10/10/17 07:49:34 UTC Restarting Task restarting in 18.200454832s
10/10/17 07:49:34 UTC Driver Failure failed to start task "cmsapi" for alloc "525aeb11-deae-9c9f-a6c1-c714bc4278d2": Failed to start container b85216162536c84f023c771de3c84d1935370997cdbdcd45163e29df79aa1d74: API error (500): {"message":"driver failed programming external connectivity on endpoint cmsapi-525aeb11-deae-9c9f-a6c1-c714bc4278d2 (e0cb0fb052fcd5113852eec7b656f481381233a5eb48c3941fe85e641bd3682b): Error starting userland proxy: listen tcp [fe80::f014:7aff:fee6:11cc]:31479: bind: invalid argument"}

Task "nginx-exporter" is "dead"
Task Resources
CPU Memory Disk IOPS Addresses
50 MHz 30 MiB 300 MiB 0 http: fe80::f014:7aff:fee6:11cc:23272

10/10/17 08:10:06 UTC Driver Failure failed to start task "nginx-exporter" for alloc "e70c5e05-59ed-fe32-197c-ba30a509adbf": Failed to start container 6395902fbb771a70e52f26dd944c1d2767e519276aafa239f6c8620f70f1ad19: API error (500): {"message":"driver failed programming external connectivity on endpoint nginx-exporter-e70c5e05-59ed-fe32-197c-ba30a509adbf (34281a156b777a7c8b1146e2a422e346d54e24ed1317eca5bd90c66c4fb944b1): Error starting userland proxy: listen tcp [fe80::f014:7aff:fee6:11cc]:23272: bind: invalid argument"}

Issue resolved after restarting nomad on the host. Hope this helps.


dadgar commented Oct 13, 2017

@lovwal Did the IPs/interface change on that machine since Nomad started? I can't see how Nomad would pick a non-existent address. As mentioned originally, when there are many routable interfaces Nomad will just pick one. You should specify the interface to use if this is the case.
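
A minimal sketch of that client setting, assuming eth0 is the interface whose address Nomad should fingerprint:

client {
  enabled           = true
  network_interface = "eth0"
}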


lovwal commented Oct 18, 2017

There's only one routable interface on the machine, and its address did not change.

Vaelatern commented:

I just ran into this problem as well. My nomad configuration specifies

client {
  enabled = true
  network_interface = "bridge-vlan-3"
}

and as you might guess I have multiple such bridges. They have IP addresses, and my mistaken IP address does in fact come from that interface. Except in my case, this is true even after re-installing the entire operating system:

    60153f4d28609a7c8f6d36912fb95f53911e23e4870f8a83515c7326d2a17654: API error (500): driver failed programming external connectivity on endpoint name-6901a1e3-2559-4bad-071a-51f12c52e23f (dfd7f909e101ed282233e3cd044b3cd20b048db5a6b6d961696e47c93a300f51): Error starting userland proxy: listen tcp [fe80::aaa1:59ff:fe16:9832]:27817: bind: invalid argument


Vaelatern commented Jun 17, 2020

Turns out this can be triggered with link-local addresses, which in my setup multiple interfaces happened to share.

I moved on to the next error when I applied https://www.nomadproject.io/docs/configuration/client/#fingerprint-network-disallow_link_local
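
In client configuration terms, that option looks roughly like this (a sketch; the key name comes from the linked docs):

client {
  enabled = true
  options {
    "fingerprint.network.disallow_link_local" = "true"
  }
}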


github-actions bot commented Nov 6, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 6, 2022