
Nomad worker nodes not joining via consul #2264

Closed
stevenscg opened this issue Feb 1, 2017 · 4 comments · Fixed by #2278

Comments


Issue

Nomad worker nodes will not automatically join the cluster via Consul as configured.

The only way I can get a client-only Nomad agent to join the cluster is to add at least one server address to the client.servers configuration option.

Nomad version

Nomad v0.5.4. I may also have observed this back on v0.5.3 and v0.5.2.

Operating system and Environment details

CentOS 7 on AWS.
Security group ingress allowed on ports 4646, 4647, and 4648 for the worker node.
Connectivity between the worker and the servers is good.
Consul agent running locally on the worker node and connected to an existing Consul cluster.
Nomad agent configured with client_auto_join.

Nomad agent configuration:

region = "us"
datacenter = "us1"
name = "i-0ac1cf890221be861"
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
log_level = "DEBUG"
enable_debug = true

ports {
  http = 4646
  rpc = 4647
  serf = 4648
}

addresses {
  http = "0.0.0.0"
  rpc = "0.0.0.0"
  serf = "0.0.0.0"
}

advertise {
  http = "10.101.25.17:4646"
  rpc = "10.101.25.17:4647"
  serf = "10.101.25.17:4648"
}

consul {
  address = "127.0.0.1:8500"
  auto_advertise = true
  client_auto_join = true
}

leave_on_interrupt = true
leave_on_terminate = false
enable_syslog = true
syslog_facility = "LOCAL0"
disable_update_check = true
disable_anonymous_signature = true

client {
  enabled = true
  state_dir = "/opt/nomad/data/client"
  alloc_dir = "/opt/nomad/data/alloc"
  options {
    driver.exec = "1"
    docker.auth.config = "/home/nomad/.docker/config.json"
    driver.raw_exec.enable = "1"
  }
}

server {
  enabled = false
}
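(Editor's note, not part of the report: for client_auto_join to find anything, the servers must be registered in Consul. A minimal sketch of the matching server-side consul stanza, assuming the servers use the analogous auto-join options:)

```hcl
# Sketch (assumed, not from the report): the counterpart stanza on the
# server side. auto_advertise registers the "nomad" service in Consul;
# client_auto_join on clients discovers servers through that service.
consul {
  address          = "127.0.0.1:8500"
  auto_advertise   = true
  server_auto_join = true
}
```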

Nomad Client logs

Feb  1 16:52:38 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers
Feb  1 16:52:44 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (255.761µs)
Feb  1 16:52:53 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers
Feb  1 16:52:54 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (1.22914ms)
Feb  1 16:53:04 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (275.012µs)
Feb  1 16:53:14 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (249.899µs)
Feb  1 16:53:17 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers
Feb  1 16:53:24 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (93.891µs)
Feb  1 16:53:34 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (131.48µs)
Feb  1 16:53:44 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (280.405µs)
Feb  1 16:53:47 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers
Feb  1 16:53:54 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (103.681µs)
Feb  1 16:54:04 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (1.28103ms)
Feb  1 16:54:12 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers
Feb  1 16:54:14 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (92.169µs)
Feb  1 16:54:24 ip-10-101-25-17 nomad[25949]: http: Request /v1/agent/servers (93.182µs)
Feb  1 16:54:28 ip-10-101-25-17 nomad[25949]: client: registration waiting on servers

No servers are returned from the local node API:

curl -s http://127.0.0.1:4646/v1/agent/servers | jq .
[]
stevenscg (Author) commented Feb 1, 2017

FWIW, this is a new test cluster. I'm actively working to set it up and start using it and can gather additional data if needed.

As a workaround, I'm deploying workers with consul.servers = [ "nomad.service.consul" ] for now.
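(Editor's note: a minimal sketch of that workaround as a Nomad agent stanza. The option lives under the client block as servers; the Consul DNS name is the one from the comment, and entries without a port default to the RPC port, 4647:)

```hcl
# Sketch of the workaround: bypass client_auto_join and point the
# client at the servers through Consul DNS instead.
client {
  enabled = true
  servers = ["nomad.service.consul"]
}
```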

dadgar (Contributor) commented Feb 1, 2017

@stevenscg Can you curl http://127.0.0.1:8500/v1/agent/checks?pretty=true. Are the servers registered in Consul?

stevenscg (Author) commented

@dadgar Yes, I believe that the servers are registered and healthy in Consul....

From the same worker node I captured logs from earlier (WITH consul.servers set):

curl http://127.0.0.1:8500/v1/agent/checks?pretty=true
{
    "289cc7e1737904489a71a4705d50e2dea3a55881": {
        "Node": "i-0ac1cf890221be861",
        "CheckID": "289cc7e1737904489a71a4705d50e2dea3a55881",
        "Name": "Nomad Client HTTP Check",
        "Status": "passing",
        "Notes": "",
        "Output": "HTTP GET http://0.0.0.0:4646/v1/agent/servers: 200 OK Output: [\"10.101.25.219:4647\",\"10.101.34.7:4647\",\"10.101.27.97:4647\"]",
        "ServiceID": "_nomad-client-nomad-client-http",
        "ServiceName": "nomad-client",
        "CreateIndex": 0,
        "ModifyIndex": 0
    }
}

From the same worker node I captured logs from earlier (WITHOUT consul.servers set):

{
    "289cc7e1737904489a71a4705d50e2dea3a55881": {
        "Node": "i-0ac1cf890221be861",
        "CheckID": "289cc7e1737904489a71a4705d50e2dea3a55881",
        "Name": "Nomad Client HTTP Check",
        "Status": "passing",
        "Notes": "",
        "Output": "HTTP GET http://0.0.0.0:4646/v1/agent/servers: 200 OK Output: []",
        "ServiceID": "_nomad-client-nomad-client-http",
        "ServiceName": "nomad-client",
        "CreateIndex": 0,
        "ModifyIndex": 0
    }
}

Catalog from the same node (truncated):

curl http://127.0.0.1:8500/v1/catalog/services?pretty=true
{
    "consul": [],
    "nomad": [
        "http",
        "rpc",
        "serf"
    ],
    "nomad-client": [
        "http"
    ],
    "vault": [
        "standby",
        "active"
    ]
}
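(Editor's note: the server list embedded in the check Output field above can be pulled back out with standard tools. A throwaway sketch against the same JSON; the file name check.json is arbitrary:)

```shell
# Save the check output captured above (trimmed to the relevant field).
cat > check.json <<'EOF'
{
    "289cc7e1737904489a71a4705d50e2dea3a55881": {
        "Name": "Nomad Client HTTP Check",
        "Output": "HTTP GET http://0.0.0.0:4646/v1/agent/servers: 200 OK Output: [\"10.101.25.219:4647\",\"10.101.34.7:4647\",\"10.101.27.97:4647\"]"
    }
}
EOF
# Extract the escaped JSON array embedded in the Output string and
# unescape the quotes.
grep -o '\[\\".*\\"\]' check.json | sed 's/\\"/"/g'
# prints ["10.101.25.219:4647","10.101.34.7:4647","10.101.27.97:4647"]
```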

dadgar added a commit that referenced this issue Feb 2, 2017
This PR fixes config merging/copying code.

Fixes #2264
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 16, 2022