
Improve consul integration user experience. #490

Closed
F21 opened this issue Nov 23, 2015 · 22 comments


F21 commented Nov 23, 2015

I spent most of my morning trying to build a test nomad + consul cluster using vagrant.

I am finding that, as it stands, consul is very difficult to run on top of nomad. I am sure the issues outlined here will spawn more specific child issues, but I think having a general issue will help drive discussion about improving the user experience before we break it down into specific tasks.

Here is a quick background of my investigations to narrow down the scope:

  • Nomad 0.2.0 is used.
  • Consul 0.6.0 RC2 is used (using my docker image f21global/consul on docker hub).
  • Currently only using the docker task driver.
  • I am running consul as a system task, as recommended in the documentation.
    • Consul runs as a docker container (f21global/consul).
    • Consul runs with host networking, so that every nomad agent can easily access consul via localhost:8500.
    • In addition to the above, docker containers can access the consul agent or server running on the host through the gateway address within the docker container.
  • I want consul running on all nomad client nodes. Up to 3 of the nodes in each datacenter should be consul servers; the rest should be agents.

This is currently the nomad task config I am using:

# Define a job called consul
job "consul" {
    # Run this job in the global region
    region = "global"

    # Run in the dc1 datacenter
    datacenters = ["dc1"]

    # run this job globally
    type = "system"

    # Rolling updates should be sequential
    update {
        stagger = "30s"
        max_parallel = 1
    }

    constraint{
        distinct_hosts = true
    }

    group "consul-server" {
        # Run the consul server as a docker container
        task "consul-server" {
            driver = "docker"
            config {
                image = "f21global/consul"
                network_mode = "host"
                args = ["agent", "-server", "-bootstrap-expect", "1", "-data-dir", "/tmp/consul"]
            }
            resources {
                cpu = 500
                memory = 64
                network {
                    # Request for a static port
                    port "consul_8300" {
                        static = 8300
                    }

                    port "consul_8301" {
                        static = 8301
                    }

                    port "consul_8302" {
                        static = 8302
                    }

                    port "consul_8400" {
                        static = 8400
                    }

                    port "consul_8500" {
                        static = 8500
                    }

                    port "consul_8600" {
                        static = 8600
                    }
                }
            }
        }
    }

    group "consul-agent" {
        # Run the consul agent as a docker container
        task "consul-agent" {
            driver = "docker"
            config {
                image = "f21global/consul"
                network_mode = "host"
                args = ["agent", "-data-dir", "/tmp/consul", "-node=agent-twi"]
            }
            resources {
                cpu = 500
                memory = 64
                network {
                    # Request for a static port
                    port "consul_8300" {
                        static = 8300
                    }

                    port "consul_8301" {
                        static = 8301
                    }

                    port "consul_8302" {
                        static = 8302
                    }

                    port "consul_8400" {
                        static = 8400
                    }

                    port "consul_8500" {
                        static = 8500
                    }

                    port "consul_8600" {
                        static = 8600
                    }
                }
            }
        }
    }
}

Problems I ran into:

  • distinct_hosts causes nomad to panic: Setting distinct_hosts to a boolean causes panic #489.
  • When launching the nomad clients, each client looks for a consul agent but fails to find one, because the consul agent is meant to be run as a nomad task. It is unclear whether the nomad client will keep retrying to reach the consul agent once it has been launched by the nomad cluster.
  • Because the consul servers are launched by nomad, it is impossible to automate the consul agents so that they can automatically join using start-join and start-join-wan.
  • This requires a lot of manual intervention, because we would need to start the servers, check where the servers are, update the consul.nomad file with the ip addresses and then send it to nomad as an update.
  • It doesn't seem to be possible to split the job file into two (one for the servers and one for the agents), because there doesn't seem to be a way to assign constraints based on task names. For example, in the agents job file, I would like a constraint that says: do not run on nodes that already have the consul-server task running (see the sketch below for the kind of thing I mean).
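
Something along these lines is what I have in mind, assuming node metadata set in the Nomad client config could be used instead of task names. The consul_role key is a made-up label rather than an existing convention, so treat this as a hypothetical sketch, not working config:

group "consul-agent" {
    # Keep the agent group off nodes that have been labelled as consul
    # servers via client metadata (hypothetical consul_role key).
    constraint {
        attribute = "${meta.consul_role}"
        operator  = "!="
        value     = "server"
    }

    task "consul-agent" {
        driver = "docker"
        config {
            image        = "f21global/consul"
            network_mode = "host"
            args         = ["agent", "-data-dir", "/tmp/consul"]
        }
    }
}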
@F21 F21 changed the title Consul integration difficult to use Improve consul integration user experience. Nov 23, 2015
@diptanu diptanu closed this as completed Nov 23, 2015
@diptanu diptanu reopened this Nov 23, 2015

diptanu commented Nov 23, 2015

@F21 The Nomad client keeps retrying to connect to the Consul agent. The moment it connects to the agent, it syncs all the service definitions of the tasks running on that node.
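
For reference, the service definitions being synced are the service blocks in a task, roughly like the minimal sketch below. This is based on the current service discovery docs, so exact fields may differ slightly in 0.2.0, and the name, port label and check values are only illustrative:

task "web" {
    driver = "docker"
    config {
        image = "nginx:1.9"
    }

    # Registered with the local Consul agent as soon as the Nomad client
    # manages to connect to it.
    service {
        name = "web"
        port = "http"
        tags = ["frontend"]

        check {
            type     = "http"
            path     = "/health"
            interval = "10s"
            timeout  = "2s"
        }
    }

    resources {
        network {
            mbits = 1
            port "http" {}
        }
    }
}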

Also, I am wondering why the Consul servers need to be run as system jobs. It makes perfect sense to run the consul agents as system jobs, and we probably shouldn't need to use the distinct hosts constraint when the job uses the system scheduler.


F21 commented Nov 23, 2015

@diptanu I agree that the Consul servers probably won't need to use the system scheduler.

Having said that, my initial rationale was to use distinct_hosts as a way to prevent the consul agent from being scheduled onto clients where the consul server was running.

However, even after setting the consul server count to 1, I am still getting scheduling errors:

# Define a job called consul
job "consul" {
    # Run this job in the global region
    region = "global"

    # Run in the dc1 datacenter
    datacenters = ["dc1"]

    # run this job globally
    type = "system"

    # Rolling updates should be sequential
    update {
        stagger = "30s"
        max_parallel = 1
    }

    constraint{
        distinct_hosts = "true"
    }

    group "consul-server" {
        count = 1

        # Run the consul server as a docker container
        task "consul-server" {
            driver = "docker"
            config {
                image = "f21global/consul"
                network_mode = "host"
                args = ["agent", "-server", "-bootstrap-expect", "1", "-data-dir", "/tmp/consul"]
            }
            resources {
                cpu = 500
                memory = 64
                network {
                    # Request for a static port
                    port "consul_8300" {
                        static = 8300
                    }

                    port "consul_8301" {
                        static = 8301
                    }

                    port "consul_8302" {
                        static = 8302
                    }

                    port "consul_8400" {
                        static = 8400
                    }

                    port "consul_8500" {
                        static = 8500
                    }

                    port "consul_8600" {
                        static = 8600
                    }
                }
            }
        }
    }

    group "consul-agent" {
        # Run the consul agent as a docker container
        task "consul-agent" {
            driver = "docker"
            config {
                image = "f21global/consul"
                network_mode = "host"
                args = ["agent", "-data-dir", "/tmp/consul", "-node=agent-twi"]
            }
            resources {
                cpu = 500
                memory = 64
                network {
                    # Request for a static port
                    port "consul_8300" {
                        static = 8300
                    }

                    port "consul_8301" {
                        static = 8301
                    }

                    port "consul_8302" {
                        static = 8302
                    }

                    port "consul_8400" {
                        static = 8400
                    }

                    port "consul_8500" {
                        static = 8500
                    }

                    port "consul_8600" {
                        static = 8600
                    }
                }
            }
        }
    }
}
$ sudo nomad run -address=http://192.168.33.10:4646 consul.nomad
==> Monitoring evaluation "d659de31-4f88-95bc-8fe0-32098d1ce3f6"
    Evaluation triggered by job "consul"
    Scheduling error for group "consul-agent" (failed to find a node for placement)
    Allocation "22b47eea-b799-e56b-3622-af7aa9c97a78" status "failed" (0/1 nodes filtered)
      * Resources exhausted on 1 nodes
      * Dimension "network: reserved port collision" exhausted on 1 nodes
    Allocation "2447ee21-feb4-37eb-9ed5-001d00846d05" created: node "69278cbc-37c4-cf17-2de1-586c3589cfa9", group "consul-server"
    Allocation "714b1913-acd5-2189-5ca3-cc150910cacb" created: node "c0e42790-44b2-a729-5a8f-1742fb503999", group "consul-server"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "d659de31-4f88-95bc-8fe0-32098d1ce3f6" finished with status "complete"
$ sudo nomad status -address=http://192.168.33.10:4646 consul
ID          = consul
Name        = consul
Type        = system
Priority    = 50
Datacenters = dc1
Status      = <none>

==> Evaluations
ID                                    Priority  TriggeredBy   Status
d659de31-4f88-95bc-8fe0-32098d1ce3f6  50        job-register  complete

==> Allocations
ID                                    EvalID                                NodeID                                TaskGroup      Desired  Status
22b47eea-b799-e56b-3622-af7aa9c97a78  d659de31-4f88-95bc-8fe0-32098d1ce3f6  <none>                                consul-agent   failed   failed
2447ee21-feb4-37eb-9ed5-001d00846d05  d659de31-4f88-95bc-8fe0-32098d1ce3f6  69278cbc-37c4-cf17-2de1-586c3589cfa9  consul-server  run      dead
714b1913-acd5-2189-5ca3-cc150910cacb  d659de31-4f88-95bc-8fe0-32098d1ce3f6  c0e42790-44b2-a729-5a8f-1742fb503999  consul-server  run      dead


diptanu commented Nov 24, 2015

@F21 It looks like the agent and server are getting scheduled on the same machine, which is why you're getting the port collision. distinct_hosts at the job level just means that all the task groups are going to run on distinct machines, but a system job will still run on every single machine, and that's why the consul agent is getting scheduled alongside the consul server. We might need a way to exclude system jobs from running on machines with a certain label.
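
If that kind of exclusion existed, the client-side half might look something like the sketch below. This is hypothetical: consul_role is an arbitrary label, and whether the system scheduler would honor a constraint against it is exactly the piece that is missing today.

# Nomad agent config on the nodes that are allowed to host a consul server
client {
    enabled = true

    meta {
        "consul_role" = "server"
    }
}

The consul-agent group would then carry a matching constraint on ${meta.consul_role} to keep it off those nodes.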


rothgar commented Nov 25, 2015

I'm glad I'm not the only one running into this problem. There really should be a recommended way in the docs to run consul, because having that service running is crucial for a production-ready cluster.


rothgar commented Nov 25, 2015

I tried splitting consul-server and consul-agent into separate service jobs (instead of a single system job) to let them run independently, but that doesn't appear to be the right solution (or I missed something in the config).

The nomad server running consul-server keeps restarting the service

2015/11/25 13:01:13 [ERR] client: failed to complete task 'consul-server' for alloc '85aa78be-e03b-33e0-6bc5-e6424b8eabdb': Wait returned exit code 1, signal 0, and error Docker container exited with non-zero exit code: 1
    2015/11/25 13:01:13 [INFO] client: Restarting Task: consul-server
2015/11/25 13:01:29 [INFO] driver.docker: a container with the name consul-server-85aa78be-e03b-33e0-6bc5-e6424b8eabdb already exists; will attempt to purge and re-create
2015/11/25 13:01:32 [INFO] driver.docker: purged container consul-server-85aa78be-e03b-33e0-6bc5-e6424b8eabdb
    2015/11/25 13:01:32 [INFO] driver.docker: created container dc3978f31f16a1ab2994679f33466ca2e21feaea4fe7f9975d2c3a3aa61c61ad
    2015/11/25 13:01:32 [INFO] driver.docker: started container dc3978f31f16a1ab2994679f33466ca2e21feaea4fe7f9975d2c3a3aa61c61ad
    2015/11/25 13:01:32 [ERR] client: failed to complete task 'consul-server' for alloc '85aa78be-e03b-33e0-6bc5-e6424b8eabdb': Wait returned exit code 1, signal 0, and error Docker container exited with non-zero exit code: 1
    2015/11/25 13:01:32 [INFO] client: Restarting Task: consul-server

And the server running consul-agent restarts a couple of times and then gets stuck

2015/11/25 13:05:14 [INFO] driver.docker: purged container consul-agent-519193f5-6744-f3e7-3fd2-cb4a06ec4cb6
    2015/11/25 13:05:14 [INFO] driver.docker: created container 57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a
    2015/11/25 13:05:14 [ERR] driver.docker: failed to start container 57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a: API error (500): Cannot start container 57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a: [8] System error: write /sys/fs/cgroup/devices/system.slice/docker-57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a.scope/cgroup.procs: no such device
    2015/11/25 13:05:14 [ERR] client: failed to start task 'consul-agent' for alloc '519193f5-6744-f3e7-3fd2-cb4a06ec4cb6': Failed to start container 57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a: API error (500): Cannot start container 57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a: [8] System error: write /sys/fs/cgroup/devices/system.slice/docker-57347baa7465137dc6aea463ae939539043852a0058e9e642f2daf360128a77a.scope/cgroup.procs: no such device
    2015/11/25 13:05:46 [ERR] http: Request /v1/allocation/consul-client, error: rpc error: alloc lookup failed: index error: UUID must be 36 characters

I'm still digging into it but just wanted to echo the need for a recommended way to run consul on nomad.

@ketzacoatl

FWIW, I have avoided these issues by making the consul network/service the thing that is set up and completed first, and which nomad then uses. CM manages consul and nomad on all nodes, and init for each node works out the details of forming/joining a cluster with consensus and a leader. No chicken/egg issues here.


F21 commented Dec 30, 2015

Has any progress been made on getting consul running on nomad? While it's possible to run consul by itself alongside nomad, that poses the following problems (assuming we only have 1 datacenter and want 3 consul servers, with the rest of the nodes running the consul client):

  • If a physical node where a consul server is running dies, we need to make sure a new node is provisioned with the consul server and not the consul agent. Perhaps some configuration management tool can be used to do this, but it seems rather inefficient to have to poll all the servers in the DC to work out how many consul servers are running.
  • Furthermore, this use case seems quite suitable for nomad to manage. For example, if a node running a consul server dies, nomad can just kill a client on another node and replace it with a server.
  • Bootstrapping consul on nomad might be a problem, as we do need to know the ip addresses of a few nodes so that the consul servers can find each other. However, I think this can be alleviated by using Atlas for discovery, by manually including the ip addresses of a few nodes that are guaranteed to exist (and having gossip discover everyone; see the sketch below), or by using mdns (when and if it lands in consul).
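
As a concrete example of the "manually include a few well-known addresses" option, the server task could pass retry-join flags along these lines (a sketch only; the addresses are illustrative and would have to be nodes that are guaranteed to exist):

task "consul-server" {
    driver = "docker"
    config {
        image        = "f21global/consul"
        network_mode = "host"
        # A few well-known addresses; gossip then discovers the rest.
        args = ["agent", "-server", "-bootstrap-expect", "3",
                "-data-dir", "/tmp/consul",
                "-retry-join", "192.168.33.10",
                "-retry-join", "192.168.33.11"]
    }
}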


diptanu commented Dec 31, 2015

@F21 You can definitely run Consul servers with Nomad. What is not possible today is to run both the Consul servers (using the service scheduler) and the clients (using the system scheduler) via Nomad. The reason is that the system scheduler currently schedules all system jobs on every machine in a Nomad cluster; there is currently no way for the system scheduler to exclude running the Consul clients on the machines where Nomad is running the Consul servers. And on the same machine, the client and server can't run simultaneously because of port collisions.

But if you just want to run the Consul servers on Nomad, it's definitely possible, and as you said, you could use Atlas's auto-join functionality to have the Consul servers find each other when they are dynamically scheduled by Nomad.
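
If I remember the Consul 0.6 flags correctly, wiring that in would just mean swapping static join addresses for the Atlas flags in the server task (the organization/infrastructure name and token below are placeholders):

args = ["agent", "-server", "-bootstrap-expect", "3",
        "-data-dir", "/tmp/consul",
        "-atlas", "my-org/my-infra",
        "-atlas-join",
        "-atlas-token", "REPLACE_WITH_ATLAS_TOKEN"]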


F21 commented Dec 31, 2015

@diptanu What are the nomad team's plans to fully bring consul scheduling to nomad?

In terms of the service scheduler colliding with the system scheduler, maybe a key called service_key (or something more suitable) could be added to the task definitions. We could then set the service_key for both the system task and the service task to something like consul. Nomad would then give the service task priority when scheduling. If a node running a service task goes offline, nomad could evict the system task on another node and replace it with a service task so that the count is maintained.

Or maybe the metadata feature could be improved so that it's exposed at the node level. For example, a task could add a piece of metadata such as service-type=consul-server, and the system scheduler could be constrained so that it does not schedule on nodes where that metadata exists. However, a priority system would still be required so that the service tasks take precedence.

I've recently built a bunch of docker images to run HDFS, and this feature would be very useful there too. For example, I want to run the namenodes on distinct nodes, and I also want to use the system scheduler to schedule datanodes on all nodes in the cluster except the ones where the namenodes are running.

Another possible way to deal with consul discovery without using Atlas would be to exploit the all_at_once feature when scheduling the tasks. Since the scheduler knows atomically which nodes the tasks will run on, it could build an array (maybe as JSON) of the ip addresses of those nodes and pass it as an environment variable to each task. I think this should deal with the issue of consul servers discovering each other. However, we would also need some way to pass that information to the system task so that the consul clients can connect to the servers.


F21 commented Feb 3, 2016

Has any progress been made to run both consul agents and servers on nomad?

@ketzacoatl

@F21, from what I can tell, it's not impossible, it just takes work. With that said, I have had a lot of success with consul as the primary core service that runs outside nomad, and I would recommend considering this route too.


F21 commented Feb 3, 2016

@ketzacoatl That's what I am currently doing with a virtualized test cluster. However, if you have say 3 nodes running nomad servers and consul servers, how are you recovering if 1 of those nodes goes down or experiences a hardware failure?

@ketzacoatl

I have my nomad servers and consul leaders running together on one auto-scaling group, 3 to 5 servers. If one node goes down, AWS follows the auto-scaling group setup and creates a replacement for the node(s) that are not present.


F21 commented Feb 3, 2016

Ah, that makes sense. I am not using AWS but will probably be running on a set of dedicated servers and a public cloud provider without auto-scaling, so a machine going down will need manual intervention.


memelet commented Feb 3, 2016

@ketzacoatl How are your ASG instances joining the cluster when started? That is, how do you "know" the other servers to join to?

@ketzacoatl

@memelet, for the consul leaders themselves, I use some AWS hackery: Terraform creates the ASG and puts it in the smallest subnet possible (limiting the IP range). We have 2 AZs for failover on the ASG, so there are two subnets, and the list of "possible IPs" is computed (i.e., those that the leaders might actually have, but we don't know which, because it's an ASG), and that list is used to create a DNS entry for "all leader IPs". Note that the list of possible IPs is huge (~25) compared to the leader nodes (3 - 5).

Consul agents can then be pointed at that DNS record for the leaders and configured with retry_join, so they will eventually find one of the "right" leader IPs and get connected to the network. The time it takes the agents to find the leaders depends on the retry_interval.

The first goal of the leaders is to get consul up, then nomad. Nomad relies on consul in my setup, and I think it's goofy to make Consul rely on Nomad (at least in my setup, because I use consul as part of "distributed Configuration Management" for the cluster and it forms the foundation for the whole shebang - you gotta pick either the egg, chicken or farmer in your story..). So the Nomad leaders/servers come online and then publish their "service" in the Consul catalog. The service check is simple, and so long as the service is running, the server is listed in consul, and that lets the nomad servers find each other for their quorum.

Even with the DNS hackery, this has worked very reliably, albeit the agent nodes can be a little slow to find the leaders and join. I plan on addressing this with some code on lambda that updates the DNS record as the nodes in the leader ASG come and go.
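
For the DNS piece, the Terraform side is roughly the sketch below (simplified; the zone, record name and the variable holding the computed list of possible leader IPs are illustrative):

# Every address the leader ASG could possibly hand out, published as one
# record; agents point retry_join at this name.
resource "aws_route53_record" "consul_leaders" {
    zone_id = "${var.private_zone_id}"
    name    = "consul-leaders.internal.example.com"
    type    = "A"
    ttl     = 60
    records = ["${var.possible_leader_ips}"]
}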


ghost commented May 5, 2016

Why not run Consul with nomad by default? Every node running nomad could run consul as well.
Please make them into one package so it's easier to use nomad to manage the cluster.


ketzacoatl commented May 5, 2016

Running the two on the same hosts is trivial. The documentation for each app is clear in how to configure and run the software. There is a learning curve to understanding all the details you need to master in order to be effective.

@themalkolm

@ketzacoatl wow, thank you for describing your AWS hackery! Didn't think we could simply brute-force finding a leader in some subnet 👍

@ketzacoatl

You could also use lambda to update a DNS record when nodes in the ASG change - see https://objectpartners.com/2015/07/07/aws-tricks-updating-route53-dns-for-autoscalinggroup-using-lambda/ for an example.


dadgar commented Feb 14, 2017

Hey, I am going to close this since we recommend running Consul outside of Nomad.

@dadgar dadgar closed this as completed Feb 14, 2017
benbuzbee pushed a commit to benbuzbee/nomad that referenced this issue Jul 21, 2022
When restoring a snapshot (on startup, installed from the leader, or during recovery), the logs are extremely terse. There are typically bookend messages indicating that a restore is going to happen and that it is complete, but there's a big dead space in the middle.

For small snapshots this is probably fine, but for larger multi-GB snapshots this can stretch out, and it can be unnerving for an operator not to know whether it's stuck or still making progress.

This PR adjusts the logging to emit a simple progress message every 10s reporting overall completion in bytes consumed.
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2022