From b0f1a7d9aad5dae10302c7c90ea8c7b229e0e66a Mon Sep 17 00:00:00 2001
From: Diptanu Choudhury
Date: Mon, 27 Jun 2016 16:23:32 -0700
Subject: [PATCH 1/3] Added docs around bootstrapping a Nomad cluster

---
 .../source/docs/cluster/bootstrapping.html.md | 132 ++++++++++++++++++
 website/source/layouts/docs.erb               |   4 +
 2 files changed, 136 insertions(+)
 create mode 100644 website/source/docs/cluster/bootstrapping.html.md

diff --git a/website/source/docs/cluster/bootstrapping.html.md b/website/source/docs/cluster/bootstrapping.html.md
new file mode 100644
index 00000000000..9a9d78e386d
--- /dev/null
+++ b/website/source/docs/cluster/bootstrapping.html.md
@@ -0,0 +1,132 @@
+---
+layout: "docs"
+page_title: "Creating a cluster"
+sidebar_current: "docs-cluster-bootstrap"
+description: |-
+  Learn how to bootstrap a Nomad cluster.
+---
+
+# Creating a cluster
+
+A production Nomad cluster comprises a few Nomad servers, a number of Nomad
+clients, and optionally Consul servers and clients. Before discussing the
+specifics of bootstrapping a cluster, we should cover the network topology.
+Nomad models infrastructure as regions and datacenters. A region may contain
+multiple datacenters. Servers are assigned to a single region, so a region is
+a single scheduling domain in Nomad. Multiple regions can be connected
+together, and users can target a specific region when they query the Nomad
+APIs or submit a job.
+
+
+## Nomad Servers
+
+Nomad servers are expected to have network latencies of no more than 10
+milliseconds between them. To achieve high availability, Nomad servers can be
+spread across multiple datacenters as long as they have low latency links
+between them. For example, every AWS region comprises multiple availability
+zones connected by very low latency links, so each zone can be modeled as a
+Nomad datacenter, and one Nomad server placed in each zone can join the
+others to form a quorum and a region.
+
+Nomad servers use Raft to replicate state between them, and Raft, being
+strongly consistent, needs a quorum of servers to function. We therefore
+recommend running an odd number of Nomad servers in a region, usually three
+to five. A cluster of three servers can withstand the failure of one server,
+and a cluster of five can withstand two. Adding more servers to the quorum
+increases the time it takes to replicate state and thus lowers throughput, so
+we do not recommend running more than seven servers in a region.
+
+During the bootstrapping phase Nomad servers need to know the addresses of
+the other servers. This can be achieved by using the `server-join` CLI
+command, by seeding the server addresses in the agent configuration as
+sketched below, or by pointing the Nomad servers at a Consul service.
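+
+As a minimal sketch, assuming the `retry_join` and `bootstrap_expect` options
+in the agent's `server` stanza (availability may vary across Nomad versions),
+a server can be seeded with the addresses of its peers directly in its
+configuration file:
+
+```
+# A hypothetical, partial server.hcl; not a complete agent configuration.
+server {
+  enabled = true
+
+  # Wait for three servers before bootstrapping the Raft cluster.
+  bootstrap_expect = 3
+
+  # Keep retrying these seed addresses until a join succeeds.
+  retry_join = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
+}
+```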
+
+
+## Nomad Clients
+
+Nomad clients are organized into datacenters and have to be seeded with the
+list of servers they should connect to.
+
+Operators can either place the addresses of the Nomad servers in the client
+configuration or point the Nomad clients at the Nomad server service in
+Consul. Once a client establishes a connection with a Nomad server, any
+servers later added to the cluster are propagated down to the clients along
+with the heartbeat.
+
+
+## Consul Cluster
+
+Bootstrapping a Nomad cluster becomes significantly easier if operators use
+Consul and register the Nomad servers with it. The network topology of a
+Consul cluster is slightly different from Nomad's. Consul models
+infrastructure as datacenters, each Consul datacenter can have up to roughly
+10,000 nodes, and multiple Consul datacenters can be connected over the WAN
+so that clients can discover nodes in other datacenters. We recommend running
+a Consul cluster in every Nomad datacenter and connecting them over the WAN.
+Please refer to the Consul documentation to learn more about bootstrapping
+Consul and connecting multiple Consul clusters over the WAN.
+
+Also, Nomad clusters can be significantly larger than Consul clusters, so
+sharding Consul into clusters of roughly 10,000 nodes, organized in
+individual datacenters, helps Consul scale as the Nomad cluster grows.
+
+### Bootstrapping a Nomad cluster without Consul
+
+At least one Nomad server's address (also known as the seed node) needs to be
+known ahead of time; a running agent can then be joined to the cluster with
+the `server-join` CLI command.
+
+For example, once a Nomad agent is started in server mode, it can be joined
+to an existing cluster through a server whose IP address is known. Once the
+agent joins that node, it discovers the other nodes in the cluster via the
+gossip protocol.
+
+```
+nomad server-join 10.0.0.1
+```
+
+If the seed address is only going to become available after some time (nodes
+in a cluster can take a variable amount of time to boot), use the agent's
+`retry_join` configuration option shown earlier instead; it keeps attempting
+the join until one succeeds.
+
+On the client side, the addresses of the servers are expected to be specified
+via the client configuration.
+
+```
+client {
+  ...
+  servers = ["10.10.11.2:4647", "10.10.11.3:4647", "10.10.11.4:4647"]
+  ...
+}
+```
+
+In the above example we specify three servers, using the server RPC port,
+which defaults to 4647, for the clients to connect to. If servers are added
+or removed, the clients learn about them via the heartbeat of a server which
+is alive.
+
+
+### Bootstrapping a Nomad cluster with Consul
+
+Bootstrapping a Nomad cluster is significantly easier if Consul is used along
+with Nomad. If a local Consul cluster is bootstrapped before Nomad, the
+following configuration registers the Nomad agent with Consul, looks up the
+addresses of the other Nomad servers, and joins them automatically.
+
+```
+{
+  "consul": {
+    "server_service_name": "nomad-server",
+    "server_auto_join": true,
+    "client_service_name": "nomad-client",
+    "client_auto_join": true
+  }
+}
+```
+
+With the above configuration the Nomad agent looks up the addresses of the
+agents in the `nomad-server` service in Consul and joins them automatically.
+In addition, if the `auto_advertise` option is set, Nomad also registers the
+agents with Consul automatically.
+
+Please refer to the documentation for the complete set of configuration
+options.

diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index 999b85dda86..54e021b7391 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -31,6 +31,10 @@
       Installation
+    <li<%= sidebar_current("docs-cluster-bootstrap") %>>
+      <a href="/docs/cluster/bootstrapping.html">Bootstrapping</a>
+    </li>
+
       Upgrading