---
layout: "guides"
page_title: "Automatically Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-automatic"
description: |-
  Learn how to automatically bootstrap a Nomad cluster using Consul. By having
  a Consul agent installed on each host, Nomad can automatically discover other
  clients and servers to bootstrap the cluster without operator involvement.
---

# Automatic Bootstrapping

To automatically bootstrap a Nomad cluster, we must leverage another HashiCorp
open source tool, [Consul](https://www.consul.io/). Bootstrapping Nomad is
easiest against an existing Consul cluster. Once a Consul agent is installed
and configured on each host, the Nomad servers and clients automatically
discover one another. As an added benefit, integrating Consul with Nomad
provides service and health check registration for applications which later
run under Nomad.

Consul models infrastructure as datacenters, and multiple Consul datacenters
can be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guides on
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single
datacenter and [connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).

If a Consul agent is installed on the host prior to Nomad starting, the Nomad
agent will register with Consul and discover other nodes.
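
For example, a Consul agent might be started on each host before Nomad. This is
an illustrative sketch only; the data directory path and the join address are
placeholders, not values from this guide:

```shell
# Start a Consul agent before the Nomad agent on each host.
# The path and address below are placeholders for your environment.
$ consul agent -data-dir=/opt/consul -retry-join=<consul-server-address>
```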

For servers, we must inform the cluster how many servers we expect to have.
This is required to form the initial quorum, since Nomad is otherwise unaware
of how many peers to expect. For example, to form a region with three Nomad
servers, you would use the following Nomad configuration file:

```hcl
# /etc/nomad.d/server.hcl
server {
  enabled          = true
  bootstrap_expect = 3
}
```

After saving this configuration to disk, start the Nomad agent against it:

```shell
$ nomad agent -config=/etc/nomad.d/server.hcl
```

A similar configuration is available for Nomad clients:

```hcl
# /etc/nomad.d/client.hcl
datacenter = "dc1"

client {
  enabled = true
}
```

The agent is started in a similar manner:

```shell
$ nomad agent -config=/etc/nomad.d/client.hcl
```

Note that the above configurations include no IP or DNS addresses for the
clients and servers to reach each other. This is because Nomad detected the
existence of Consul and used its service discovery to form the cluster.
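
To confirm the cluster formed, you can list the servers known to the agent.
This is a suggested verification step, not part of the original walkthrough; it
assumes the agents above are running:

```shell
# List the Nomad servers that joined via Consul discovery.
$ nomad server-members
```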

## Internals

~> This section discusses the internals of the Consul and Nomad integration at
a very high level. Reading it is only recommended for those curious about the
implementation.

As discussed in the previous section, Nomad merges multiple configuration files
together, so the `-config` flag may be specified more than once:

```shell
$ nomad agent -config=base.hcl -config=server.hcl
```

In addition to merging configuration given on the command line, Nomad also
maintains its own internal configurations (called "default configs") which
include sane base defaults. One of those default configurations includes a
"consul" block, which specifies sane defaults for connecting to and integrating
with Consul. In essence, this configuration resembles the following:

```hcl
# You do not need to add this to your configuration file. This is an example
# that is part of Nomad's internal default configuration for Consul integration.
consul {
  # The address of the Consul agent.
  address = "127.0.0.1:8500"

  # The service names to register the server and client with Consul.
  server_service_name = "nomad"
  client_service_name = "nomad-client"

  # Enables automatically registering the services.
  auto_advertise = true

  # Enables the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
```

Please refer to the [Consul
documentation](/docs/agent/configuration/consul.html) for the complete set of
configuration options.
---
layout: "guides"
page_title: "Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Bootstrapping a Nomad Cluster

Nomad models infrastructure into regions and datacenters. Servers reside at the
regional layer and manage all state and scheduling decisions for that region.
Regions contain multiple datacenters, and clients are registered to a single
datacenter (and thus a region that contains that datacenter). For more details
on the architecture of Nomad and how it models infrastructure, see the
[architecture page](/docs/internals/architecture.html).

There are two strategies for bootstrapping a Nomad cluster:

1. <a href="/guides/cluster/automatic.html">Automatic bootstrapping</a>
1. <a href="/guides/cluster/manual.html">Manual bootstrapping</a>

Please refer to the specific documentation links above or in the sidebar for
more detailed information about each strategy.
---
layout: "guides"
page_title: "Federating a Nomad Cluster"
sidebar_current: "guides-cluster-federation"
description: |-
  Learn how to join Nomad servers across multiple regions so users can submit
  jobs to any server in any region using global federation.
---

# Federating a Cluster

Because Nomad operates at a regional level, federation is part of Nomad core.
Federation enables users to submit jobs or interact with the HTTP API targeting
any region from any server, even if that server resides in a different region.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, issue a join command to a server in a remote region:

```shell
$ nomad server-join 1.2.3.4:4648
```

Note that only one join command is required per region. Servers across regions
discover other servers in the cluster via the gossip protocol, so it is enough
to join just one known server.

If the cluster was bootstrapped via Consul and the Consul clusters in the Nomad
regions are federated, then federation occurs automatically.
---
layout: "guides"
page_title: "Manually Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-manual"
description: |-
  Learn how to manually bootstrap a Nomad cluster using the server-join
  command. This section also discusses Nomad federation across multiple
  datacenters and regions.
---

# Manual Bootstrapping

Manually bootstrapping a Nomad cluster does not rely on additional tooling, but
it does require operator participation in the cluster formation process. When
bootstrapping, Nomad servers and clients must be started and informed of the
address of at least one Nomad server.

This creates a chicken-and-egg problem: one server must first be fully
bootstrapped and configured before the remaining servers and clients can join
the cluster. This requirement can add provisioning time as well as ordered
dependencies during provisioning.

First, we bootstrap a single Nomad server and capture its IP address. After we
have that node's IP address, we place the address in the configuration of the
remaining nodes.

For Nomad servers, this configuration may look something like this:

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # This is the IP address of the first server we provisioned.
  retry_join = ["<known-address>:4648"]
}
```

Alternatively, the address can be supplied after the servers have all been
started by running the [`server-join` command](/docs/commands/server-join.html)
on each server individually. A server only needs to join one other server and
can then rely on the gossip protocol to discover the rest.

```shell
$ nomad server-join <known-address>
```

For Nomad clients, the configuration may look something like:

```hcl
client {
  enabled = true
  servers = ["<known-address>:4647"]
}
```

At this time, there is no equivalent of the `server-join` command for Nomad
clients.

The port in the `servers` list corresponds to the RPC port. If no port is
specified with the IP address, the default RPC port of `4647` is assumed.

As servers are added to or removed from the cluster, this information is pushed
to the client. This means only one server must be specified because, after
initial contact, the full set of servers in the client's region is shared with
the client.
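
Once the clients are configured and started, cluster membership can be checked
from any server. This is a suggested verification step, not part of the
original walkthrough; it assumes the agents above are running:

```shell
# List the servers in the region.
$ nomad server-members

# List the clients registered with the servers.
$ nomad node-status
```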
---
layout: "guides"
page_title: "Nomad Client and Server Requirements"
sidebar_current: "guides-cluster-requirements"
description: |-
  Learn about Nomad client and server requirements such as memory and CPU
  recommendations, network topologies, and more.
---

# Cluster Requirements

## Resources (RAM, CPU, etc.)

**Nomad servers** may need to be run on large machine instances. We suggest
having 8+ cores, 32 GB+ of memory, 80 GB+ of disk, and significant network
bandwidth. The core count and network recommendations ensure high throughput,
as Nomad relies heavily on network communication and the servers manage all the
nodes in the region while performing scheduling. The memory and disk
requirements stem from the fact that Nomad stores all state in memory and
persists two snapshots of this data to disk. Thus, disk should be at least two
times the memory available to the server when deploying a high-load cluster.

**Nomad clients** support reserving resources on the node that should not be
used by Nomad. This should be used to target a specific resource utilization
per node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.

Please see the [reservation
configuration](/docs/agent/configuration/client.html#reserved) for more detail.
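
For instance, a client might carve out headroom for the operating system and
Consul. This is a sketch only; the values below are illustrative placeholders,
not recommendations from this guide:

```hcl
client {
  enabled = true

  # Resources withheld from Nomad's scheduler on this node
  # (illustrative values, tune for your workloads).
  reserved {
    cpu            = 500  # MHz
    memory         = 512  # MB
    disk           = 1024 # MB
    reserved_ports = "22"
  }
}
```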

## Network Topology

**Nomad servers** are expected to have sub-10-millisecond network latencies
between each other to ensure liveness and high-throughput scheduling. Nomad
servers can be spread across multiple datacenters to achieve high availability
if they have low-latency connections between them.

For example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter,
and every zone can have a single Nomad server which could be connected to form
a quorum and a region.

Nomad servers use Raft for state replication, and Raft, being highly
consistent, needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually 3-5. The cluster
can withstand the failure of one server in a cluster of three servers and two
failures in a cluster of five servers. Adding more servers to the quorum adds
more time to replicate state and hence decreases throughput, so we don't
recommend having more than seven servers in a region.

**Nomad clients** do not have the same latency requirements as servers since
they are not participating in Raft. Thus, clients can have 100+ millisecond
latency to their servers. This allows a set of Nomad servers to service clients
spread geographically over a continent, or even the world in the case of a
single "global" region with many datacenters.

## Ports Used

Nomad requires 3 different ports to work properly on servers and 2 on clients;
some use TCP, UDP, or both protocols. Below we document the requirements for
each port.

* HTTP API (default 4646). This is used by clients and servers to serve the
  HTTP API. TCP only.

* RPC (default 4647). This is used by servers and clients to communicate
  amongst each other. TCP only.

* Serf WAN (default 4648). This is used by servers to gossip over the WAN to
  other servers. TCP and UDP.
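
If the defaults conflict with other services on a host, the ports can be
changed in the agent configuration. The block below simply restates the default
values as a sketch:

```hcl
# Agent-level port configuration (these are the defaults).
ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}
```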