TLS Guide #2923

Merged
merged 13 commits into from
Aug 12, 2017
7 changes: 7 additions & 0 deletions demo/vagrant/Vagrantfile
@@ -67,6 +67,13 @@ EOF
sudo systemctl enable consul.service
sudo systemctl start consul

for bin in cfssl cfssl-certinfo cfssljson
do
  echo "Installing $bin..."
  curl -sSL https://pkg.cfssl.org/R1.2/${bin}_linux-amd64 > /tmp/${bin}
  sudo install /tmp/${bin} /usr/local/bin/${bin}
done

SCRIPT

Vagrant.configure(2) do |config|
375 changes: 375 additions & 0 deletions website/source/guides/tls.html.md
@@ -0,0 +1,375 @@
---
layout: "guides"
page_title: "Securing Nomad with TLS"
sidebar_current: "guides-tls"
description: |-
Securing Nomad's cluster communication is not only important for security but
Contributor

Securing Nomad's cluster communication with TLS is important for both security and easing operations. Nomad can also use mutual TLS (mTLS) to authenticate all HTTP and RPC communication.

can even ease operations by preventing mistakes and misconfigurations. Nomad
optionally uses mutual TLS (mTLS) for all HTTP and RPC communication.
---

# Securing Nomad with TLS

Securing Nomad's cluster communication is not only important for security but
can even ease operations by preventing mistakes and misconfigurations. Nomad
optionally uses mutual TLS (mTLS) for all HTTP and RPC communication. Nomad's
Contributor

Do you think it would be beneficial to link to wikipedia or similar here for TLS? That way we don't have to explain it, but we can give folks who are unfamiliar a link to the more detailed spec.

Member Author

Good call on the TLS article. I considered linking to the mutual TLS article but it's not good: https://en.wikipedia.org/wiki/Mutual_authentication

Member Author

The mutual TLS section is pretty small, doesn't use quite the same nomenclature as us, and says whether or not it's mutual is governed by the cipher suite which isn't technically correct. :(

Contributor

I recommend against linking to the site of any individual public certificate authority in this doc.

use of mTLS provides the following properties:

* Prevent unauthorized Nomad access
* Prevent observing or tampering with Nomad communication
* Prevent client/server role or region misconfigurations

The 3rd property is fairly unique to Nomad's use of TLS. While most uses of TLS
Contributor

I find "third", "last", etc to easily become stale when it comes to documentation. I would recommend saying "Preventing region misconfigurations is relatively unique to Nomad's use of TLS..."

Member Author

Good call and I forgot @chelseakomlo pointed out it's not exactly unique to Nomad.

verify the identity of the server you're connecting to based on a domain name
Contributor

Try to avoid contractions (you're in this case)

Member Author

👍 Fixed them all (I think)

such as `nomadproject.io`, Nomad verifies the node you're connecting to is in
Contributor

Do you think it'd be better to use mycompany.org or example.com here instead? I can see a user getting confused because that is this domain.

Member Author

example.com seems canonical

Contributor

example.com is reserved by IANA for this purpose.

the expected region and configured for the expected role (e.g.
`client.us-west.nomad`).
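
The role-and-region naming can be illustrated with a small shell sketch. The function names are hypothetical, and the comparison only mimics the idea described above; it is not Nomad's actual verification code.

```shell
# Build the name a node certificate is expected to carry: <role>.<region>.nomad
nomad_cert_name() {
  printf '%s.%s.nomad\n' "$1" "$2"   # $1 = role (client|server), $2 = region
}

# Succeed only when the presented name matches the expected role and region.
# Illustration only: hypothetical helper, not Nomad's implementation.
name_matches() {
  [ "$1" = "$(nomad_cert_name "$2" "$3")" ]
}

nomad_cert_name client us-west

if name_matches "client.us-west.nomad" server us-west; then
  echo "accepted"
else
  echo "rejected: wrong role or region"
fi
```

A certificate issued for a client in `us-west` is rejected here when a server is expected, which is the misconfiguration the naming scheme is meant to catch.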

Configuring TLS can be an unfortunately complex process, but if you used the
Contributor

I think this is a bit harsh. Consider:

Correctly configuring TLS can be a complex process, especially given the wide range of deployment methodologies. The sample Vagrantfile (link) ...

[Getting Started guide's Vagrantfile][Vagrantfile] or have [cfssl][] and Nomad
installed, this guide will provide you with a production-ready TLS
configuration.

~> Note that while Nomad's TLS configuration will be production ready, key
Contributor

s/complex/(nothing)

management and rotation is a complex subject not covered by this guide.
[Vault][] is the suggested solution for key generation and management.
Contributor

You'll probably still want the link here [Vault][vault]. If someone changes that in the future, it'll break (and it's annoying and hard to catch).

Member Author

Sounds good. Making all links explicit.


## Creating Certificates
Contributor

I think it would be helpful to have a really high-level list of steps before we dive in like:

  1. Determine if using a CA or self-signed
  2. Gen certs
  3. Gen node certs
  4. Configure nomad

The content in these sections is really good, but it's hard to "zoom out" and understand the high-level steps. I usually find documentation easier to follow when I have a little ToC or outline in the beginning.

Member Author

Is there any special magic I can do to make a ToC that links to headers?

Contributor

Sadly no 😦 , it has to be manual. However, the header links are predictable


The first step to configuring TLS for Nomad is generating certificates. In
Contributor

generating TLS certificates

order to prevent unauthorized cluster access, Nomad requires all certificates
be signed by the same Certificate Authority (CA). This should be a *private* CA
Contributor

Please use _ (underscore) instead of * for emphasis

Member Author

Do we have a docs guide with this stuff somewhere?

Contributor

We have this: https://github.com/hashicorp/middleman-hashicorp. Markdown allows for both, but we generally use ** for bold and _ for italics. Idk why, that's just kinda "what we do"

and not a public one like [Let's Encrypt][letsencrypt] as any certificate
signed by this CA will be allowed to communicate with the cluster.
Contributor

Actual Q (and not just a devil's advocate question):

How will Nomad behave if some certs are signed by a CA while others are signed by an intermediate of the same CA? Will that chain still trust/validate or does it have to be the exact same CA? If so, we should mention that as one of those ~> callouts.

Member Author

Great question! Intermediate CAs may be used as long as every node's ca_file points to a bundle of all CA certs (root+intermediates). I'll add a ~>


### Certificate Authority

There are a variety of tools for managing your own CA, [like the PKI secret
backend in Vault][vault-pki], but for the sake of simplicity in this guide
we'll use [cfssl][]. You can generate a private CA certificate and key with
Contributor

I'm still confused about the decision to use cfssl vs openssl. It adds an additional layer that some readers may be unfamiliar with. The majority of the TLS guides out there use openssl to generate certs. Albeit a long series of completely un-rememberable flags, it's kinda "the standard". I haven't seen much usage of cfssl in the wild.

Member Author

I covered this above and am open to changing but I should probably clarify: I am not as confident in my ability to deliver correct instructions with OpenSSL as with cfssl. Definitely not as concise.

As to "in the wild" examples of cfssl, while OpenSSL clearly dominates, cfssl is quite popular in related products' documentation:

https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/02-certificate-authority.md
https://coreos.com/os/docs/latest/generate-self-signed-certificates.html
https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/
https://github.com/kelseyhightower/docker-kubernetes-tls-guide
https://www.digitalocean.com/community/tutorials/how-to-secure-your-coreos-cluster-with-tls-ssl-and-firewall-rules

coreos & k8s may have done it because cfssl is new and shiny, but I suspect they also need a lot of the features (SANs) we do that OpenSSL makes difficult.

Contributor

Hmm okay. Let's keep cfssl for now. I'm wondering if it's worth calling out that openssl is possible, but not covered, in the page somewhere

[cfssl][]:

```shell
# Generate the CA's private key and certificate
cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca
Contributor

Please prefix all commands with a $. This helps distinguish the command to run from the output.

Member Author

I tried to follow the following conventions:

  • For commands that should be copied and pasted use shell highlighting and no $
  • For commands with output use text highlighting and $

Contributor

Hmm okay. All commands should include the $ symbol (that's our convention). If it includes output, it should use text highlighting.

```

The CA key (`nomad-ca-key.pem`) will be used to sign certificates for Nomad
Contributor

I missed that these files would be created on the filesystem, not output to the terminal.

Member Author

Ah, yeah. That's part of the reason why I create the file listing below. Cert generation is so complicated I wanted to make a nice checkpoint for users to verify they've gotten that far correctly.

nodes and must be kept private. The CA certificate (`nomad-ca.pem`) contains
the public key necessary to validate Nomad certificates and therefore must be
distributed to every node that requires access.

### Node Certificates

Once you have a CA certificate and key you can generate and sign the
certificates Nomad will use directly. TLS certificates commonly use the
fully-qualified domain name of the system being identified as the certificate's
Common Name (CN). However, hosts (and therefore hostnames and IPs) are often
ephemeral in Nomad clusters. They come and go as clusters are scaled up and
Contributor

The "come and go" sentence is redundant given "ephemeral" in the previous.

Member Author

👍 Brevity is not my strong suit. Removed.

down or outages occur. Not only would signing a new certificate per Nomad node
be difficult, but using a hostname provides no security or functional benefits
to Nomad. To fulfill the desired security properties (above) Nomad certificates
are signed with their region and role such as:

* `client.global.nomad` for a client node in the `global` region
* `server.us-west.nomad` for a server node in the `us-west` region

To create certificates for the client and server in the cluster from the
[Getting Started guide][guide-cluster] with [cfssl][] create ([or
download][cfssl.json]) the following configuration file as `cfssl.json` to
increase the default certificate expiration time:

```json
{
Contributor

Please use 2 spaces for JSON (instead of 4 and/or a tab here - I can't tell)

Member Author

@schmichael schmichael Aug 1, 2017

👍 would love this sort of thing in a style guide

(or in my dreams: mdfmt)

"signing": {
"default": {
"expiry": "87600h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
```

```shell
# Generate a certificate for the Nomad server
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
Contributor

$ to prefix commands (for all these)

-hostname="server.global.nomad,localhost" - | cfssljson -bare server

# Generate a certificate for the Nomad client
echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
-hostname="client.global.nomad,localhost" - | cfssljson -bare client

# Generate a certificate for the CLI
echo '{}' | cfssl gencert -ca nomad-ca.pem -ca-key nomad-ca-key.pem -profile=client \
- | cfssljson -bare cli
```
Contributor

So... I have a set of Terraform configurations that do this whole key-signing, etc. I wonder if we should include those in an examples/ folder for Nomad and also reference that here? /cc @dadgar

Member Author

I'd love for this guide to be a self-contained extension of the Getting Started guide. That being said I'm 👍 to more tf examples in our repo and can link to it from somewhere in this doc.


Using `localhost` as a subject alternative name (SAN) allows tools like `curl` to
Contributor

I've also found adding an IP SAN for 127.0.0.1 is especially useful...

be able to communicate with Nomad's HTTP API when run on the same host. Other
SANs may be added including a DNS resolvable hostname to allow remote HTTP
requests from third party tools.
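
As a sketch of how the SAN list might be assembled per role and region, the helpers below print (but do not run) the signing command used above. The helper names are hypothetical. Adding `127.0.0.1` is an optional extra raised in review rather than something the commands above require, and the sketch assumes cfssl turns IP-looking entries in `-hostname` into IP SANs.

```shell
# Compose the comma-separated SAN list for a node certificate.
# 127.0.0.1 is an optional addition (assumption: cfssl emits it as an IP SAN).
cert_hostnames() {
  printf '%s.%s.nomad,localhost,127.0.0.1\n' "$1" "$2"   # $1 = role, $2 = region
}

# Print the signing command for a role and region without executing it.
print_gencert() {
  printf "echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json -hostname=\"%s\" - | cfssljson -bare %s\n" \
    "$(cert_hostnames "$1" "$2")" "$1"
}

print_gencert server us-west
```

Printing the command first makes it easy to review the SAN list before any certificate is actually signed.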

You should now have the following files:

* `cfssl.json` - cfssl configuration.
* `nomad-ca.csr` - CA signing request.
* `nomad-ca-key.pem` - CA private key. Keep safe!
* `nomad-ca.pem` - CA public certificate.
* `cli.csr` - Nomad CLI certificate signing request.
* `cli.pem` - Nomad CLI certificate.
* `cli-key.pem` - Nomad CLI private key.
* `client.csr` - Nomad client node certificate signing request for the `global` region.
* `client-key.pem` - Nomad client node private key for the `global` region.
* `client.pem` - Nomad client node public certificate for the `global` region.
* `server.csr` - Nomad server node certificate signing request for the `global` region.
* `server-key.pem` - Nomad server node private key for the `global` region.
* `server.pem` - Nomad server node public certificate for the `global` region.
Member Author

These actually render with anchor links which isn't what I intended, but seems ok?

Contributor

That's part of middleman-hashicorp. It'll do that


Each Nomad node should have the appropriate key (`-key.pem`) and certificate
(`.pem`) file for its region and role. In addition each node needs the CA's
public certificate (`nomad-ca.pem`).

## Configuring Nomad

Once you have the appropriate key and certificates installed you're ready to
Contributor

This is very "first person-ey". Instead consider:

Next Nomad must be configured to use the newly-created key and certificates for mutual TLS...

configure Nomad to use them for mTLS. Starting with the [server configuration
from the Getting Started guide][guide-server] add the following TLS
CONFIGUration options:
Contributor

CONFIGU (all caps, seems typo)


```hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/server1"

# Enable the server
server {
enabled = true
Contributor

Please use 2 spaces or just run hclfmt on this whole configuration file.


# Self-elect, should be 3 or 5 for production
bootstrap_expect = 1
}

# Require TLS
tls {
http = true
rpc = true

ca_file = "nomad-ca.pem"
cert_file = "server.pem"
key_file = "server-key.pem"

verify_server_hostname = true
verify_https_client = true
}
```

The new `tls` section is worth breaking down in more detail:

```hcl
http = true
Contributor

Let's include tls for more context here:

tls  {
  http = true
  rpc  = true
  # ...

rpc = true
```

This enables TLS for the HTTP and RPC protocols. Unlike web servers, Nomad
doesn't use separate ports for TLS and non-TLS traffic: your cluster should
either use TLS or not.

```hcl
ca_file = "nomad-ca.pem"
Contributor

Again, add tls context

cert_file = "server.pem"
key_file = "server-key.pem"
```

The file lines should point to wherever you placed the certificate files on
the node. This guide assumes they are in Nomad's current directory.

```hcl
verify_server_hostname = true
Contributor

tls context

verify_https_client = true
```

These two settings are important for ensuring all of Nomad's mTLS security
properties are met. If `verify_server_hostname` is set to `false` the node's
certificate will be checked to ensure it is signed by the same CA, but its role
and region will not be verified. This means any service with a certificate from
the same CA as Nomad can act as a client or server of any region.
Contributor

"with a certificate signed by the same CA as the Nomad servers"? What is "service" in this context? Also, the CA doesn't "give" the certificate, it signs it, right?

Member Author

Changed "from the" to "signed by":

This means any service with a certificate signed by same CA as Nomad can act as a client or server of any region.

Not sure how to clarify "service." I mean literally anything listening on a socket with a certificate signed by the same CA poses a threat if this setting is disabled. "service" seemed like the most generic term for that.


`verify_https_client` requires HTTP API clients to present a certificate signed
by the same CA as Nomad's certificate. It may be disabled to allow HTTP API
clients (e.g. Nomad CLI, Consul, or curl) to communicate with the HTTPS API
without presenting a client-side certificate. If `verify_https_client` is
enabled only HTTP API clients presenting a certificate signed by the same CA as
Nomad's certificate are allowed to access Nomad.

~> Enabling the `verify_https_client` feature effectively protects Nomad from
Contributor

It's a bit odd that we don't support a TLS configuration like this in Consul /cc @slackpad

Member Author

In theory ACLs are a much more powerful way of accomplishing this. Nomad just doesn't have ACLs yet.

In practice though an extra layer of security might be appealing to some users.

unauthorized network access at the cost of breaking compatibility with Consul
HTTPS health checks.
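
For HTTP API clients other than the Nomad CLI, a minimal sketch of presenting a client certificate with `curl` might look like the following. The `nomad_api` wrapper is hypothetical, and reusing the `NOMAD_*` variables as defaults is a convenience of this sketch (curl does not read them itself). The file names are the ones generated earlier in this guide.

```shell
# Query Nomad's HTTPS API while presenting a client certificate.
# Hypothetical wrapper: $1 is an API path such as /v1/nodes.
nomad_api() {
  curl --silent --show-error \
    --cacert "${NOMAD_CACERT:-nomad-ca.pem}" \
    --cert "${NOMAD_CLIENT_CERT:-cli.pem}" \
    --key "${NOMAD_CLIENT_KEY:-cli-key.pem}" \
    "${NOMAD_ADDR:-https://localhost:4646}$1"
}
```

With a TLS-enabled agent running, `nomad_api /v1/nodes` would request the same endpoint the CLI uses for `nomad node-status`. Without `--cert` and `--key` the request is expected to be refused when `verify_https_client` is enabled.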

### Client configuration
Contributor

Capital "C" in configuration like the other sections


The Nomad client configuration is similar with the only difference being the
Contributor

Let's break this into two sentences for clarity.

The Nomad client configuration is similar to the server configuration. The biggest difference is in the certificate and key used for configuration.

certificate and key used:

```hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"

# Enable the client
client {
enabled = true
Contributor

hclfmt


# For demo assume we are talking to server1. For production,
# this should be like "nomad.service.consul:4647" and a system
# like Consul used for service discovery.
servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
http = 5656
}

# Require TLS
tls {
http = true
rpc = true

ca_file = "nomad-ca.pem"
cert_file = "client.pem"
key_file = "client-key.pem"

verify_server_hostname = true
verify_https_client = true
}
```

### Running with TLS

Now that we have certificates generated and configuration for a client and
server we can test our TLS-enabled cluster!

In separate terminals start a server and client agent:

```shell
# In one terminal...
nomad agent -config server1.hcl
Contributor

$ to prefix commands (for all these)


# ...and in another
nomad agent -config client1.hcl
```

Finally in a third terminal test out `nomad node-status`:

```text
vagrant@nomad:~$ nomad node-status
Contributor

Let's remove the vagrant prompt and use $ instead

Error querying node status: Get http://127.0.0.1:4646/v1/nodes: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
```

Oh no! That didn't work!
Contributor

This is very "anti-HashiCorp". We try to avoid "intentional failure" as a learning pattern in documentation. Instead, I would consider writing this as:

If you run nomad node-status now, you'll get an error, like: (error). This is because the Nomad CLI defaults to communicating via HTTP instead of HTTPS. We can configure the local Nomad client to connect using TLS and specify our custom keys and certificates using the command line:

$ nomad node-status -ca-key=... -ca-path=... -addr=... (fill these in)

This process can be cumbersome to type each time, so the Nomad CLI also searches environment variables for default values. The following environment variables ...

$ export NOMAD_ADDR=https://localhost:4646
$ export NOMAD_CACERT=nomad-ca.pem
$ export NOMAD_CLIENT_CERT=client.pem
$ export NOMAD_CLIENT_KEY=client-key.pem


Don't worry, the Nomad CLI just defaults to `http://...` instead of
`https://...`. We can override this with an environment variable:

```shell
export NOMAD_ADDR=https://localhost:4646
export NOMAD_CACERT=nomad-ca.pem
export NOMAD_CLIENT_CERT=client.pem
export NOMAD_CLIENT_KEY=client-key.pem
```

The `NOMAD_CACERT` also needs to be set so the CLI can verify it's talking to
Contributor

I would recommend using a bulleted list here instead of a paragraph:

  • NOMAD_ADDR sets the address to connect to the Nomad cluster. This changes the default value of -addr...
  • NOMAD_CACERT ...
  • ...

an actual Nomad node. Finally, `NOMAD_CLIENT_CERT` and `NOMAD_CLIENT_KEY` need
to be set since we enabled `verify_https_client` above which prevents any
access lacking a client certificate.
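
Because a missing variable produces a fairly opaque error, a small pre-flight check can help. The sketch below only inspects the four variables named above; the function name is hypothetical and it does not contact Nomad.

```shell
# Report which TLS-related CLI variables are unset or empty.
check_nomad_tls_env() {
  missing=""
  for var in NOMAD_ADDR NOMAD_CACERT NOMAD_CLIENT_CERT NOMAD_CLIENT_KEY; do
    eval "value=\${$var:-}"          # indirect lookup, portable to POSIX sh
    [ -n "$value" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

check_nomad_tls_env || echo "export the variables above before running the CLI"
```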

Now the CLI works as expected:
Contributor

After these environment variables are correctly configured, the CLI will respond as expected:


```text
vagrant@nomad:~$ nomad node-status
Contributor

Remove vagrant prompts

ID        DC   Name   Class   Drain  Status
237cd4c5  dc1  nomad  <none>  false  ready

vagrant@nomad:~$ nomad init
Example job file written to example.nomad
vagrant@nomad:~$ nomad run example.nomad
==> Monitoring evaluation "e9970e1d"
    Evaluation triggered by job "example"
    Allocation "a1f6c3e7" created: node "237cd4c5", group "cache"
    Evaluation within deployment: "080460ce"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e9970e1d" finished with status "complete"
```

## Server Gossip

We haven't quite completely secured Nomad's communications: Nomad server's
Contributor

I think we need a bit more background here, perhaps even a diagram, that explains how Nomad servers <-> clients communicate. I know @dadgar had a slide he used for HashiConf last year. I could rework that to match the new branding.

Even if we don't have a diagram, I think there needs to be a few sentences here like "Nomad uses HTTP to communicate between clients and servers, but the servers communicate among themselves using [gossip] (link). Up to this point, we have secured the client <-> server communication. To secure the server <-> server communication, we must configure the [gossip encryption] (link)."

The "we haven't quite completely secured" intro to this paragraph is a bit off-style for our docs.

Member Author

Added more content around the various protocols and can toss in any graphs we dig up.

gossip protocol uses a shared key instead of TLS for encryption. This
encryption key must be added to every server's configuration using the
Contributor

Does the encryption key have to be added to the configuration? This implies that it has to be in the configuration file, but I believe it can come from an envvar or be supplied at boot time too, right? It might be good to rephrase this as "provided at runtime" instead (unless I'm wrong).

[`encrypt`](/docs/agent/configuration/server.html#encrypt) parameter.

As a convenience the Nomad CLI includes a `keygen` command for generating a new
Contributor

s/As a convenience/(nothing, capitalize "T" in the)

secure gossip encryption key:

```text
$ nomad keygen
Contributor

It'd be nice to supply the openssl-equivalent command here. I can see people provisioning using CM tools like Chef, Puppet, etc just wanting to automate that bit out of band, without the nomad cli.

cg8StVXbQJ0gPvMd9o7yrg==
```
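
For provisioning without the Nomad CLI, a rough equivalent can be sketched in plain shell. The sample key above decodes to 16 bytes, so this assumes any 16 random bytes, base64-encoded, are acceptable to `encrypt`; that is an assumption of this sketch, not something confirmed here. The function name is hypothetical.

```shell
# Produce 16 random bytes, base64-encoded (24 characters including padding).
# `openssl rand -base64 16` is an equivalent one-liner where OpenSSL is installed.
gossip_key() {
  head -c 16 /dev/urandom | base64
}

key="$(gossip_key)"
echo "$key"
```

Run this once and give the same value to every server; generating a key per server would leave the servers unable to gossip with each other.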

Put the generated key into each server's configuration file:
Contributor

I think we need to make it a bit clearer that you run this once, and insert the same value. Also, see my note above about putting it in the configuration file.


```hcl
server {
enabled = true
Contributor

hclfmt


# Self-elect, should be 3 or 5 for production
bootstrap_expect = 1

# Encrypt gossip communication
encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
```

## Switching an existing cluster to TLS

Since Nomad does *not* use different ports for TLS and non-TLS communication,
Contributor

_ instead of * for emphasis

the use of TLS should be consistent across the cluster. Switching an existing
cluster to use TLS everywhere is similar to upgrading between versions of
Nomad.

First make sure all of your nodes are ready to be switched:
Contributor

This breaking felt weird to me. I think we should drop this sentence and just end the above paragraph with :.


* Add the appropriate key and certificates to all nodes.
Contributor

Since these are ordered, we should use 1. instead of bullets. You don't have to self-number. Use 1. everywhere and markdown will do the right thing ™️

* Ensure the private key file is only readable by the Nomad user.
* Add the environment variables to all nodes where the CLI is used.
* Add the appropriate `tls` block to the configuration file on all nodes.
* Generate a gossip key and add it to the Nomad server configuration.
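
The key-permission item in the checklist above could be handled with something like the sketch below. It assumes the key files are already owned by the user Nomad runs as and follow the `*-key.pem` naming from this guide. The function name and the scratch directory are only for illustration.

```shell
# Make every *-key.pem in a directory readable and writable by its owner only.
secure_key_files() {
  for key in "$1"/*-key.pem; do
    [ -e "$key" ] || continue   # skip when the glob matched nothing
    chmod 600 "$key"
  done
}

# Demo against a scratch directory created just for this example.
demo_dir="$(mktemp -d)"
touch "$demo_dir/server-key.pem" "$demo_dir/server.pem"
chmod 644 "$demo_dir/server-key.pem" "$demo_dir/server.pem"
secure_key_files "$demo_dir"
ls -l "$demo_dir"
```

Certificates (`.pem` without `-key`) are left untouched because they contain only public material.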

At this point a rolling restart of the cluster will enable TLS everywhere.

1. Restart servers, one at a time
2. Restart clients, one or more at a time

~> As soon as a quorum of servers are TLS-enabled, clients will not be able to
Contributor

"Them" is ambiguous here. Instead consider:

Once a quorum of servers are TLS-enabled, clients will no longer be able to communicate with the servers until their client configuration is updated and reloaded.

communicate with them until they are restarted.
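
The restart ordering can be written down as a dry-run plan before touching any node. This sketch only prints steps; the host names are placeholders and nothing is restarted.

```shell
# Print a restart plan: servers strictly one at a time, then clients.
# Host names are placeholders; nothing is restarted by this sketch.
restart_plan() {
  servers="$1"
  clients="$2"
  step=1
  for host in $servers; do
    echo "$step. restart server $host, then wait for it to rejoin"
    step=$((step + 1))
  done
  echo "$step. restart clients (one or more at a time): $clients"
}

restart_plan "server1 server2 server3" "client1 client2"
```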

Jobs running in the cluster will *not* be affected and will continue running
Contributor

_

throughout the switch.

[guide-server]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/server.hcl
[guide-cluster]: https://www.nomadproject.io/intro/getting-started/cluster.html
[letsencrypt]: https://letsencrypt.org/
[cfssl]: https://cfssl.org/
[cfssl.json]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/cfssl.json
[Vagrantfile]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
[Vault]: https://www.vaultproject.io/
[vault-pki]: https://www.vaultproject.io/docs/secrets/pki/index.html