Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS Guide #2923

Merged
merged 13 commits into from
Aug 12, 2017
Merged

TLS Guide #2923

merged 13 commits into from
Aug 12, 2017

Conversation

schmichael
Copy link
Member

@schmichael schmichael commented Jul 27, 2017

Expand the small example tls setup example into a full guide.

I originally tried to do all of the certificate generation with OpenSSL because it's clearly the most common tool for certificate management. I switched to cfssl after fighting with OpenSSL over SANs for far too long.

OpenSSL would have been viable if I would have linked to a CA conf file for people to download, but I really wanted to keep the guide as encapsulated as possible. Also I think keeping copy-and-paste-able commands up to date will be easier than maintaining an OpenSSL CA conf.

* `client.pem` - Nomad client node public certificate for the `global` region.
* `server.csr` - Nomad server node certificate signing request for the `global` region.
* `server-key.pem` - Nomad server node private key for the `global` region.
* `server.pem` - Nomad server node public certificate for the `global` region.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These actually render with anchor links which isn't what I intended, but seems ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of middleman-hashicorp. It'll do that

It didn't work in cfssl 1.2 anyway (required building from cfssl
master).

Securing Nomad's cluster communication is not only important for security but
can even ease operations by preventing mistakes and misconfigurations. Nomad
optionally uses mutual TLS (mTLS) for all HTTP and RPC communication. Nomad's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be beneficial to link to wikipedia or similar here for TLS? That way we don't have to explain it, but we can give folks who are unfamiliar a link to the more detailed spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the TLS article. I considered linking to the mutual TLS article but it's not good: https://en.wikipedia.org/wiki/Mutual_authentication

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutual TLS section is pretty small, doesn't use quite the same nomenclature as us, and says whether or not it's mutual is governed by the cipher suite which isn't technically correct. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend against linking to the site of any individual public certificate authority in this doc.

* Prevent observing or tampering with Nomad communication
* Prevent client/server role or region misconfigurations

The 3rd property is fairly unique to Nomad's use of TLS. While most uses of TLS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find "third", "last", etc to easily become stale when it comes to documentation. I would recommend saying "Preventing region misconfigurations is relatively unique to Nomad's use of TLS..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call and I forgot @chelseakomlo pointed out it's not exactly unique to Nomad.

page_title: "Securing Nomad with TLS"
sidebar_current: "guides-tls"
description: |-
Securing Nomad's cluster communication is not only important for security but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Securing Nomad's cluster communication with TLS is important for both security and easing operations. Nomad can also use mutual TLS (mTLS) for authenticating for all HTTP and RPC communication.

* Prevent client/server role or region misconfigurations

The 3rd property is fairly unique to Nomad's use of TLS. While most uses of TLS
verify the identity of the server you're connecting to based on a domain name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid contractions (you're in this case)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Fixed them all (I think)


The 3rd property is fairly unique to Nomad's use of TLS. While most uses of TLS
verify the identity of the server you're connecting to based on a domain name
such as `nomadproject.io`, Nomad verifies the node you're connecting to is in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be better to use mycompany.org or example.com here instead? I can see a user getting confused because that is this domain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example.com seems canonical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example.com is reserved by IANA for this purpose.

cluster to use TLS everywhere is similar to upgrading between versions of
Nomad.

First make sure all of your nodes are ready to be switched:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaking felt weird to me. I think we should drop this sentence and just end the above paragraph with :.


First make sure all of your nodes are ready to be switched:

* Add the appropriate key and certificates to all nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are ordered, we should use 1. instead of bullets. You don't have to self-number. Use 1. everywhere and markdown will do the right thing ™️

1. Restart servers, one at a time
2. Restart clients, one or more at a time

~> As soon as a quorum of servers are TLS-enabled, clients will not be able to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Them" is ambiguous here. Instead consider:

Once a quorum of servers are TLS-enabled, clients will no longer be able to communicate with the servers until their client configuration is updated and reloaded.

~> As soon as a quorum of servers are TLS-enabled, clients will not be able to
communicate with them until they are restarted.

Jobs running in the cluster will *not* be affected and will continue running
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_

@@ -50,6 +50,10 @@
</ul>
</li>

<li<%= sidebar_current("guides-tls") %>>
<a href="/guides/tls.html">Configuring TLS</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you thoughts on "Securing Nomad" instead of "Configuring TLS"? The guide covers more than TLS, since it includes gossip, and I think we should optimize for people's googling.

If you agree, I'd also recommend renaming the file from /guides/tls.html.md to /guides/securing-nomad.html.md or /guides/securing-nomad-tls-encryption.html.md.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I went back and forth a lot on the title and headings that refer to TLS.

Reasons for keeping "tls":

  • serf is like 5% of this guide and no one is even thinking about it unless they're already configuring TLS.
  • "Securing Nomad" is a much broader topic than this guide covers. There are all kinds of system hardening, config options, etc that go into making Nomad "secure" in a generic sense. I would hate to imply if you follow this guide you've secured Nomad.

Reasons for "securing":

  • serf isn't tls
  • better seo?
  • "Securing Nomad" sounds more appealing

🤷‍♀️ Anybody have a strong opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better SEO is my optimization. In the future, it would be very easy to move securing-nomad.html to securing-nomad/index.html and then have securing-nomad/traffic.html and securing-nomad/production-hardening.html, etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a strong opinion, just a mild one: 😄
anything would be fine as long as the keywords on a web search will give this page to a newbie searching for configuration options.
keywords: TLS, secure, security guide, SSL, certificate, etc. etc.

Keep up the great work! 👍 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shantanugadgil Thanks!

@sethvargo Renamed and merging! If you get a chance to peek at it again let me know if my changes look good. Thanks!

* hclfmt the world
  * 2 space indent
  * make every example well formed with stanzas and comments
* jsonfmt too
* mdfmt manually
  * _ instead of *
  * no [links][], only [links][links]
* ordered lists instead of bullets when possible
* lots of wording fixes
* de-contractionization
* add 127.0.0.1 to SANs
* -1 on intentional errors
* -1 on first person
jvoorhis
jvoorhis previously approved these changes Aug 8, 2017
Copy link
Contributor

@jvoorhis jvoorhis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael This looks good! I left a couple of non-blocking suggestions, and sanity checked the output of cfssl, which seems up to the task.


Alternatively, you can use any method that base64 encodes 16 random bytes:

```text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

$ openssl rand -base64 16
raZjciP8vikXng2S5X0m9w==

```text
$ dd if=/dev/urandom bs=16 count=1 status=none | base64
LsuYyj93KVfT3pAJPMMCgA==
$ python -c 'import base64; print base64.b64encode(open("/dev/urandom").read(16))'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest dropping the python2-specific example. It's no longer guaranteed to run everywhere, and is more complex than the openssl method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more examples the better, I think!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @jvoorhis. I added the openssl example. I think people will get the point that base64'ing 16 bytes of entropy can be accomplished using just about anything.

Copy link
Contributor

@jvoorhis jvoorhis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing cfssl defaults...

@jvoorhis jvoorhis dismissed their stale review August 8, 2017 05:58

Reviewing cfssl defaults...

and not a public one like [Let's Encrypt][letsencrypt] as any certificate
signed by this CA will be allowed to communicate with the cluster.

~> Nomad certificates may be signed by different intermediate CAs as long as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems confused. Rather than loading the intermediate certs along the chain into the ca_file of all hosts, the intermediate certs should be appended to the cert_file, as noted in https://golang.org/pkg/crypto/tls/#LoadX509KeyPair.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intermediates always trip me up. Changed to this but please let me know if you have better wording:

~> Nomad certificates may be signed by intermediate CAs as long as the root CA
   is the same. Append all intermediate CAs to the `cert_file`.

respond as expected:

```text
$ nomad node-status
Copy link

@pznamensky pznamensky Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found that at least nomad alloc-status $alloc-id doesn't work as expected (for me).

~ $ nomad alloc-status e90647d0
ID = e90647d0
Eval ID = f693f405
....

Couldn't retrieve stats (HINT: ensure Client.Advertise.HTTP is set): Get https://$alloc-ip:4646/v1/client/allocation/e90647d0-6dda-545b-4c48-69094ec65b06/stats: x509: certificate is valid for 127.0.0.1, not $alloc-ip

Task "redis" is "running"
Task Resources
CPU Memory Disk IOPS Addresses
500 MHz 256 MiB 300 MiB 0 db: $alloc-ip:27590

Task Events:
...

This doesn't work in case of using cli or client certs.
Is this expected behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that's probably the same bug as #2490. I filed #3013 and will try to get it fixed for 0.6.1.

Thanks for testing it out!

@schmichael schmichael merged commit be36bfb into master Aug 12, 2017
@schmichael schmichael deleted the d-tls-guide branch August 12, 2017 04:03
schmichael added a commit that referenced this pull request Aug 17, 2017
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants