Add HTTPS/RPC/gossip encryption and setup script #50

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions .dockerignore
@@ -0,0 +1 @@
secrets/*
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
_env*
examples/triton-multi-dc/docker-compose-*.yml
secrets/*
4 changes: 2 additions & 2 deletions Dockerfile
@@ -19,9 +19,9 @@ RUN export CONSUL_CHECKSUM=585782e1fb25a2096e1776e2da206866b1d9e1f10b71317e682e0
&& rm /tmp/${archive}

# Add Containerpilot and set its configuration
ENV CONTAINERPILOT_VER=3.6.0
ENV CONTAINERPILOT_VER=3.6.1
ENV CONTAINERPILOT=/etc/containerpilot.json5
RUN export CONTAINERPILOT_CHECKSUM=1248784ff475e6fda69ebf7a2136adbfb902f74b \
RUN export CONTAINERPILOT_CHECKSUM=57857530356708e9e8672d133b3126511fb785ab \
&& curl -Lso /tmp/containerpilot.tar.gz \
"https://github.com/joyent/containerpilot/releases/download/${CONTAINERPILOT_VER}/containerpilot-${CONTAINERPILOT_VER}.tar.gz" \
&& echo "${CONTAINERPILOT_CHECKSUM} /tmp/containerpilot.tar.gz" | sha1sum -c \
48 changes: 47 additions & 1 deletion README.md
@@ -106,6 +106,10 @@ Note: the `cns.joyent.com` hostnames cannot be resolved from outside the datacen
- `8300`: Server RPC port (TCP)
- `8302`: Serf WAN gossip port (TCP + UDP)

- `CONSUL_TLS_PATH`: Specifies the location of a directory which will contain the TLS key file, certificate, and root certificate. See the section on [securing Consul](#consul-encryption) for more details.

- `CONSUL_ENCRYPT`: Secret key used for encrypting gossip. Consul flag: [`-encrypt`](https://www.consul.io/docs/agent/options.html#_encrypt). See the section on [securing Consul](#consul-encryption) for more details.

## Using this in your own composition

There are two ways to run Consul and both come into play when deploying ContainerPilot, a cluster of Consul servers and individual Consul client agents.
@@ -184,7 +188,49 @@ Some details about how Docker containers work on Triton have specific bearing on

Consul supports TLS encryption for RPC and symmetric pre-shared key encryption for its gossip protocol. Deploying these features requires managing these secrets, and a demonstration of how to do so can be found in the [Vault example](https://github.com/autopilotpattern/vault).

### Testing
### Configuration

The `CONSUL_TLS_PATH` environment variable is checked on startup and indicates that TLS should be configured. If it is defined, the container waits for the directory specified in `CONSUL_TLS_PATH` to be created and expects it to contain a CA certificate (`ca.crt`) along with a datacenter-specific certificate and key (`<datacenter-name>.crt` and `<datacenter-name>.key`). These files are used to configure `ca_file`, `cert_file` and `key_file` respectively in Consul, in addition to enabling both `verify_outgoing` and `verify_incoming`. The secret key used for gossip traffic can be provided directly as the environment variable `CONSUL_ENCRYPT`.
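
As a rough illustration of how those settings end up in the Consul configuration, the sketch below uses the same delete-then-append approach as `bin/consul-manage` in this change. The helper names `set_config_line` and `configure_tls` are hypothetical and exist only for this example, and the sketch assumes a GNU or BusyBox `sed` that supports `-i` without an argument.

```shell
# set_config_line FILE KEY VALUE
# Hypothetical helper: removes any existing top-level "KEY = ..." line from
# FILE, then appends "KEY = VALUE", so repeated runs leave exactly one entry.
set_config_line() {
    local file="$1" key="$2" value="$3"
    sed -i "/^${key} =/d" "$file"
    echo "${key} = ${value}" >> "$file"
}

# configure_tls FILE TLS_DIR DATACENTER
# Writes the TLS-related keys described above, using the file layout
# ca.crt, <datacenter>.crt and <datacenter>.key inside TLS_DIR.
configure_tls() {
    local file="$1" tls_dir="$2" dc="$3"
    set_config_line "$file" ca_file   "\"${tls_dir}/ca.crt\""
    set_config_line "$file" cert_file "\"${tls_dir}/${dc}.crt\""
    set_config_line "$file" key_file  "\"${tls_dir}/${dc}.key\""
    set_config_line "$file" verify_outgoing true
    set_config_line "$file" verify_incoming true
}
```

Calling `configure_tls /etc/consul/consul.hcl /ssl dc1` twice should leave a single `ca_file`, `cert_file` and `key_file` entry rather than duplicates, which is why each key is deleted before it is appended.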

The `./setup.sh` and `./setup-multi-dc.sh` scripts both accept `-t/--tls-path` and `-g/--gossip-path` parameters to set the `CONSUL_TLS_PATH` and `CONSUL_ENCRYPT` environment variables respectively. Note that `--tls-path` only specifies _where_ the key material will be injected. Deployed containers will remain idle until certificates have been installed by `./setup-encryption.sh upload`.

### Generating certificates

To simplify certificate generation, the `ca` directory contains a `Dockerfile` that creates a Certificate Authority on build. Use `./setup-encryption.sh build -i <image-name>` to build the image. The same image name can then be used with `./setup-encryption.sh generate -i <image-name> -d <datacenter-name> -g <gossip-filename>` to generate certificates.

### Installing certificates

When `CONSUL_TLS_PATH` is specified, the `preStart` ContainerPilot job awaits the creation of the relevant certificates and key (i.e. `CONSUL_CACERT`, `CONSUL_CLIENT_CERT`, `CONSUL_CLIENT_KEY`) and uses the [ContainerPilot Control plane](https://www.joyent.com/containerpilot/docs/configuration/control-plane) from within the job to `-putenv` and `-reload` ContainerPilot itself. Without the `-putenv` calls to set the certificates and key, the `preStart` job would see `CONSUL_TLS_PATH` and attempt to restart ContainerPilot indefinitely.
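
The waiting step can be pictured roughly as follows. This is a minimal sketch, not the job's actual code: the function name `wait_for_tls_files` is hypothetical, and the timeout is an addition made for the example (the loop in `bin/consul-manage` polls every 5 seconds with no upper bound).

```shell
# wait_for_tls_files TLS_DIR DATACENTER [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: polls until ca.crt, <datacenter>.crt and
# <datacenter>.key all exist in TLS_DIR. Returns 0 once they are present,
# or 1 if TIMEOUT_SECONDS elapses first.
wait_for_tls_files() {
    local tls_dir="$1" dc="$2" timeout="${3:-60}" interval="${4:-5}"
    local waited=0

    until [ -f "$tls_dir/ca.crt" ] \
        && [ -f "$tls_dir/$dc.crt" ] \
        && [ -f "$tls_dir/$dc.key" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for TLS key material in $tls_dir" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, `wait_for_tls_files "$CONSUL_TLS_PATH" "$CONSUL_DATACENTER_NAME" 300` would give up after roughly five minutes instead of leaving the container idle indefinitely.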

Certificates and the private key can be installed in running containers with `./setup-encryption.sh upload -d <datacenter-name> -t <CONSUL_TLS_PATH>`. Note that `-d` is only used to select the directory containing the key material; you must still run `eval "$(triton env -d)"` with the relevant datacenter's profile in order to target the correct Docker endpoint.

By default, the `upload` command assumes the default `docker-compose` file (`./docker-compose.yml`) and project name (the current working directory), reading `COMPOSE_FILE` and `COMPOSE_PROJECT_NAME` if they are defined. These defaults can be overridden with `-f` and `-p` in the same way as `docker-compose` itself.
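
The precedence described above (flag, then environment variable, then default) might be resolved along these lines. This is only a sketch: `resolve_compose_settings` is a hypothetical function, not part of `setup-encryption.sh`, and it uses the working directory's base name unmodified, whereas `docker-compose` may normalize the project name.

```shell
# resolve_compose_settings [-f FILE] [-p PROJECT]
# Hypothetical helper: prints the compose file on the first line and the
# project name on the second. Flags win over COMPOSE_FILE and
# COMPOSE_PROJECT_NAME, which in turn win over the built-in defaults.
resolve_compose_settings() {
    local compose_file="${COMPOSE_FILE:-./docker-compose.yml}"
    local project_name="${COMPOSE_PROJECT_NAME:-$(basename "$PWD")}"
    local opt OPTARG OPTIND=1

    while getopts "f:p:" opt; do
        case "$opt" in
            f) compose_file="$OPTARG" ;;
            p) project_name="$OPTARG" ;;
            *) return 1 ;;
        esac
    done

    printf '%s\n%s\n' "$compose_file" "$project_name"
}
```

With `COMPOSE_FILE=a.yml` exported, `resolve_compose_settings -p other` would report `a.yml` as the file and `other` as the project name.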

### Encrypting gossip

The `CONSUL_ENCRYPT` parameter can be passed to encrypt gossip traffic. Use `./setup-encryption.sh generate -g <filename>` to generate a file in the `secrets` directory with the provided name. For local deployments, simply copy the contents of the generated file as an environment variable in `examples/compose/docker-compose.yml`. For Triton deployments, the setup scripts accept a `-g` parameter to specify a relative path to a gossip file (e.g. `examples/triton/setup.sh -g ../../secrets/gossip`) and will inject the contents of the gossip key file as `CONSUL_ENCRYPT` in the relevant `_env` file.
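
To make the injection step concrete, here is a minimal sketch of writing a gossip key file's contents into an `_env` file as `CONSUL_ENCRYPT`. The function name `inject_gossip_key` is hypothetical, and the setup scripts may implement this differently.

```shell
# inject_gossip_key KEY_FILE ENV_FILE
# Hypothetical helper: reads the gossip key from KEY_FILE and stores it in
# ENV_FILE as CONSUL_ENCRYPT=<key>, replacing any earlier CONSUL_ENCRYPT line.
inject_gossip_key() {
    local key_file="$1" env_file="$2" key

    if [ ! -r "$key_file" ]; then
        echo "Cannot read gossip key file: $key_file" >&2
        return 1
    fi

    # Strip the trailing newline and any stray whitespace from the key.
    key=$(tr -d '[:space:]' < "$key_file")
    if [ -z "$key" ]; then
        echo "Gossip key file is empty: $key_file" >&2
        return 1
    fi

    touch "$env_file"
    # grep -v exits non-zero when nothing is left to print, which is fine here.
    grep -v '^CONSUL_ENCRYPT=' "$env_file" > "$env_file.tmp" || true
    mv "$env_file.tmp" "$env_file"
    echo "CONSUL_ENCRYPT=$key" >> "$env_file"
}
```

For instance, `inject_gossip_key ../../secrets/gossip _env` would leave other lines in `_env` untouched and ensure there is exactly one `CONSUL_ENCRYPT` entry.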

### How do I know if it's working?

Assuming you've spun up `examples/compose/docker-compose.yml` after generating a certificate for the "dc1" datacenter (which implies the `secrets/dc1` directory was generated), you'll notice that commands fail with odd HTTP or TLS errors unless the CA certificate, client certificate, client key and an `https` address are all supplied:

```
$ docker-compose exec consul consul info -client-cert=/ssl/dc1.crt -client-key=/ssl/dc1.key -ca-file=/ssl/ca.crt
Error querying agent: Get http://127.0.0.1:8500/v1/agent/self: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

$ docker-compose exec consul consul info -client-cert=/ssl/dc1.crt -client-key=/ssl/dc1.key -http-addr=https://consul:8500
Error querying agent: Get https://consul:8500/v1/agent/self: x509: certificate signed by unknown authority

$ docker-compose exec consul consul info -client-cert=/ssl/dc1.crt -ca-file=/ssl/ca.crt -http-addr=https://consul:8500
Error querying agent: Get https://consul:8500/v1/agent/self: remote error: tls: bad certificate

# with everything in place
$ docker-compose exec consul consul members -client-cert=/ssl/dc1.crt -client-key=/ssl/dc1.key -ca-file=/ssl/ca.crt -http-addr=https://consul:8500
Node Address Status Type Build Protocol DC Segment
01e297f34346 172.23.0.2:8301 alive server 1.0.0 2 dc1 <all>
```
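
Since every working invocation needs the same four flags, it can help to assemble them in one place. The sketch below builds the flag string used in the final command above; `consul_tls_args` is a hypothetical helper, and the default address `https://consul:8500` is simply the one from the example and may differ in your deployment.

```shell
# consul_tls_args TLS_DIR DATACENTER [HTTP_ADDR]
# Hypothetical helper: prints the consul CLI flags needed to talk to a
# TLS-enabled agent, given the directory holding ca.crt, <datacenter>.crt
# and <datacenter>.key. Paths are assumed not to contain spaces.
consul_tls_args() {
    local tls_dir="$1" dc="$2" http_addr="${3:-https://consul:8500}"
    printf '%s %s %s %s\n' \
        "-ca-file=${tls_dir}/ca.crt" \
        "-client-cert=${tls_dir}/${dc}.crt" \
        "-client-key=${tls_dir}/${dc}.key" \
        "-http-addr=${http_addr}"
}
```

It could then be used as `docker-compose exec consul consul members $(consul_tls_args /ssl dc1)`, leaving the command substitution unquoted so that each flag is passed as a separate argument.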

## Testing

The `tests/` directory includes integration tests for both the Triton and Compose example stacks described above. Build the test runner by making sure you've pulled down the submodule with `git submodule update --init` and then `make build/tester`.

159 changes: 156 additions & 3 deletions bin/consul-manage
@@ -12,10 +12,14 @@ preStart() {
sed -i "s/CONSUL_DATACENTER_NAME/${CONSUL_DATACENTER_NAME}/" /etc/consul/consul.hcl
elif [ -f "/native/usr/sbin/mdata-get" ]; then
DETECTED_DATACENTER_NAME=$(/native/usr/sbin/mdata-get sdc:datacenter_name)
# re-export so it can be used later in this script
export CONSUL_DATACENTER_NAME=$DETECTED_DATACENTER_NAME
_log "Updating consul datacenter name (detected: '${DETECTED_DATACENTER_NAME}')"
sed -i "s/CONSUL_DATACENTER_NAME/${DETECTED_DATACENTER_NAME}/" /etc/consul/consul.hcl
else
_log "Updating consul datacenter name (default: 'dc1')"
# re-export so it can be used later in this script
export CONSUL_DATACENTER_NAME=dc1
sed -i "s/CONSUL_DATACENTER_NAME/dc1/" /etc/consul/consul.hcl
fi

@@ -58,6 +62,124 @@ preStart() {

# advertise_addr_wan tells nodes their public address for WAN communication
updateConfigFromEnvOrDefault 'advertise_addr_wan' 'CONSUL_ADVERTISE_ADDR_WAN' "$IP_ADDRESS"

# there's no consul env for this
if [ -n "$CONSUL_ENCRYPT" ]; then
sed -i '/^encrypt =/d' /etc/consul/consul.hcl
_log "Updating consul encrypt field"
echo "encrypt = \"$CONSUL_ENCRYPT\"" >> /etc/consul/consul.hcl
else
_log "Skipping gossip encryption configuration"
fi

# this block looks for the files used for enabling TLS support in CONSUL_TLS_PATH
# and populates the configs in Consul, but not containerpilot
if [ -n "$CONSUL_TLS_PATH" ] && [ -z "$CONSUL_CACERT$CONSUL_CLIENT_CERT$CONSUL_CLIENT_KEY" ]; then

# notice we are intentionally not exporting these as envs
# nor are we using containerpilot -putenv
local consul_cacert="$CONSUL_TLS_PATH/ca.crt"
local consul_client_cert="$CONSUL_TLS_PATH/$CONSUL_DATACENTER_NAME.crt"
local consul_client_key="$CONSUL_TLS_PATH/$CONSUL_DATACENTER_NAME.key"

until find "$consul_cacert" "$consul_client_cert" "$consul_client_key" &>/dev/null
do
sleep 5
_log "Still waiting for TLS key material at: CONSUL_CACERT=$consul_cacert CONSUL_CLIENT_CERT=$consul_client_cert CONSUL_CLIENT_KEY=$consul_client_key"
done

if [ -f "$consul_cacert" ] \
&& [ -f "$consul_client_cert" ] \
&& [ -f "$consul_client_key" ]; then

echo "TLS files found. Updating consul configs: ca_file, cert_file, key_file, verify_outgoing, verify_incoming"

# these need to be set in the config since containerpilot looks at the same
# environment variables. containerpilot will crash if it's booting and these are missing

sed -i '/^ca_file/d' /etc/consul/consul.hcl
echo "ca_file = \"$(realpath "$consul_cacert")\"" >> /etc/consul/consul.hcl
# /usr/local/bin/containerpilot -putenv "CONSUL_CACERT=$CONSUL_CACERT"

sed -i '/^cert_file/d' /etc/consul/consul.hcl
echo "cert_file = \"$(realpath "$consul_client_cert")\"" >> /etc/consul/consul.hcl
# /usr/local/bin/containerpilot -putenv "CONSUL_CLIENT_CERT=$CONSUL_CLIENT_CERT"

sed -i '/^key_file/d' /etc/consul/consul.hcl
echo "key_file = \"$(realpath "$consul_client_key")\"" >> /etc/consul/consul.hcl
# /usr/local/bin/containerpilot -putenv "CONSUL_CLIENT_KEY=$CONSUL_CLIENT_KEY"

# /usr/local/bin/containerpilot -putenv "CONSUL_HTTP_SSL=true"

# maybe just listen unencrypted on a private address?

sed -i '/^verify_outgoing =/d' /etc/consul/consul.hcl
echo "verify_outgoing = true" >> /etc/consul/consul.hcl

sed -i '/^verify_incoming =/d' /etc/consul/consul.hcl
echo "verify_incoming = true" >> /etc/consul/consul.hcl

sed -i '/^verify_incoming_rpc =/d' /etc/consul/consul.hcl
echo "verify_incoming_rpc = true" >> /etc/consul/consul.hcl

sed -i '/^verify_incoming_https =/d' /etc/consul/consul.hcl
echo "verify_incoming_https = true" >> /etc/consul/consul.hcl

# docs just before https://www.consul.io/docs/agent/options.html#configuration-key-reference say:

# Consul will not enable TLS for the HTTP API unless the
# https port has been assigned a port number > 0.

sed -i 's/^ HTTP_PORT/ http = 8500/' /etc/consul/consul.hcl
sed -i 's/^ HTTPS_PORT/ https = 8501/' /etc/consul/consul.hcl

# if TLS is being configured, we probably want to lock down HTTP to only localhost, or
# a private address. By default (the empty string), we will listen on a private address,
# unless the user has requested otherwise (either with something falsy, or with "localhost").
#
# Leaving this unspecified and attempting to visit the web UI at the 8500 address (when
# encryption is set up correctly) will lead to `ERR_EMPTY_RESPONSE` or `curl: (52) Empty reply from server`
case "$CONSUL_TLS_PRIVATE_HTTP" in
0 | f | n | false | no)
sed -i 's/^ HTTP_ADDR/ http = "{{ GetPublicIP }}"/' /etc/consul/consul.hcl ;;
1 | t | y | true | yes | '')
sed -i 's/^ HTTP_ADDR/ http = "{{ GetPrivateIP }}"/' /etc/consul/consul.hcl ;;
localhost)
sed -i 's/^ HTTP_ADDR/ http = "127.0.0.1"/' /etc/consul/consul.hcl ;;
esac

sed -i 's/^ HTTPS_ADDR/ https = "{{ GetPublicIP }}"/' /etc/consul/consul.hcl

## we shouldn't need to do this if we're not giving the certs to ContainerPilot
# echo "Attempting to reload"

# /usr/local/bin/containerpilot -reload
else
# TODO: not sure what to do here
echo "TLS files missing from TLS directory. Exiting!"
exit 1
fi
else
_log "Skipping RPC server TLS configuration"

# remove HTTPS placeholder line and set http address to 8500
sed -i '/^ HTTPS_PORT/d' /etc/consul/consul.hcl
sed -i 's/^ HTTP_PORT/ http = 8500/' /etc/consul/consul.hcl
fi
}

preStop() {
echo " ~~~ preStop ~~~"

if consul info &>/dev/null; then
consul leave
else
echo "We're still bootstrapping, probably."
fi
}

postStop() {
echo " ~~~ postStop ~~~"
}

#
@@ -72,17 +194,48 @@ preStart() {
# we've got the whole cluster together.
#
health() {
if [ $(consul info | awk '/num_peers/{print$3}') == 0 ]; then
local consul_args=

if [ -z "${CONSUL}" ]; then
echo "CONSUL env was not defined."
exit 1
fi

## TODO: read either:
# - CONSUL_TLS_PATH
# or
# - CONSUL_CACERT
# - CONSUL_CLIENT_CERT
# - CONSUL_CLIENT_KEY
# to prepare consul_args
#
# if [ -n "$CONSUL_TLS_PATH" ]; then
# consul_args="$consul_args -ca-file=$CONSUL_CACERT"
# consul_args="$consul_args -client-cert=$CONSUL_CLIENT_CERT"
# consul_args="$consul_args -client-key=$CONSUL_CLIENT_KEY"
#
# if [[ $CONSUL != "https"* ]]; then
# consul_args="$consul_args -http-addr=https://$CONSUL:8500"
# fi
# fi

local info_output=$(consul info $consul_args)

if [ -z "$info_output" ]; then
_log "Health check failed while collecting info."
exit 1
fi

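# quote the variable so the output keeps its line breaks and awk sees the num_peers line on its own
if [ "$(echo "$info_output" | awk '/num_peers/{print $3}')" = "0" ]; then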
_log "No peers in raft"
consul join ${CONSUL}
consul join $consul_args ${CONSUL}
fi
}

_log() {
echo " $(date -u '+%Y-%m-%d %H:%M:%S') containerpilot: $@"
}


#
# Defines $1 in the consul configuration as either an env or a default.
# This basically behaves like ${!name_of_var} and ${var:-default} together