Add docs for Envoy shutdown manager

Signed-off-by: Steve Sloka <[email protected]>
projectcontour · Feb 13, 2020 · fd372ff
1 parent 81f421a

Showing 2 changed files with 75 additions and 0 deletions.
diff --git a/site/_data/master-toc.yml b/site/_data/master-toc.yml
@@ -23,6 +23,8 @@ toc:
         link: /resources/upgrading
       - page: Enabling TLS between Envoy and Contour
         url: /grpc-tls-howto
+      - page: Envoy Shutdown Manager
+        url: /shutdown-manager
   - title: Guides
     subfolderitems:
       - page: Cert-Manager

diff --git a/site/docs/master/shutdown-manager.md b/site/docs/master/shutdown-manager.md
@@ -0,0 +1,73 @@
+# Envoy Shutdown Manager
+
+The Envoy process, the data path component of Contour, at times needs to be redeployed.
+This could be due to an upgrade, a change in configuration, or a node failure forcing a redeployment.
+
+When implementing this rollout, the following steps should be taken:
+
+1. Stop Envoy from accepting new connections.
+2. Start draining existing connections in Envoy by sending a `POST` request to the `/healthcheck/fail` endpoint of Envoy's admin interface.
+3. Wait for connections to drain before allowing Kubernetes to send `SIGTERM` to the pod.
+
+## Overview
+
+Contour provides a new `envoy` subcommand with a `shutdown-manager` command whose job is to manage the lifecycle of a single Envoy instance in Kubernetes.
+The `shutdown-manager` runs as a separate container alongside the Envoy container in the same pod.
+It exposes two HTTP endpoints: one for the container's `livenessProbe` and one to handle the Kubernetes `preStop` event hook.
+
+- **livenessProbe**: Used to validate that the shutdown manager is still running properly. If requests to `/healthz` fail, the container is restarted.
+- **preStop**: Used to keep the container running while Envoy drains connections. The `/shutdown` endpoint blocks until the connections are drained.
+
+```yaml
+ - name: shutdown-manager
+   command:
+   - /bin/contour
+   args:
+     - envoy
+     - shutdown-manager
+   image: docker.io/projectcontour/contour:master
+   imagePullPolicy: Always
+   lifecycle:
+     preStop:
+       httpGet:
+         path: /shutdown
+         port: 8090
+         scheme: HTTP
+   livenessProbe:
+     httpGet:
+       path: /healthz
+       port: 8090
+     initialDelaySeconds: 3
+     periodSeconds: 10  
+```
+
+The Envoy container also needs some configuration to work with the shutdown manager.
+First, its `preStop` hook is configured to call the `/shutdown` endpoint, which blocks the container from exiting until connections have drained.
+Second, the pod's `terminationGracePeriodSeconds` is increased to extend the time Kubernetes allows the pod to remain in the `Terminating` state.
+If the connections have not drained to the configured threshold before this grace period expires, Kubernetes stops waiting and forcibly kills the pod's containers.
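+
+A minimal sketch of these Envoy-side settings, written as a fragment of the pod template spec (for example, `spec.template.spec` of the Envoy DaemonSet), might look like the following. The Envoy image tag and the 300-second grace period are illustrative assumptions, not values from this page. Whatever grace period you choose should comfortably exceed `check-delay` (60s by default), since the shutdown manager waits that long before it starts polling Envoy. Fields unrelated to shutdown, such as the Envoy container's `args` and ports, are omitted; the `shutdown-manager` container from the earlier example belongs in the same `containers` list.
+
+```yaml
+# Fragment of a pod template spec; only shutdown-related fields are shown.
+# Assumed value: allow up to 300s for connections to drain before the pod is killed.
+terminationGracePeriodSeconds: 300
+containers:
+- name: envoy
+  image: docker.io/envoyproxy/envoy:v1.13.0  # illustrative tag
+  lifecycle:
+    preStop:
+      # Calls the shutdown manager in the same pod; blocks until connections drain.
+      httpGet:
+        path: /shutdown
+        port: 8090
+        scheme: HTTP
+```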
+
+### Shutdown Manager Config Options
+
+The shutdown manager accepts the following arguments to change how it behaves:
+
+- **check-interval:** [duration] Interval at which to poll Envoy for open connections.
+  - (Default 5s)
+- **check-delay:** [duration] Time to wait before polling Envoy for open connections.
+  - (Default 60s)
+- **min-open-connections:** [int] Number of open connections Envoy must drain down to before the `/shutdown` endpoint returns.
+  - (Default 0)
+- **serve-port:** [int] Port on which the shutdown manager serves its HTTP endpoints.
+  - (Default 8090)
+- **prometheus-path:** [string] Path of Envoy's Prometheus stats HTTP endpoint.
+  - (Default "/stats/prometheus")
+- **prometheus-stat:** [string] Prometheus stat to query.
+  - (Default "envoy_http_downstream_cx_active")
+- **prometheus-values:** [string array] Values to look for in `prometheus-stat`.
+  - (Default ["ingress_http", "ingress_https"])
+- **envoy-host:** [string] Host serving Envoy's stats page.
+  - (Default "localhost")
+- **envoy-port:** [int] Port serving Envoy's stats page.
+  - (Default 9001)
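+
+As an illustration, these options can be overridden by appending them to the shutdown manager's `args`. This sketch assumes the usual `--name=value` flag syntax; the values chosen (a 30s delay and a threshold of 5 connections) are arbitrary examples, not recommendations.
+
+```yaml
+ - name: shutdown-manager
+   command:
+   - /bin/contour
+   args:
+     - envoy
+     - shutdown-manager
+     # Hypothetical overrides; all other options keep their defaults.
+     - --check-delay=30s
+     - --min-open-connections=5
+   image: docker.io/projectcontour/contour:master
+```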
+
+
+