site: add troubleshooting doc for unready Envoy (#4970)
Also recommends setting resource requests
on containers.

Updates #4851.

Signed-off-by: Steve Kriss <[email protected]>
skriss authored Jan 10, 2023
1 parent de4c25c commit 02ff5b4
Showing 4 changed files with 48 additions and 0 deletions.
13 changes: 13 additions & 0 deletions site/content/docs/main/deploy-options.md
@@ -12,6 +12,19 @@ This secret can be auto-generated by the Contour `certgen` job or provided by an
Traffic must be forwarded to Envoy, typically via a Service of `type: LoadBalancer`.
All other requirements, such as RBAC permissions and configuration details, are provided or have good defaults for most installations.

### Setting resource requests and limits

We recommend setting resource requests and limits on all Contour and Envoy containers.
The example YAML manifests used in the [Getting Started][8] guide do not include them, because appropriate values vary widely between installations.
The table below lists the Contour and Envoy containers with reasonable resource requests to start from; adjust them based on observed usage and expected load:

| Workload | Container | Request (mem) | Request (cpu) |
| ------------------- | ---------------- | ------------- | ------------- |
| deployment/contour | contour | 128Mi | 250m |
| daemonset/envoy | envoy | 256Mi | 500m |
| daemonset/envoy | shutdown-manager | 50Mi | 25m |
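
As a minimal sketch, the `contour` row of the table could be expressed on the Deployment as shown below. This is a fragment, not a complete manifest: the `projectcontour` namespace is an assumption, and all fields unrelated to resources are omitted. No limits are shown because the table only suggests requests.

```yaml
# Fragment of the Contour Deployment; only resource-related fields are shown.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: contour
  namespace: projectcontour   # assumption: adjust to the namespace you deploy into
spec:
  template:
    spec:
      containers:
      - name: contour
        resources:
          requests:
            memory: 128Mi     # starting value from the table above
            cpu: 250m         # starting value from the table above
          # limits are left out here; choose them from observed usage
```

The same `resources.requests` stanza applies to the `envoy` and `shutdown-manager` containers of the Envoy DaemonSet, using their rows from the table.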


### Envoy as Daemonset

The recommended installation is for Contour to run as a Deployment and Envoy to run as a Daemonset.
4 changes: 4 additions & 0 deletions site/content/docs/main/troubleshooting.md
@@ -23,6 +23,9 @@ Learn how to profile Contour by using [net/http/pprof][11] handlers.
### [Contour Operator][8]
Follow the linked guide to learn how to troubleshoot issues with [Contour Operator][12].

### [Envoy container stuck in unready/draining state][13]
Read the linked document if you have Envoy containers stuck in an unready/draining state.

[0]: {{< param github_url >}}/issues
[1]: {{< param slack_url >}}
[2]: /docs/{{< param latest_version >}}/troubleshooting/envoy-admin-interface/
@@ -36,3 +39,4 @@ Follow the linked guide to learn how to troubleshoot issues with [Contour Operat
[10]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol
[11]: https://golang.org/pkg/net/http/pprof/
[12]: https://github.com/projectcontour/contour-operator
[13]: /docs/{{< param latest_version >}}/troubleshooting/envoy-container-draining/
29 changes: 29 additions & 0 deletions site/content/docs/main/troubleshooting/envoy-container-draining.md
@@ -0,0 +1,29 @@
# Envoy container stuck in unready/draining state

It's possible for the Envoy containers to become stuck in an unready/draining state.
This is an unintended side effect of the shutdown-manager sidecar container being restarted by the kubelet.
For more details on exactly how this happens, see [this issue][1].

If you observe Envoy containers in this state, you should `kubectl delete` the affected Pods so that new Pods are created to replace them.

To make this issue less likely to occur, you should:
- ensure you have [resource requests][2] on all your containers
- ensure you do **not** have a liveness probe on the shutdown-manager sidecar container in the envoy daemonset (this was removed from the example YAML in Contour 1.24.0).
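
The two recommendations above can be sketched as a fragment of the envoy DaemonSet pod template. This is an illustrative fragment rather than a complete manifest: images, commands, ports and other fields are omitted, and the request values are the starting points suggested in the linked resource requests section.

```yaml
# Fragment of the envoy DaemonSet pod template (other fields omitted).
spec:
  template:
    spec:
      containers:
      - name: shutdown-manager
        # Deliberately no livenessProbe here: a kubelet restart of this
        # sidecar is what can leave Envoy stuck in the unready/draining state.
        resources:
          requests:
            memory: 50Mi
            cpu: 25m
      - name: envoy
        resources:
          requests:
            memory: 256Mi
            cpu: 500m
```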

If the above are not sufficient for preventing the issue, you may also add a liveness probe to the envoy container itself, like the following:

```yaml
livenessProbe:
  httpGet:
    path: /ready
    port: 8002
  initialDelaySeconds: 15
  periodSeconds: 5
  failureThreshold: 6
```

This causes the kubelet to restart the envoy container if it gets stuck in this state, so that it returns to normal operation and resumes load balancing traffic.
Note that in this case a graceful drain of connections may or may not occur, depending on the exact sequence of operations that preceded the envoy container failing the liveness probe.

[1]: https://github.com/projectcontour/contour/issues/4851
[2]: /docs/{{< param latest_version >}}/deploy-options/#setting-resource-requests-and-limits
2 changes: 2 additions & 0 deletions site/data/docs/main-toc.yml
@@ -97,6 +97,8 @@ toc:
        url: /troubleshooting/profiling-contour
      - page: Contour Operator
        url: /troubleshooting/operator
      - page: Envoy Container Stuck in Unready State
        url: /troubleshooting/envoy-container-draining
  - title: Resources
    subfolderitems:
      - page: Support Policy