diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-overview.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-overview.png
new file mode 100644
index 0000000000000..c6cdbef25ff99
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-overview.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-with-terminating-pod.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-with-terminating-pod.png
new file mode 100644
index 0000000000000..b5a516a01d2d0
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/endpointslice-with-terminating-pod.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/index.md b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/index.md
new file mode 100644
index 0000000000000..91ecd167ccf6f
--- /dev/null
+++ b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/index.md
@@ -0,0 +1,117 @@
+---
+layout: blog
+title: "Kubernetes v1.26: Advancements in Kubernetes Traffic Engineering"
+date: 2022-12-30
+slug: advancements-in-kubernetes-traffic-engineering
+---
+
+**Authors:** Andrew Sy Kim (Google)
+
+Kubernetes v1.26 includes significant advancements in network traffic engineering with the graduation of
+two features (Service internal traffic policy support, and EndpointSlice terminating conditions) to GA,
+and a third feature (Proxy terminating endpoints) to beta. The combination of these enhancements aims
+to address shortcomings in traffic engineering that people face today, and unlock new capabilities for the future.
+
+## Traffic Loss from Load Balancers During Rolling Updates
+
+Prior to Kubernetes v1.26, clusters could experience [loss of traffic](https://github.com/kubernetes/kubernetes/issues/85643)
+from Service load balancers during rolling updates when setting the `externalTrafficPolicy` field to `Local`.
+There are a lot of moving parts at play here so a quick overview of how Kubernetes manages load balancers might help!
+
+In Kubernetes, you can create a Service with `type: LoadBalancer` to expose an application externally with a load balancer.
+The load balancer implementation varies between clusters and platforms, but the Service provides a generic abstraction
+representing the load balancer that is consistent across all Kubernetes installations.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-service
+spec:
+  selector:
+    app.kubernetes.io/name: my-app
+  ports:
+    - protocol: TCP
+      port: 80
+      targetPort: 9376
+  type: LoadBalancer
+```
+
+Under the hood, Kubernetes allocates a NodePort for the Service, which is then used by kube-proxy to provide a
+network data path from the NodePort to the Pod. A controller will then add all available Nodes in the cluster
+to the load balancer’s backend pool, using the designated NodePort for the Service as the backend target port.
+
+{{< figure src="traffic-engineering-service-load-balancer.png" caption="Figure 1: Overview of Service load balancers" >}}
+
+Oftentimes it is beneficial to set `externalTrafficPolicy: Local` for Services, to avoid extra hops between
+Nodes that are not running healthy Pods backing that Service. When using `externalTrafficPolicy: Local`,
+an additional NodePort is allocated for health checking purposes, such that Nodes that do not contain healthy
+Pods are excluded from the backend pool for a load balancer.
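+
+As a minimal sketch, the Service above could opt in to this behavior as follows. The manifest reuses the
+illustrative `my-service` and `my-app` names from the earlier example; the health check NodePort is normally
+allocated automatically, so it is not set here.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-service
+spec:
+  selector:
+    app.kubernetes.io/name: my-app
+  ports:
+    - protocol: TCP
+      port: 80
+      targetPort: 9376
+  type: LoadBalancer
+  # Only route external traffic to Pods on the Node that received it.
+  externalTrafficPolicy: Local
+```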
+
+{{< figure src="traffic-engineering-lb-healthy.png" caption="Figure 2: Load balancer traffic to a healthy Node, when externalTrafficPolicy is Local" >}}
+
+One scenario where traffic can be lost is when a Node loses all Pods for a Service,
+but the external load balancer has not probed the health check NodePort yet. The likelihood of this situation
+is largely dependent on the health checking interval configured on the load balancer. The larger the interval,
+the more likely this will happen, since the load balancer will continue to send traffic to a Node
+even after kube-proxy has removed forwarding rules for that Service. This also occurs when Pods start terminating
+during rolling updates. Since Kubernetes does not consider terminating Pods as “Ready”, traffic can be lost
+when there are only terminating Pods on any given Node during a rolling update.
+
+{{< figure src="traffic-engineering-lb-without-proxy-terminating-endpoints.png" caption="Figure 3: Load balancer traffic to terminating endpoints, when externalTrafficPolicy is Local" >}}
+
+Starting in Kubernetes v1.26, kube-proxy enables the `ProxyTerminatingEndpoints` feature by default, which
+adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise
+be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods,
+kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will
+actively fail the health check NodePort if there are only terminating Pods available. By doing so,
+kube-proxy signals to the external load balancer that new connections should not be sent to that Node,
+while still gracefully handling requests for existing connections.
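+
+The readiness that kube-proxy consults is reported per endpoint in the EndpointSlice API. As a rough illustration,
+a terminating Pod that still passes its readiness probe might appear as in the following fragment. The object name,
+IP address, and Node name are hypothetical.
+
+```yaml
+apiVersion: discovery.k8s.io/v1
+kind: EndpointSlice
+metadata:
+  name: my-service-abc12            # hypothetical name
+  labels:
+    kubernetes.io/service-name: my-service
+addressType: IPv4
+ports:
+  - protocol: TCP
+    port: 9376
+endpoints:
+  - addresses:
+      - "10.0.1.5"                  # hypothetical Pod IP
+    nodeName: node-1                # hypothetical Node
+    conditions:
+      ready: false                  # always false while the Pod is terminating
+      serving: true                 # the Pod still passes its readiness probe
+      terminating: true             # the Pod has a deletionTimestamp set
+```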
+
+{{< figure src="traffic-engineering-lb-with-proxy-terminating-endpoints.png" caption="Figure 4: Load balancer traffic to terminating endpoints with ProxyTerminatingEndpoints enabled, when externalTrafficPolicy is Local" >}}
+
+### EndpointSlice Conditions
+
+In order to support this new capability in kube-proxy, the EndpointSlice API introduced new conditions for endpoints:
+`serving` and `terminating`.
+
+{{< figure src="endpointslice-overview.png" caption="Figure 5: Overview of EndpointSlice conditions" >}}
+
+The `serving` condition is semantically identical to `ready`, except that it can be `true` or `false`
+while a Pod is terminating, unlike `ready` which will always be `false` for terminating Pods for compatibility reasons.
+The `terminating` condition is true for Pods undergoing termination (non-empty `deletionTimestamp`), false otherwise.
+
+The addition of these two conditions enables consumers of this API to observe Pod states that could not previously be expressed.
+For example, we can now track "ready" and "not ready" Pods that are also terminating.
+
+{{< figure src="endpointslice-with-terminating-pod.png" caption="Figure 6: EndpointSlice conditions with a terminating Pod" >}}
+
+Consumers of the EndpointSlice API, such as kube-proxy and ingress controllers, can now use these conditions to coordinate connection draining
+events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.
+
+## Optimizing Internal Node-Local Traffic
+
+Similar to how Services can set `externalTrafficPolicy: Local` to avoid extra hops for externally sourced traffic, Kubernetes
+now supports `internalTrafficPolicy: Local`, to enable the same optimization for traffic originating within the cluster, specifically
+for traffic using the Service Cluster IP as the destination address. This feature graduated to Beta in Kubernetes v1.24 and is graduating to GA in v1.26.
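+
+A minimal sketch of a Service that opts in, using hypothetical names; it is meant as an illustration rather than a
+recommended configuration:
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-internal-service         # hypothetical name
+spec:
+  selector:
+    app.kubernetes.io/name: my-app
+  ports:
+    - protocol: TCP
+      port: 80
+      targetPort: 9376
+  # Only route in-cluster traffic to endpoints on the client's own Node.
+  internalTrafficPolicy: Local
+```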
+
+Services default the `internalTrafficPolicy` field to `Cluster`, where traffic is randomly distributed to all endpoints.
+
+{{< figure src="service-internal-traffic-policy-cluster.png" caption="Figure 7: Service routing when internalTrafficPolicy is Cluster" >}}
+
+When `internalTrafficPolicy` is set to `Local`, kube-proxy will forward internal traffic for a Service only if there is an available endpoint
+that is local to the same Node.
+
+{{< figure src="service-internal-traffic-policy-local.png" caption="Figure 8: Service routing when internalTrafficPolicy is Local" >}}
+
+{{< caution >}}
+When using `internalTrafficPolicy: Local`, traffic will be dropped by kube-proxy when no local endpoints are available.
+{{< /caution >}}
+
+## Getting Involved
+
+If you're interested in future discussions on Kubernetes traffic engineering, you can get involved in SIG Network in the following ways:
+* Slack: [#sig-network](https://kubernetes.slack.com/messages/sig-network)
+* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-network)
+* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnetwork)
+* [Biweekly meetings](https://github.com/kubernetes/community/tree/master/sig-network#meetings)
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-cluster.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-cluster.png
new file mode 100644
index 0000000000000..e0f477aa2e39e
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-cluster.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-local.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-local.png
new file mode 100644
index 0000000000000..407a0db0ed8f8
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/service-internal-traffic-policy-local.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-healthy.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-healthy.png
new file mode 100644
index 0000000000000..74ac7f4f5c931
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-healthy.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-with-proxy-terminating-endpoints.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-with-proxy-terminating-endpoints.png
new file mode 100644
index 0000000000000..0faa5d960a526
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-with-proxy-terminating-endpoints.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-without-proxy-terminating-endpoints.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-without-proxy-terminating-endpoints.png
new file mode 100644
index 0000000000000..43db9c9efb9a6
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-lb-without-proxy-terminating-endpoints.png differ
diff --git a/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-service-load-balancer.png b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-service-load-balancer.png
new file mode 100644
index 0000000000000..a4e58c6207cb3
Binary files /dev/null and b/content/en/blog/_posts/2022-12-30-advancements-in-traffic-engineering/traffic-engineering-service-load-balancer.png differ