
Add annotation to allow use of service ClusterIP #1878

Closed
wants to merge 1 commit into from

Conversation


@rajivm rajivm commented Jul 19, 2017

Equivalent to kubernetes/ingress-nginx#981

Added support for the `ingress.kubernetes.io/service-upstream` Kubernetes ingress annotation, which causes Traefik to use the service's ClusterIP as the backend rather than the endpoints API. The primary motivation is that this allows zero-downtime deployments: Kubernetes Services respect readiness checks, which is lost when using the endpoints API directly.

From the doc update I wrote:

Use the service's clusterIP as the backend rather than resolving the pod IPs via the endpoints API and using them as individual backends. This will prevent sticky sessions and any specific load-balancer algorithm from working, as there will be only one backend registered for the service in Traefik. The advantage is that Kubernetes readiness checks will apply, and rolling deployments of services can be graceful.

Use the service's clusterIP as the backend rather than resolving the pod IPs via the endpoints API and using them as individual backends. As this results in only a single backend, some features are lost:

  • Sticky sessions will not work
  • Specific load-balancer algorithms cannot be applied
  • Retries will not work (the `proxy_next_upstream` directive will have no effect)
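
For illustration, an Ingress using the proposed annotation might look like the following sketch. All names, hosts, and ports here are hypothetical; the API version reflects what was current at the time of this PR (2017):

```yaml
apiVersion: extensions/v1beta1     # Ingress API group as of 2017
kind: Ingress
metadata:
  name: example-ingress            # hypothetical name
  annotations:
    # Route to the service's ClusterIP instead of individual pod endpoints
    ingress.kubernetes.io/service-upstream: "true"
spec:
  rules:
    - host: example.com            # hypothetical host
      http:
        paths:
          - path: /
            backend:
              serviceName: example-service  # hypothetical service
              servicePort: 80
```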

@rajivm rajivm force-pushed the ingress-clusterip-annotation branch from 8125e8b to f94325b on July 19, 2017 10:51
@ldez ldez changed the title [kubernetes] Add annotation to allow use of service ClusterIP Add annotation to allow use of service ClusterIP Jul 19, 2017
@ldez ldez requested a review from a team July 19, 2017 13:35
docs/toml.md Outdated
@@ -1250,6 +1250,9 @@ As known from nginx when used as Kubernetes Ingress Controller, a List of IP-Ran

An unset or empty list allows all Source-IPs to access. If one of the Net-Specifications are invalid, the whole list is invalid and allows all Source-IPs to access.

- `ingress.kubernetes.io/service-upstream: "true"`

Use the service's clusterIP as the backend rather than resolve the pod IPs via the endpoints API and use them as individual backends. This will prevent sticky sessions from working and any specific loadbalancer algorithm as there will be only one backend for the service registered in traefik. The advantage is Kubernetes readiness checks will apply and rolling deployments to services can be zero-downtime.
Contributor

Retry may also break.

Wondering about this though:
The advantage is Kubernetes readiness checks will apply and rolling deployments to services can be zero-downtime.

Kubernetes only adds the endpoint to the service if the readiness checks pass. If you properly configure your readiness checks, traefik should have no issues with the rolling deployments.
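
As a sketch of the readiness-check configuration this comment refers to (container name, image, and health-check path are hypothetical): the pod's IP is only added to the Service's endpoints once this probe passes, and Traefik watches that same endpoints API.

```yaml
# Pod spec fragment (hypothetical values)
containers:
  - name: app                    # hypothetical container name
    image: example/app:1.0       # hypothetical image
    readinessProbe:
      httpGet:
        path: /healthz           # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 3
```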

Author

You are right, Traefik automatically reacts to endpoints API changes based on readiness (unlike nginx-ingress). I am doing further testing now because this version seemingly eliminated the brief 502s we were seeing during deploys. I suspect kube-proxy simply reacts to clusterIP events faster.


@rajivm you forgot the most important part:

Known Issues

If the service-upstream annotation is specified the following things should be taken into consideration:

Sticky Sessions will not work as only round-robin load balancing is supported.
The proxy_next_upstream directive will not have any effect meaning on error the request will not be dispatched to another upstream.

so as @dtomcej said, retries are not possible.

Author

@aledbf I did not miss that; I wanted to update the docs once I had verified whether there was any truth to the zero-downtime advantage claim.

Author

Updated with:

Use the service's clusterIP as the backend rather than resolving the pod IPs via the endpoints API and using them as individual backends.
As this results in only a single backend, some features are lost:
 - Sticky sessions will not work
 - Specific load-balancer algorithms cannot be applied
 - Retries will not work (the `proxy_next_upstream` directive will have no effect)

@ldez ldez requested a review from a team July 19, 2017 15:59
 - annotation uses ClusterIP as backend rather than endpoints API
 - updated documentation & tests
@timoreimann
Contributor

timoreimann commented Jul 19, 2017

Services do not guarantee you zero-downtime deployments or graceful termination automatically.

Please see this blog post (which was motivated by a discussion that took place at kubernetes-retired/contrib#1140) for details. In a nutshell, kube-proxy (the component powering Services under the hood) is just another controller watching the endpoints API for updates, which makes it susceptible just like any other controller (including Traefik's) to the effects of unpredictable event ordering inherent to the distributed nature of Kubernetes. Specifically, there's always a possibility for the scenario to happen in which a pod is first terminated by a SIGTERM before kube-proxy has had a chance to remove its IP address from rotation. Within this time window, requests sent through a Service would be lost and (in the case of kube-proxy's iptables mode, which is the norm these days) never retried.

The only strategy you have to execute deployments reliably is to orchestrate them properly.

@rajivm
Author

rajivm commented Jul 19, 2017

@timoreimann Yes, I realize this was not completely accurate; I removed the documentation text as a result. It would require keeping the backends alive long enough after they move to the not-ready state. That said, at least anecdotally, kube-proxy seems to process such events more quickly than Traefik does today. I am looking into whether Traefik's Kubernetes event watch can be made more efficient.

@timoreimann
Contributor

@rajivm it's possible that kube-proxy and Traefik diverge in update performance, in favor of one or the other depending on a number of factors. If Traefik has any performance gaps in this regard, I would certainly be happy to see them closed.

I'm still not convinced of the general idea of routing via Services, though. As described above, there is no guarantee you will achieve zero-downtime deployments. Even if it works for some clusters at some point in time, it may only take a load/memory shift inside your cluster to change whether the SIGTERM or kube-proxy wins the race. In addition, it also depends on the affected pods and how quickly they react to termination requests.

At best, this looks like a probabilistic mitigation to a problem that could be solved with significantly better reliability using graceful termination handlers and/or pre-stop hooks. You usually want those anyway to cover all the other cases where Kubernetes needs to shut down pods gracefully (e.g., node draining, auto-scaling, pod evictions, taint updates, ConfigMap/Secret changes).
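
The graceful-termination approach described above can be sketched roughly as follows (all names and durations are hypothetical). The `preStop` sleep keeps the pod serving while the endpoint removal propagates to kube-proxy and Traefik before SIGTERM is delivered:

```yaml
# Pod spec fragment (hypothetical values)
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app                        # hypothetical name
      image: example/app:1.0           # hypothetical image
      lifecycle:
        preStop:
          exec:
            # Delay SIGTERM so in-flight traffic drains while the
            # endpoint removal propagates to watching controllers
            command: ["sleep", "10"]
```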

@rajivm
Author

rajivm commented Jul 20, 2017

@timoreimann In full agreement at this point, if there is no demand for this annotation for other use cases, then we can certainly close this. I will continue to investigate ways to improve performance here.
