Ingress controller not loading backend changes in time #3335
Comments
@jasonwangnanjing how much latency are we talking about here? The expected delay in the update is:
If you're seeing latency of >5s then something is wrong somewhere; otherwise everything works as expected and you just need to make sure your app shuts down gracefully. Kubernetes sends SIGTERM when you delete your app; some apps exit immediately on this signal, but they should exit gracefully: drain in-flight requests, wait around for ~3s after that, and only then exit. I'd suggest adding the following to your Deployment spec if you don't have it yet:
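A minimal sketch of such a preStop sleep hook (the container name is illustrative, and the 3s value simply mirrors the drain window mentioned above):

```yaml
# Sketch: delay pod shutdown so the controller has time to drop the endpoint.
# The container name is illustrative; assumes /bin/sh exists in the image.
spec:
  template:
    spec:
      containers:
        - name: my-app
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 3"]
```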
@ElvinEfendi I1108 06:35:24.171423 8 controller.go:175] Changes handled by the dynamic configuration, skipping backend reload.
@jasonwangnanjing ingress-nginx relies on k8s readiness probes; have you confirmed that those pods were actually marked as "Not Ready"?
This looks like #3070
We are seeing something similar, even after implementing the suggested sleep preStop hook here, i.e. the ingress controller tries to send traffic to an upstream pod that is marked as 'Terminating' following a rollingUpdate deployment. It only does so for a short period of time, but it is certainly repeatable. I'm struggling to find where to begin with this, but happy to help debug further. @ElvinEfendi any suggestions for what could assist in debugging? My understanding is that the ingress controller should not be sending traffic to these pods in Terminating status, as they are no longer active on the service endpoints.
@timm088 the first thing I'd check is that you aren't seeing "Dynamic reconfiguration failed" warning messages. I'd also run ingress-nginx with the v=2 flag and check for "Dynamic reconfiguration succeeded" at the time when your pod goes into the "Terminating" state. Normally when your pod turns "Terminating", ingress-nginx should log "Dynamic reconfiguration succeeded"; you can measure the latency there and adjust. At any given time you can also inspect what endpoints Nginx has in its memory by looking at
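To make those "Dynamic reconfiguration ..." messages visible, the controller has to run with increased verbosity. A rough sketch of the relevant args in the controller Deployment (everything except `--v=2` is assumed scaffolding; match it to your own install):

```yaml
# Sketch: raise log verbosity on the ingress-nginx controller container.
# The surrounding layout and --configmap value are illustrative.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --v=2
```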
Interesting, I'll give that a go and come back. Definitely not seeing 'Dynamic reconfiguration failed' messages. I'm not quite understanding how the preStop hook helps the ingress controller here...
So during the 'terminating' phase, the ingress controller has (1s by default, or 4s if we implement the 3s sleep hook ... likely faster depending on where it is on the 1s counter) to update the backends, removing the terminating pods as active endpoints? I have some internal services that receive >10 calls per second and are somewhat critical to the stack, so timeouts on these calls are going to bubble 5xxs back up the stack to users.
* related to: kubernetes#3070, kubernetes#3335
* add a 503 test
* test a service that starts out empty (a.k.a. ingress-nginx controller (re-)start)
* test scaling up (should route traffic accordingly)
* test scaling down to an empty service
* use custom deployments for the scaling test
* provide a fix by updating the lua table (cache) of the configured backends to unset the backend if there are no endpoints available
We are seeing a similar problem at my company as well. We are using version
We are experimenting with zero-downtime upgrades of the Kubernetes cluster while running load testing at the same time (i.e. 20 requests/second). The ingress controller returns 502 when it tries to redirect to one of the pods located on a machine we are upgrading. I understand that there is some kind of race condition between when a pod is shutting down and when the "un-ready" event is handled by the ingress controller, but I don't think using a "sleep" is a good way of solving this problem. Is there no other event you can listen to that is sent when a pod is deleted? Or perhaps introduce some kind of configurable "retry" functionality for all idempotent (such as GET) requests?
Please check https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#proxy-next-upstream. The default value is
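For context, this retry behaviour is set through the controller ConfigMap. A sketch of what an override might look like (the ConfigMap name/namespace and the chosen values are assumptions, not the documented defaults; see the linked docs):

```yaml
# Sketch: also retry the next upstream on 502s, and cap the number of tries.
# Name/namespace must match the ConfigMap your controller is started with.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  proxy-next-upstream: "error timeout http_502"
  proxy-next-upstream-tries: "3"
```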
This flag only works if we have more than one server, whatever that means, and it did not solve the problem for me. I can see that the nginx.conf file was updated with the new value. Now, it feels like I hijacked the original issue, so if this seems like a separate problem from the original question at the top then please let me know and I'll create a new issue for it.
@perandersson please open a new issue :)
This is correct: if your pod dies, there is nothing nginx can do to avoid a 502 response. To avoid this you need to use probes in your app and the sleep workaround. Another thing you can configure is
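A rough sketch of combining a readiness probe with the sleep workaround (container name, health endpoint, port and timings are all illustrative; this is a fragment of a pod spec):

```yaml
# Sketch: readiness probe so the endpoint is removed promptly, plus the
# preStop sleep so in-flight requests can drain. All values are illustrative.
containers:
  - name: my-app
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 1
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 3"]
```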
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
NGINX Ingress controller version: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.19.0
Kubernetes version (use kubectl version): 1.11
Environment:
- uname -a:

What happened:
When one pod is deleted, nginx does not reload the backends in time, so it still passes requests to the dead pod and a 502 error is returned to the user.
What you expected to happen:
When one pod is deleted, nginx should be updated in time to stop sending traffic to it.
How to reproduce it (as minimally and precisely as possible):
Define a service backed by more than one pod, then use that service as the backend service in an ingress object.
After both pods are running, delete one of them (a minimal manifest sketch is shown below).
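As a sketch of such a setup (names, image, host and ports are illustrative; the extensions/v1beta1 Ingress matches the Kubernetes 1.11 era of this report):

```yaml
# Sketch of a minimal reproduction: a 2-replica Deployment, a Service, and an
# Ingress pointing at it. Delete one pod while sending steady traffic.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: k8s.gcr.io/echoserver:1.10   # illustrative image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echo
spec:
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echo
spec:
  rules:
    - host: echo.example.com       # illustrative host
      http:
        paths:
          - path: /
            backend:
              serviceName: echo
              servicePort: 80
```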
Anything else we need to know: