Health check with dynamic configuration #2325
This was implemented here: #2308
As far as I understand, with the quoted implementation the healthz URI can return a 200 before all backends are loaded.
@ElvinEfendi ping
I'm trying to understand the issue. I can see that on boot the readiness probe will pass as soon as the controller POSTs the default backend (or really any backend) to Lua land. The k8s API might keep sending more Ingress/Service updates incrementally from that point on (@valeriano-manassero I think that's what you're referring to with rolling update), and between those increments ingress-nginx will already be ready to receive traffic, so a request to a backend that has not yet been POSTed to Lua will leave the client with a 503. In that sense this is more of a general problem, not something specific to dynamic configuration. I can see that this might cause an issue when, for some reason, an ingress-nginx pod restarts.

But IMHO the change introduced in NCCloud/fluid#24 does not fix this issue; that implementation seems, to me at least, effectively the same as #2308. A more complete solution IMO would be to query the Ingresses from the k8s API on health check and pass the respective backend names to a specific Lua health check endpoint that asserts all of those backends have been registered in the shared dictionary in Lua.

@valeriano-manassero could you come up with an e2e regression test to show this behaviour? Or at least a step-by-step guide for us to reproduce it; that would be very valuable before spending more time on the solution.
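A minimal sketch of what such a Lua health check location could look like, assuming a hypothetical `lua_shared_dict configured_backends` that the controller populates on every backend POST, and a hypothetical `/healthz-backends` location that receives the expected backend names. The names are illustrative, not actual ingress-nginx endpoints:

```lua
-- Hypothetical /healthz-backends content_by_lua_block.
-- Requires `lua_shared_dict configured_backends 10m;` in nginx.conf (assumed name).
-- Expects a JSON body like {"backends": ["ns-svc-80", "ns2-svc2-8080"]}
-- and returns 200 only if every listed backend is already registered.
local cjson = require("cjson.safe")

ngx.req.read_body()
local body = ngx.req.get_body_data()
if not body then
  ngx.status = ngx.HTTP_BAD_REQUEST
  return ngx.say("missing request body")
end

local payload = cjson.decode(body)
if not payload or type(payload.backends) ~= "table" then
  ngx.status = ngx.HTTP_BAD_REQUEST
  return ngx.say("invalid payload")
end

local dict = ngx.shared.configured_backends  -- assumed shared dict name
for _, name in ipairs(payload.backends) do
  if not dict:get(name) then
    ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
    return ngx.say("backend not yet configured: ", name)
  end
end

ngx.status = ngx.HTTP_OK
ngx.say("all backends configured")
```

On the controller side, the health check handler would then list the expected backends from the k8s API and POST them to this location, failing readiness unless the response is 200.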
@ElvinEfendi I'm sorry, but right now I'm very busy and I don't have time to come up with an e2e regression test. You can reproduce this behaviour simply by creating something like 10k to 20k backends and then updating the deployment.
No worries! I'll try that when I have some time.
@valeriano-manassero but this is exactly what @aledbf did at https://github.com/kubernetes/ingress-nginx/pull/2308/files, no?
The ingress controller deployment?
I think so, yeah, in order to trigger a restart.
Yep. For example, I see the issue when updating to the latest version of Fluid in my environment. Many customers get some 503s for a few seconds during the rolling update, and that's because the readinessProbe was already returning 200 while the new pods were not yet ready to serve all the backends.
For that scenario, I want to do some refactoring to split the Go code from nginx.
After the next release I will work on this. Right now I am looking at how to secure the communication between these two deployments (for example using https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/).
@aledbf Hi. Thanks for maintaining nginx ingress! Just wondered if we really need the secure communication thing. How about the ingress controller just periodically writes the rendered nginx.conf and the dynamic config into a configmap? Then the nginx pod could mount the configmap as a volume, watch for changes with inotify/fsnotify, and on update either run an nginx reload or feed the config via Lua.
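OpenResty has no built-in inotify binding, so a rough Lua-only approximation of the "feed the config via Lua" idea would be to poll the mounted file from a worker timer and push its contents into a shared dictionary when it changes. This is only a sketch under assumptions: the file path, dictionary name, and `first_load_done` flag are hypothetical, and `ngx.timer.every` polling stands in for inotify/fsnotify:

```lua
-- init_worker_by_lua_block sketch: poll the configmap-mounted file and
-- push its contents into a shared dict whenever it changes.
-- Polling with ngx.timer.every is used here instead of inotify/fsnotify.
local CONFIG_PATH = "/etc/nginx/dynamic/backends.json"  -- hypothetical mount path
local dict = ngx.shared.configuration_data              -- assumed lua_shared_dict

local function read_file(path)
  local f = io.open(path, "r")
  if not f then
    return nil
  end
  local content = f:read("*a")
  f:close()
  return content
end

local function sync(premature)
  if premature then
    return
  end

  local content = read_file(CONFIG_PATH)
  if not content then
    ngx.log(ngx.WARN, "could not read ", CONFIG_PATH)
    return
  end

  -- Only update (and mark the pod as configured) when the file changed.
  if content ~= dict:get("raw_backends") then
    dict:set("raw_backends", content)
    dict:set("first_load_done", true)
    ngx.log(ngx.INFO, "dynamic configuration reloaded from ", CONFIG_PATH)
  end
end

local ok, err = ngx.timer.every(1, sync)
if not ok then
  ngx.log(ngx.ERR, "failed to create sync timer: ", err)
end
```

The same `first_load_done` flag could then back a readiness check that only passes once the first configuration has been loaded.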
I thought about that, and the issue I see is the permissions on that configmap. We would need to dump not only the nginx.conf but also the SSL certificates.
Closing. This can also occur with the "normal" reload mode. We cannot block serving traffic. |
The readiness probe must return 200 only when the first Lua config load is OK.
Feature added here: NCCloud/fluid#24
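For reference, a minimal sketch of such a readiness gate in Lua, assuming the controller (or a timer like the one sketched above) sets a `first_load_done` flag in a shared dictionary after the first successful configuration load; the names are illustrative and not what #2308 or NCCloud/fluid#24 actually use:

```lua
-- Hypothetical /healthz content_by_lua_block: fail readiness until the
-- first dynamic configuration load has completed.
local dict = ngx.shared.configuration_data  -- assumed shared dict

if not dict:get("first_load_done") then
  ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
  return ngx.say("waiting for first configuration load")
end

ngx.status = ngx.HTTP_OK
ngx.say("ok")
```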