Too many config reloads #2596

Closed
Stono opened this issue Jun 2, 2018 · 9 comments · Fixed by #2598

Comments

@Stono (Contributor) commented Jun 2, 2018

Hey hey @aledbf
Our internet ingress controllers will be handling circa 200 ingress hosts. Most of these apps deploy continuously, which updates the ingress metadata labels with the current release version even when the ingress spec hasn't actually changed.

Unfortunately, updates to ingress labels still result in a backend reload in nginx:

I0602 18:16:30.610138       1 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"monkeynetes", Name:"monkeynetes-ingress-internal", UID:"7ce512f9-5a7e-11e8-a530-42010aa4009f", APIVersion:"extensions", ResourceVersion:"7125213", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress monkeynetes/monkeynetes-ingress-internal
I0602 18:16:30.611405       1 controller.go:168] backend reload required
I0602 18:16:31.105323       1 controller.go:178] ingress backend successfully reloaded...

We only have about 10 services on here at the moment, and a backend reload already takes a fair bit of time to validate such a big config. I'm trying to pre-empt the problem: when we get to 200 or so hosts and reloads are happening this often, we're going to hit some issues.

In summary, it'd be great if ingress-nginx performed a backend reload only when the spec section of the Ingress object has changed, because that's the bit that really matters!
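For illustration, here is a minimal Go sketch of the behaviour being asked for: treat an Ingress update as relevant only when its spec changed, ignoring metadata-only updates such as release-version labels. The `ingressNeedsReload` helper is hypothetical and this is not the controller's actual update path.

```go
package main

import (
	"fmt"
	"reflect"

	extensions "k8s.io/api/extensions/v1beta1"
)

// ingressNeedsReload is a hypothetical helper: it reports whether the spec of
// the Ingress actually changed between the old and the updated object,
// ignoring metadata such as labels and annotations.
func ingressNeedsReload(old, cur *extensions.Ingress) bool {
	return !reflect.DeepEqual(old.Spec, cur.Spec)
}

func main() {
	old := &extensions.Ingress{}
	cur := old.DeepCopy()
	cur.Labels = map[string]string{"release": "v2"} // metadata-only change
	fmt.Println(ingressNeedsReload(old, cur))       // prints "false": no reload needed
}
```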

@aledbf (Member) commented Jun 2, 2018

@Stono please add the flag --v=2 to the ingress controller deployment to see exactly what's triggering the reload. Changing labels should not be the reason that's happening.
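Why --v=2 matters here, as a hedged sketch: the configuration diff appears to be logged only at verbosity 2 or higher (it shows up in the next comment only after the flag was added), while the default verbosity shows just "backend reload required". A minimal glog example of that kind of gating, not taken from ingress-nginx itself:

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

// logReload mimics verbosity gating: the plain "reload required" message is
// always logged, the diff only when -v is 2 or higher.
func logReload(diff string) {
	glog.Infof("backend reload required")
	if glog.V(2) {
		glog.Infof("NGINX configuration diff\n%s", diff)
	}
}

func main() {
	flag.Parse() // glog registers -v and friends; run with -v=2 to see the diff
	logReload("--- old nginx.conf\n+++ new nginx.conf")
	glog.Flush()
}
```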

@Stono (Contributor, Author) commented Jun 2, 2018

@aledbf it appears to be because of the order of the vhosts:

I0602 19:03:46.271884       1 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"monkeynetes", Name:"monkeynetes-ingress-internal", UID:"7ce512f9-5a7e-11e8-a530-42010aa4009f", APIVersion:"extensions", ResourceVersion:"7134837", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress monkeynetes/monkeynetes-ingress-internal
I0602 19:03:46.273610       1 controller.go:168] backend reload required
I0602 19:03:46.273657       1 util.go:67] rlimit.max=1048576
I0602 19:03:46.273671       1 nginx.go:557] maximum number of open file descriptors : 86357
I0602 19:03:46.565525       1 nginx.go:658] NGINX configuration diff
I0602 19:03:46.565567       1 nginx.go:659] --- /etc/nginx/nginx.conf   2018-06-02 19:02:10.945206418 +0000
+++ /tmp/new-nginx-cfg674813152 2018-06-02 19:03:46.562357191 +0000
@@ -192,165 +192,165 @@

        proxy_ssl_session_reuse on;

-       upstream conference-application-app-http-web {
+       upstream mercury-admin-tool-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.71.130:80 max_fails=0 fail_timeout=0;
+               server 10.192.77.155:80 max_fails=0 fail_timeout=0;

        }

-       upstream data-platform-alertmanager-frontend-frontend {
+       upstream chaos-kraken-app-http-web-admin {
                least_conn;

                keepalive 32;

-               server 10.192.79.141:9093 max_fails=0 fail_timeout=0;
+               server 10.192.75.29:9080 max_fails=0 fail_timeout=0;

        }

-       upstream data-platform-prometheus-prometheus {
+       upstream sauron-web-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.73.23:9090 max_fails=0 fail_timeout=0;
+               server 10.192.70.169:80 max_fails=0 fail_timeout=0;

        }

-       upstream supply-and-demand-service-app-http-web-admin {
+       upstream rate-of-sale-model-service-app-http-web-admin {
                least_conn;

                keepalive 32;

-               server 10.192.74.61:9080 max_fails=0 fail_timeout=0;
+               server 10.192.67.100:9080 max_fails=0 fail_timeout=0;

        }
-       upstream ingress-nginx-oauth2-proxy-4180 {
+       upstream consumer-platform-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.70.159:4180 max_fails=0 fail_timeout=0;
+               server 10.192.65.23:80 max_fails=0 fail_timeout=0;

        }

-       upstream chaos-kraken-app-http-web {
+       upstream monkeynetes-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.75.29:80 max_fails=0 fail_timeout=0;
+               server 10.192.73.97:80 max_fails=0 fail_timeout=0;

        }

-       upstream chaos-kraken-app-http-web-admin {
+       upstream data-platform-grafana-grafana {
                least_conn;

                keepalive 32;

-               server 10.192.75.29:9080 max_fails=0 fail_timeout=0;
+               server 10.192.76.158:3000 max_fails=0 fail_timeout=0;

        }

-       upstream sauron-web-app-http-web {
+       upstream data-platform-alertmanager-frontend-frontend {
                least_conn;

                keepalive 32;

-               server 10.192.70.169:80 max_fails=0 fail_timeout=0;
+               server 10.192.79.141:9093 max_fails=0 fail_timeout=0;

        }

-       upstream istio-system-grafana-http {
+       upstream ingress-nginx-oauth2-proxy-4180 {
                least_conn;

                keepalive 32;

-               server 10.192.71.120:3000 max_fails=0 fail_timeout=0;
+               server 10.192.70.159:4180 max_fails=0 fail_timeout=0;

        }

-       upstream istio-system-prometheus-alertmanager-frontend-frontend {
+       upstream conference-application-app-http-web {
                least_conn;

                keepalive 32;
-               server 10.192.76.42:9093 max_fails=0 fail_timeout=0;
+               server 10.192.71.130:80 max_fails=0 fail_timeout=0;

        }

-       upstream istio-system-tracing-query-http {
+       upstream conference-application-app-http-web-admin {
                least_conn;

                keepalive 32;

-               server 10.192.79.62:80 max_fails=0 fail_timeout=0;
+               server 10.192.71.130:9080 max_fails=0 fail_timeout=0;

        }

-       upstream search-one-app-http-web {
+       upstream istio-system-prometheus-alertmanager-frontend-frontend {
                least_conn;

                keepalive 32;

-               server 10.192.79.197:80 max_fails=0 fail_timeout=0;
+               server 10.192.76.42:9093 max_fails=0 fail_timeout=0;

        }

-       upstream mercury-admin-tool-app-http-web {
+       upstream supply-and-demand-service-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.77.155:80 max_fails=0 fail_timeout=0;
+               server 10.192.74.61:80 max_fails=0 fail_timeout=0;

        }

-       upstream upstream-default-backend {
+       upstream sauron-web-mobile-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.202.0.20:8080 max_fails=0 fail_timeout=0;
+               server 10.192.72.175:80 max_fails=0 fail_timeout=0;

        }

-       upstream data-platform-grafana-grafana {
+       upstream supply-and-demand-service-app-http-web-admin {
                least_conn;

                keepalive 32;

-               server 10.192.76.158:3000 max_fails=0 fail_timeout=0;
+               server 10.192.74.61:9080 max_fails=0 fail_timeout=0;

        }
-       upstream istio-system-prometheus-http-prometheus {
+       upstream upstream-default-backend {
                least_conn;

                keepalive 32;

-               server 10.192.70.137:9090 max_fails=0 fail_timeout=0;
+               server 10.202.0.20:8080 max_fails=0 fail_timeout=0;

        }

-       upstream core-system-weave-scope-http {
+       upstream istio-system-prometheus-http-prometheus {
                least_conn;

                keepalive 32;

-               server 10.192.70.143:80 max_fails=0 fail_timeout=0;
+               server 10.192.70.137:9090 max_fails=0 fail_timeout=0;

        }

-       upstream monkeynetes-app-http-web {
+       upstream core-system-weave-scope-http {
                least_conn;

                keepalive 32;

-               server 10.192.73.97:80 max_fails=0 fail_timeout=0;
+               server 10.192.70.143:80 max_fails=0 fail_timeout=0;

        }

@@ -363,30 +363,30 @@

        }

-       upstream sauron-web-mobile-app-http-web {
+       upstream istio-system-tracing-query-http {
                least_conn;

                keepalive 32;

-               server 10.192.72.175:80 max_fails=0 fail_timeout=0;
+               server 10.192.79.62:80 max_fails=0 fail_timeout=0;

        }

-       upstream supply-and-demand-service-app-http-web {
+       upstream search-one-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.74.61:80 max_fails=0 fail_timeout=0;
+               server 10.192.79.197:80 max_fails=0 fail_timeout=0;

        }
-       upstream conference-application-app-http-web-admin {
+       upstream chaos-kraken-app-http-web {
                least_conn;

                keepalive 32;

-               server 10.192.71.130:9080 max_fails=0 fail_timeout=0;
+               server 10.192.75.29:80 max_fails=0 fail_timeout=0;

        }

@@ -399,21 +399,21 @@

        }

-       upstream rate-of-sale-model-service-app-http-web-admin {
+       upstream data-platform-prometheus-prometheus {
                least_conn;

                keepalive 32;

-               server 10.192.67.100:9080 max_fails=0 fail_timeout=0;
+               server 10.192.73.23:9090 max_fails=0 fail_timeout=0;

        }

-       upstream consumer-platform-app-http-web {
+       upstream istio-system-grafana-http {
                least_conn;

                keepalive 32;

-               server 10.192.65.23:80 max_fails=0 fail_timeout=0;
+               server 10.192.71.120:3000 max_fails=0 fail_timeout=0;

        }


I0602 19:03:46.751995       1 controller.go:178] ingress backend successfully reloaded...

It's the same number of hosts, just ordered differently.
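A hedged guess at the underlying cause: if the upstream blocks are rendered by ranging over a Go map, the output order is not stable between renders, so two configs describing the same set of hosts can differ byte for byte. A tiny standalone demonstration, not controller code, using names from the diff above:

```go
package main

import "fmt"

func main() {
	// Upstream names and endpoints taken from the diff above.
	upstreams := map[string]string{
		"conference-application-app-http-web": "10.192.71.130:80",
		"mercury-admin-tool-app-http-web":     "10.192.77.155:80",
		"sauron-web-app-http-web":             "10.192.70.169:80",
	}
	// Go does not guarantee any iteration order over a map, so successive
	// renders of the "same" configuration can come out in different orders.
	for name, server := range upstreams {
		fmt.Printf("upstream %s { server %s; }\n", name, server)
	}
}
```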

@Stono (Contributor, Author) commented Jun 2, 2018

I suggest some sort of sort-by-id on the bit which renders these upstreams.
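A minimal sketch of that suggestion, assuming the fix is simply to sort the upstreams by name before they reach the template. The issue header notes this was ultimately fixed by #2598, but the code below is illustrative, not that PR:

```go
package main

import (
	"fmt"
	"sort"
)

type upstream struct {
	Name   string
	Server string
}

func main() {
	upstreams := []upstream{
		{"sauron-web-app-http-web", "10.192.70.169:80"},
		{"conference-application-app-http-web", "10.192.71.130:80"},
		{"mercury-admin-tool-app-http-web", "10.192.77.155:80"},
	}
	// Sort by name before rendering so the generated config is deterministic.
	sort.Slice(upstreams, func(i, j int) bool { return upstreams[i].Name < upstreams[j].Name })
	for _, u := range upstreams {
		fmt.Printf("upstream %s { server %s; }\n", u.Name, u.Server)
	}
}
```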

@aledbf (Member) commented Jun 2, 2018

@Stono please use quay.io/aledbf/nginx-ingress-controller:0.366

@Stono (Contributor, Author) commented Jun 2, 2018

OK, give me a few mins

@Stono (Contributor, Author) commented Jun 2, 2018

It still reloads, @aledbf, but there's nothing in the log output:

I0602 19:31:59.952979       1 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"monkeynetes", Name:"monkeynetes-ingress-internal", UID:"7ce512f9-5a7e-11e8-a530-42010aa4009f", APIVersion:"extensions", ResourceVersion:"7140929", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress monkeynetes/monkeynetes-ingress-internal
I0602 19:32:01.884065       1 controller.go:168] backend reload required
I0602 19:32:01.884103       1 util.go:67] rlimit.max=1048576
I0602 19:32:01.884110       1 nginx.go:557] maximum number of open file descriptors : 86357
I0602 19:32:02.275807       1 controller.go:178] ingress backend successfully reloaded...
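One possible explanation for "backend reload required" with no diff printed, offered as an assumption rather than a diagnosis: if the reload decision deep-compares the in-memory configuration rather than the rendered file, and backends live in a slice whose order isn't stable, two semantically identical configurations still compare as different. Hypothetical types below, not the controller's own:

```go
package main

import (
	"fmt"
	"reflect"
)

// backend and config are hypothetical stand-ins for the controller's
// in-memory model, used only to demonstrate the order sensitivity.
type backend struct{ Name string }

type config struct{ Backends []backend }

func main() {
	running := config{Backends: []backend{{"a"}, {"b"}}}
	latest := config{Backends: []backend{{"b"}, {"a"}}} // same set, different order
	// DeepEqual on slices is order sensitive, so this reports a "change"
	// even though nothing meaningful differs.
	fmt.Println("reload required:", !reflect.DeepEqual(running, latest))
}
```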

@aledbf (Member) commented Jun 2, 2018

@Stono please use quay.io/aledbf/nginx-ingress-controller:0.367

@Stono (Contributor, Author) commented Jun 2, 2018

Same again!

I0602 20:00:13.108476       1 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"monkeynetes", Name:"monkeynetes-ingress-internal", UID:"7ce512f9-5a7e-11e8-a530-42010aa4009f", APIVersion:"extensions", ResourceVersion:"7147200", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress monkeynetes/monkeynetes-ingress-internal
I0602 20:00:13.109864       1 controller.go:168] backend reload required
I0602 20:00:13.109892       1 util.go:67] rlimit.max=1048576
I0602 20:00:13.109900       1 nginx.go:557] maximum number of open file descriptors : 86357
I0602 20:00:13.542453       1 controller.go:178] ingress backend successfully reloaded...

@Stono (Contributor, Author) commented Jun 2, 2018

Any ideas what it may be? I'm going live with a bunch of stuff on Monday, so I'm wondering whether we're going to be able to fix it by then.
