
Performance degradation when upgrading from 0.21.0 to 0.23.0 #3951

Closed
eranreshef opened this issue Apr 1, 2019 · 16 comments

@eranreshef

eranreshef commented Apr 1, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug report
NGINX Ingress controller version:
0.23.0

Kubernetes version (use kubectl version):
1.11.7

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Container Linux by CoreOS 2023.4.0 (Rhyolite)
  • Kernel (e.g. uname -a): 4.19.23-coreos-r1
  • Install tools: ?
  • Others: ?

What happened:
After upgrading from v0.21.0 to v0.23.0, I noticed a performance impact in both memory and CPU usage.

What you expected to happen:
I expected performance to remain more or less the same, or improve.

How to reproduce it (as minimally and precisely as possible):
I just upgraded the image in my k8s manifest and updated the nginx.tmpl.

Anything else we need to know: I attached an image of my nginx Grafana dashboard; it's easy to see when the upgrade happened.
[screenshot: nginx Grafana dashboard]

@aledbf
Member

aledbf commented Apr 3, 2019

> and updated the nginx.tmpl

Can you test the image without any customization of the template?

@eranreshef
Author

When I wrote that I updated the nginx.tmpl, I meant that I updated it to match v0.23.0, as there were some noticeable changes there.
My customisations only concern some error codes, nothing related to performance.

@eranreshef
Author

I downgraded to v0.22.0 and things went back to normal, so this indicates that the performance impact was introduced in v0.23.0.
I went over the changelog but couldn't find any change that might affect performance; did I miss anything?
(BTW, I also tried upgrading to 0.24.0, which didn't help.)

@eranreshef
Author

It seems that as of v0.23.0, nginx is constantly reloading its configuration for some unexplained reason. This image shows the nginx_ingress_controller_config_last_reload_successful_timestamp_seconds metric over the last 7 days:
[screenshot: nginx_ingress_controller_config_last_reload_successful_timestamp_seconds over the last 7 days]
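
If it helps to verify this outside Grafana, here is a minimal sketch for watching the metric directly; it assumes the controller's default metrics port 10254 and placeholder deployment/namespace names:

kubectl -n ingress-nginx port-forward deploy/nginx-ingress-controller 10254:10254 &
# the last-reload timestamp should stay stable; if it keeps advancing, nginx is reloading
curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_config_last_reload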

@aledbf
Member

aledbf commented Apr 8, 2019

@eranreshef please add the flag --v=2 to the ingress controller deployment to see the change in the configuration file that triggers the reload.
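
For reference, a minimal sketch of one way to do that (deployment and namespace names below are placeholders):

kubectl -n ingress-nginx edit deployment nginx-ingress-controller
# then add the verbosity flag to the controller container args:
#   - --v=2
# at that level the log should include a diff of the nginx.conf change on each reload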

@eranreshef
Author

Sorry for not responding, the issue was resolved a couple of days ago.
The problem was that we had two ingresses that use dynamic certificates and were not configured correctly. This caused the nginx-ingress pods to constantly reload the config, but no warning or error was printed to the log. Another important point is that these objects were also misconfigured in 0.21.0 and 0.22.0, but there was no constant reloading there.

@eranreshef
Author

I'm re-opening this because I missed an important step in the resolution process I described above.
Fixing the ingress configurations didn't solve the problem and nginx kept reloading the config, until we deleted the ingresses in question and re-applied the exact same objects. We did a get operation on the ingresses with the -o yaml flag and saved the output to a file, deleted the ingresses, and applied the pre-saved files. Only then was the problem resolved, so I believe there is something that's worth looking into more deeply.
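
For reference, a minimal sketch of that delete/re-apply workaround (namespace and ingress names are placeholders):

kubectl -n <namespace> get ingress <ingress-name> -o yaml > ingress-backup.yaml
kubectl -n <namespace> delete ingress <ingress-name>
kubectl apply -f ingress-backup.yaml
# it may be necessary to strip cluster-generated fields (resourceVersion, uid, status)
# from the saved file before re-applying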

@eranreshef eranreshef reopened this Apr 18, 2019
@aledbf
Member

aledbf commented Apr 18, 2019

> Only then was the problem resolved, so I believe there is something that's worth looking into more deeply.

Please add the flag --v=2 to the ingress controller deployment to see the change in the configuration file that triggers the reload. The log should show a diff with the changes.

@eranreshef
Author

By now all of my environments have been successfully upgraded; I'll see if I can somehow reproduce it (it might take a while). In the meantime, I can share the log lines that kept appearing in the info log:

nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.439405       8 controller.go:172] Configuration changes detected, backend reload required.
nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.773637       8 controller.go:190] Backend successfully reloaded.

@aledbf
Member

aledbf commented Apr 18, 2019

nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.439405 8 controller.go:172] Configuration changes detected, backend reload required.
nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.773637 8 controller.go:190] Backend successfully reloaded.

With --v=2 you could see the details.

@aledbf
Member

aledbf commented Apr 18, 2019

Also, if you have access to the cluster from your local machine and can use kubectl proxy, you can use this approach to try the flag --v=2 without disrupting production traffic.

In one terminal execute kubectl proxy, and in another run:

docker run \
  --rm \
  --name ingress-controller \
  --net=host \
  -e POD_NAMESPACE='default' \
  -e POD_NAME='dummy' \
  quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1 /nginx-ingress-controller \
  --update-status=false \
  --apiserver-host=http://localhost:8001 \
  --v=2

@eranreshef
Author

> By now all of my environments have been successfully upgraded

That also includes my production environment.

@paalkr
Contributor

paalkr commented May 6, 2019

Hi

It looks like falling back to the default SSL certificate triggers a backend reload. This did not happen with versions < 0.22. Is this a bug or expected behavior going forward? We have a wildcard default certificate, and simply relying on the fallback to the default certificate has worked well until now.

        args:
        - /nginx-ingress-controller
        - --v=1
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx
        - --default-ssl-certificate=$(POD_NAMESPACE)/<default_cert>

From the logs:

W0506 13:04:20.733505       7 controller.go:1042] Error getting SSL certificate "inatur-prod/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
W0506 13:04:20.733540       7 controller.go:1042] Error getting SSL certificate "inatur-test/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
I0506 13:04:20.826786       7 controller.go:170] Configuration changes detected, backend reload required.
I0506 13:04:42.311631       7 controller.go:188] Backend successfully reloaded.
W0506 13:04:20.733505       7 controller.go:1042] Error getting SSL certificate "inatur-prod/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
W0506 13:04:20.733540       7 controller.go:1042] Error getting SSL certificate "inatur-test/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate

Or is this just informational, logged because of the triggered backend reload rather than causing it?
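
In case it helps, one quick sanity check (namespace and secret names taken from the log above, the ingress name is a placeholder) is to confirm that the secret referenced in spec.tls exists in the same namespace as the Ingress; otherwise the fallback to the default certificate is expected:

kubectl -n inatur-test get secret star-gdo-tls-chain
kubectl -n inatur-test get ingress <ingress-name> -o jsonpath='{.spec.tls}'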

@aledbf
Member

aledbf commented May 6, 2019

@paalkr please make sure you are using 0.24.1. It contains a fix for this scenario: #4000.
If you are already using that version, please open a new issue.

@paalkr
Contributor

paalkr commented May 6, 2019

@aledbf, thanks for the feedback. I'm actually using 0.24.1.
But I think my case might be related to #3843, and that the log entry about the fallback to the default cert is just informational output produced because of the reloads.

@aledbf
Member

aledbf commented Jul 8, 2019

Closing. Please update to 0.25.0 and reopen if the issue persists.

@aledbf aledbf closed this as completed Jul 8, 2019