
Performance degradation when upgrading from 0.21.0 to 0.23.0 #3951

Closed
eranreshef opened this issue Apr 1, 2019 · 16 comments

@eranreshef

eranreshef commented Apr 1, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug report
NGINX Ingress controller version:
0.23.0

Kubernetes version (use kubectl version):
1.11.7

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Container Linux by CoreOS 2023.4.0 (Rhyolite)
  • Kernel (e.g. uname -a): 4.19.23-coreos-r1
  • Install tools: ?
  • Others: ?

What happened:
After upgrading from v0.21.0 to v0.23.0, I noticed a performance impact in both memory and CPU usage.

What you expected to happen:
I expected performance to remain more or less the same, or improve.

How to reproduce it (as minimally and precisely as possible):
I just upgraded the image in my k8s manifest and updated the nginx.tmpl.

Anything else we need to know: I attached an image of my nginx Grafana dashboard; it's easy to see when the upgrade happened.
[screenshot: nginx Grafana dashboard]

@aledbf
Member

aledbf commented Apr 3, 2019

> and updated the nginx.tmpl

Can you test the image without any customization of the template?

@eranreshef
Author

When I wrote that I updated the nginx.tmpl, I meant that I updated it to match v0.23.0, as there were some noticeable changes there.
My customisations only concern some error codes, nothing related to performance.

@eranreshef
Author

I downgraded to v0.22.0 and things went back to normal, so this indicates that the performance impact was introduced in v0.23.0.
I went over the changelog but couldn't find any change that might affect performance; did I miss anything?
(BTW, I also tried upgrading to 0.24.0, which didn't help.)

@eranreshef
Author

It seems that as of v0.23.0, nginx is constantly reloading its configuration for some unexplained reason. This image shows the nginx_ingress_controller_config_last_reload_successful_timestamp_seconds metric over the last 7 days:
[screenshot: nginx_ingress_controller_config_last_reload_successful_timestamp_seconds over the last 7 days]
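
If it helps to verify this outside Grafana, here is a minimal sketch for watching the metric directly; it assumes the controller's default metrics port 10254 and placeholder deployment/namespace names:

kubectl -n ingress-nginx port-forward deploy/nginx-ingress-controller 10254:10254 &
# the last-reload timestamp should stay stable; if it keeps advancing, nginx is reloading
curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_config_last_reload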

@aledbf
Member

aledbf commented Apr 8, 2019

@eranreshef please add the flag --v=2 to the ingress controller deployment to see the change in the configuration file that triggers the reload.
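
For reference, a minimal sketch of one way to do that (deployment and namespace names below are placeholders):

kubectl -n ingress-nginx edit deployment nginx-ingress-controller
# then add the verbosity flag to the controller container args:
#   - --v=2
# at that level the log should include a diff of the nginx.conf change on each reload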

@eranreshef
Author

Sorry for not responding, the issue was resolved a couple of days ago.
The problem was that we had two ingresses that use dynamic certificates and were not configured correctly. This caused the nginx-ingress pods to constantly reload the config, but no warning or error was printed to the log. Another important point is that these objects were also misconfigured in 0.21.0 and 0.22.0, but there was no constant reloading there.

@eranreshef
Author

I'm re-opening this because I missed an important step in the resolution process I described above.
Fixing the ingress configurations didn't solve the problem and nginx kept reloading the config, until we deleted the ingresses in question and re-applied the exact same objects. We did a get operation on the ingresses with the -o yaml flag and saved the output to a file, deleted the ingresses, and applied the pre-saved files. Only then was the problem resolved, so I believe there is something that's worth looking into more deeply.
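
For reference, a minimal sketch of that delete/re-apply workaround (namespace and ingress names are placeholders):

kubectl -n <namespace> get ingress <ingress-name> -o yaml > ingress-backup.yaml
kubectl -n <namespace> delete ingress <ingress-name>
kubectl apply -f ingress-backup.yaml
# it may be necessary to strip cluster-generated fields (resourceVersion, uid, status)
# from the saved file before re-applying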

@eranreshef eranreshef reopened this Apr 18, 2019
@aledbf
Member

aledbf commented Apr 18, 2019

> Only then was the problem resolved, so I believe there is something that's worth looking into more deeply.

Please add the flag --v=2 to the ingress controller deployment to see the change in the configuration file that triggers the reload. The log should show a diff with the changes.

@eranreshef
Author

By now all of my environments have been successfully upgraded; I'll see if I can somehow reproduce it (it might take a while). In the meantime, I can share the log lines that kept appearing in the info log:

nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.439405       8 controller.go:172] Configuration changes detected, backend reload required.
nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.773637       8 controller.go:190] Backend successfully reloaded.

@aledbf
Member

aledbf commented Apr 18, 2019

nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.439405 8 controller.go:172] Configuration changes detected, backend reload required.
nginx-ingress-controller-5968dfb986-b6jtb nginx-ingress-controller I0418 15:11:47.773637 8 controller.go:190] Backend successfully reloaded.

With --v=2 you could see the details.

@aledbf
Member

aledbf commented Apr 18, 2019

Also, if you have access to the cluster from your local machine and can use kubectl proxy, you can use this approach to try the flag --v=2 without disrupting production traffic.

In one terminal execute kubectl proxy, and in another run:

docker run \
  --rm \
  --name ingress-controller \
  --net=host \
  -e POD_NAMESPACE='default' \
  -e POD_NAME='dummy' \
  quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1 /nginx-ingress-controller \
  --update-status=false \
  --apiserver-host=http://localhost:8001 \
  --v=2

@eranreshef
Author

> By now all of my environments have been successfully upgraded

That also includes my production environment.

@paalkr
Contributor

paalkr commented May 6, 2019

Hi

It looks like falling back to the default SSL certificate triggers a backend reload. This did not happen with versions < 0.22. Is this a bug or expected behavior going forward? We have a wildcard default certificate, and simply relying on the fallback to the default certificate has worked well until now.

        args:
        - /nginx-ingress-controller
        - --v=1
        - --configmap=$(POD_NAMESPACE)/ingress-nginx
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx
        - --default-ssl-certificate=$(POD_NAMESPACE)/<default_cert>

From the logs:

W0506 13:04:20.733505       7 controller.go:1042] Error getting SSL certificate "inatur-prod/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
W0506 13:04:20.733540       7 controller.go:1042] Error getting SSL certificate "inatur-test/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
I0506 13:04:20.826786       7 controller.go:170] Configuration changes detected, backend reload required.
I0506 13:04:42.311631       7 controller.go:188] Backend successfully reloaded.
W0506 13:04:20.733505       7 controller.go:1042] Error getting SSL certificate "inatur-prod/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate
W0506 13:04:20.733540       7 controller.go:1042] Error getting SSL certificate "inatur-test/star-gdo-tls-chain": local SSL certificate inatur-prod/star-gdo-tls-chain was not found. Using default certificate

Or is this just informational, logged because of the triggered backend reload rather than causing it?
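
In case it helps, one quick sanity check (namespace and secret names taken from the log above, the ingress name is a placeholder) is to confirm that the secret referenced in spec.tls exists in the same namespace as the Ingress; otherwise the fallback to the default certificate is expected:

kubectl -n inatur-test get secret star-gdo-tls-chain
kubectl -n inatur-test get ingress <ingress-name> -o jsonpath='{.spec.tls}'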

@aledbf
Member

aledbf commented May 6, 2019

@paalkr please make sure you are using 0.24.1. It contains a fix for this scenario: #4000.
If you are already using that version, please open a new issue.

@paalkr
Contributor

paalkr commented May 6, 2019

@aledbf, thanks for the feedback. I'm actually using 0.24.1.
But I think my case might be related to #3843, and that the log entry about the fallback to the default cert is just informational output produced because of the reloads.

@aledbf
Member

aledbf commented Jul 8, 2019

Closing. Please update to 0.25.0 and reopen if the issue persists.

@aledbf aledbf closed this as completed Jul 8, 2019