
Ingress nginx OOM #4703

Closed
andrii29 opened this issue Oct 21, 2019 · 10 comments

@andrii29

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): already asked in the Slack channel, no answer

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): memory, OOM, nginx, nginx-ingress


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version: 0.26.1

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release): reproduced both on ubuntu16 and ubuntu18
  • Kernel (e.g. uname -a): 4.15.0-65-generic
  • Install tools:
  • Others:

What happened: memory started leaking and after a few hours the container was killed by the OOM killer

What you expected to happen: no memory leaks

How to reproduce it (as minimally and precisely as possible): ~10-15k RPS

Anything else we need to know:
The main process begins to use more and more memory until it is killed by the OOM killer. I added a location to check the garbage collection (#3314 (comment)); it shows 1-5 MB. No errors or warnings were observed in the nginx log.
[Screenshot: nginx-ingress memory leak graph]
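For reference, the GC check mentioned above can be exposed through the controller ConfigMap. The snippet below is only a minimal sketch, assuming the `http-snippet` ConfigMap key; the port (18090), path (/lua-gc), ConfigMap name and namespace are illustrative assumptions, not the exact setup used in this report.

```yaml
# Sketch only: a Lua GC check endpoint added via the ingress-nginx ConfigMap.
# Port, path, ConfigMap name and namespace are arbitrary illustration choices.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  http-snippet: |
    server {
      listen 18090;
      location /lua-gc {
        content_by_lua_block {
          -- collectgarbage("count") reports Lua-managed memory in KB
          ngx.say(string.format("Lua GC: %.2f MB", collectgarbage("count") / 1024))
        }
      }
    }
```

The number reported there covers only the memory owned by the Lua VM, which is consistent with it staying at a few MB while the main nginx process keeps growing.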

@andrii29
Author

I checked the profiler and found that the metrics could be a possible source of the problem:
[Screenshot: profiler output]
So I disabled metrics on one host and left them enabled on another (almost identical servers with the same traffic):
[Screenshot: memory comparison of the two servers]
As you can see, the server with metrics enabled has memory leak problems. It also uses more CPU.
Is there any way to reconfigure the metrics (for example, enable only some of them) to avoid the memory leak and high CPU usage?
These metrics are really useful, and I don't want to switch to log parsing (https://github.com/martin-helmich/prometheus-nginxlog-exporter) or any other nginx metrics collector.

Regards,
Andrii
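As far as I know there is no per-metric switch in the controller; the coarse-grained option, sketched below, is to disable the exporter on one instance and compare, assuming the deployed version supports the `--enable-metrics` flag (the container and ConfigMap names shown are common defaults, not taken from this report).

```yaml
# Sketch only: excerpt of the controller DaemonSet/Deployment spec, turning the
# Prometheus exporter off on one instance for an A/B memory comparison.
# Assumes the deployed controller supports --enable-metrics (default true).
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --enable-metrics=false   # compare against a pod left at the default (true)
```

This only isolates the exporter as the variable; it does not provide the finer-grained "only some metrics" behaviour asked about above.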

@bzon

bzon commented Oct 24, 2019

There was indeed an increase in memory usage after we upgraded to 0.26.1. Nginx pods are consuming 700-800 Mi on average at 0 QPS.

@davidcodesido

I'm getting sudden timeouts when nginx-ingress has been running for a few days (6-7), with no apparent error in the logs, as if the requests were not being processed at all. This behaviour started to show up after upgrading to 0.26.1. I rolled back to version 0.24.1 and everything works smoothly. I'm not sure how I can provide data/information that would allow you to debug this.

@lucax88x

Having the same big issue right now in PROD.
It starts with almost no memory usage and, after 1-2 hours, it has consumed it all, growing pretty fast.

[Screenshot: memory usage graph]
I'll try 0.24.1 as suggested by @davidcodesido.

@lucax88x

Having the same issue with 0.24.1 :(

It doesn't always happen; memory randomly starts climbing by gigabytes within about 30 minutes, then the server collapses and stabilizes again.

@aledbf
Member

aledbf commented Dec 29, 2019

Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
This image contains the current master and #4863
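One way to try that image (a sketch, assuming a plain manifest install with a DaemonSet and container both named nginx-ingress-controller) is to swap just the image field:

```yaml
# Sketch only: excerpt of the controller DaemonSet pointing at the dev image.
# The DaemonSet/container names are assumptions; adjust them to your install.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
```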

@andrii29
Author

Hello @aledbf,
I'm using the Helm chart for ingress-nginx and got an error when trying to use this tag.
Could you change the tag name to satisfy the condition at
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-daemonset.yaml#L64-L73 ?

Regards,
Andrii
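For context, the stable/nginx-ingress chart builds the controller image from separate repository and tag values, roughly as in the hypothetical override below; since the linked template condition rejects the dev-1.17.7-1 tag, a values override alone may not be enough.

```yaml
# Hypothetical values.yaml override for the stable/nginx-ingress chart.
# controller.image.repository/tag reflect the chart's usual value layout;
# the dev image lives under the -amd64 repository name, not the chart default.
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64
    tag: dev-1.17.7-1
```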

@lucax88x

> Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
> This image contains the current master and #4863

@aledbf In my case I'm still having the same issue :(

As soon as I try to telnet to a specific port, nginx suddenly starts looping, saying the port is not reachable, and goes OOM after a few minutes.

And yes, even if I close the connection it keeps logging that the port is not reachable, and I have to kill the pod manually.

@andrii29
Author

andrii29 commented Feb 5, 2020

Hello,
I tried the new ingress-nginx release, 0.28.0, with metrics enabled - no memory problems.
[Screenshot: memory usage after upgrading to 0.28.0]
I'll wait one more day to confirm that everything is OK.

Regards,
Andrii
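For anyone following along with the Helm chart, the upgrade amounts to bumping the image tag, e.g. via a values override like the hypothetical sketch below (the repository shown is the usual 0.28.x default, not confirmed from this report).

```yaml
# Hypothetical values.yaml override to move the stable/nginx-ingress chart to 0.28.0.
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller
    tag: "0.28.0"
```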

@andrii29
Author

andrii29 commented Feb 6, 2020

Hello,

Confirmed - no memory problems.

Regards,
Andrii

andrii29 closed this as completed Feb 6, 2020