
Ingress nginx OOM #4703

Closed
andrii29 opened this issue Oct 21, 2019 · 10 comments

@andrii29

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): already asked in the Slack channel, no answer

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): memory, OOM, nginx, nginx-ingress


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version: 0.26.1

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release): reproduced both on ubuntu16 and ubuntu18
  • Kernel (e.g. uname -a): 4.15.0-65-generic
  • Install tools:
  • Others:

What happened: memory started leaking and after a few hours the container was killed by the OOM killer

What you expected to happen: no memory leaks

How to reproduce it (as minimally and precisely as possible): ~10-15k RPS

Anything else we need to know:
The main process begins to use more and more memory until it is killed by the OOM killer. I added a location to check the garbage collection (#3314 (comment)); it shows 1-5 MB. No errors or warnings were observed in the nginx log.
[Screenshot: nginx-ingress memory leak graph]
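For reference, the GC check mentioned above can be exposed through the controller ConfigMap. The snippet below is only a minimal sketch, assuming the `http-snippet` ConfigMap key; the port (18090), path (/lua-gc), ConfigMap name and namespace are illustrative assumptions, not the exact setup used in this report.

```yaml
# Sketch only: a Lua GC check endpoint added via the ingress-nginx ConfigMap.
# Port, path, ConfigMap name and namespace are arbitrary illustration choices.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  http-snippet: |
    server {
      listen 18090;
      location /lua-gc {
        content_by_lua_block {
          -- collectgarbage("count") reports Lua-managed memory in KB
          ngx.say(string.format("Lua GC: %.2f MB", collectgarbage("count") / 1024))
        }
      }
    }
```

The number reported there covers only the memory owned by the Lua VM, which is consistent with it staying at a few MB while the main nginx process keeps growing.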

@andrii29
Author

I checked the profiler and found that the metrics could be a possible source of the problem:
[Screenshot: profiler output]
So I disabled metrics on one host and left them enabled on another (almost identical servers with the same traffic):
[Screenshot: memory comparison of the two servers]
As you can see, the server with metrics enabled has memory leak problems. It also uses more CPU.
Is there any way to reconfigure the metrics (for example, enable only some of them) to avoid the memory leak and high CPU usage?
These metrics are really useful, and I don't want to switch to log parsing (https://github.com/martin-helmich/prometheus-nginxlog-exporter) or any other nginx metrics collector.

Regards,
Andrii
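As far as I know there is no per-metric switch in the controller; the coarse-grained option, sketched below, is to disable the exporter on one instance and compare, assuming the deployed version supports the `--enable-metrics` flag (the container and ConfigMap names shown are common defaults, not taken from this report).

```yaml
# Sketch only: excerpt of the controller DaemonSet/Deployment spec, turning the
# Prometheus exporter off on one instance for an A/B memory comparison.
# Assumes the deployed controller supports --enable-metrics (default true).
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --enable-metrics=false   # compare against a pod left at the default (true)
```

This only isolates the exporter as the variable; it does not provide the finer-grained "only some metrics" behaviour asked about above.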

@bzon

bzon commented Oct 24, 2019

There was indeed an increase in memory usage after we upgraded to 0.26.1. Nginx pods are consuming 700-800 Mi on average at 0 QPS.

@davidcodesido

I'm getting sudden timeouts when nginx-ingress has been running for a few days (6-7), with no apparent error in the logs, as if the requests were not being processed at all. This behaviour started to show up after upgrading to 0.26.1. I rolled back to version 0.24.1 and everything works smoothly. I'm not sure how I can provide data/information that would allow you to debug this.

@lucax88x

Having the same big issue right now in PROD.
It starts with almost no memory usage and, after 1-2 hours, it has consumed it all, growing pretty fast.

[Screenshot: memory usage graph]
I'll try 0.24.1 as suggested by @davidcodesido.

@lucax88x

Having the same issue with 0.24.1 :(

It doesn't always happen; memory randomly starts climbing by gigabytes within about 30 minutes, then the server collapses and stabilizes again.

@aledbf
Member

aledbf commented Dec 29, 2019

Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
This image contains the current master and #4863
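One way to try that image (a sketch, assuming a plain manifest install with a DaemonSet and container both named nginx-ingress-controller) is to swap just the image field:

```yaml
# Sketch only: excerpt of the controller DaemonSet pointing at the dev image.
# The DaemonSet/container names are assumptions; adjust them to your install.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
```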

@andrii29
Author

Hello @aledbf,
I'm using the Helm chart for ingress-nginx and got an error when trying to use this tag.
Could you change the tag name to satisfy the condition at
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-daemonset.yaml#L64-L73 ?

Regards,
Andrii
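For context, the stable/nginx-ingress chart builds the controller image from separate repository and tag values, roughly as in the hypothetical override below; since the linked template condition rejects the dev-1.17.7-1 tag, a values override alone may not be enough.

```yaml
# Hypothetical values.yaml override for the stable/nginx-ingress chart.
# controller.image.repository/tag reflect the chart's usual value layout;
# the dev image lives under the -amd64 repository name, not the chart default.
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64
    tag: dev-1.17.7-1
```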

@lucax88x

> Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
> This image contains the current master and #4863

@aledbf In my case I'm still having the same issue :(

As soon as I try to telnet to a specific port, nginx suddenly starts looping, saying the port is not reachable, and goes OOM after a few minutes.

And yes, even if I close the connection it keeps logging that the port is not reachable, and I have to kill the pod manually.

@andrii29
Author

andrii29 commented Feb 5, 2020

Hello,
I tried the new ingress-nginx release, 0.28.0, with metrics enabled - no memory problems.
[Screenshot: memory usage after upgrading to 0.28.0]
I'll wait one more day to confirm that everything is OK.

Regards,
Andrii
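For anyone following along with the Helm chart, the upgrade amounts to bumping the image tag, e.g. via a values override like the hypothetical sketch below (the repository shown is the usual 0.28.x default, not confirmed from this report).

```yaml
# Hypothetical values.yaml override to move the stable/nginx-ingress chart to 0.28.0.
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller
    tag: "0.28.0"
```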

@andrii29
Author

andrii29 commented Feb 6, 2020

Hello,

Confirmed - no memory problems.

Regards,
Andrii

andrii29 closed this as completed Feb 6, 2020