Readiness and Liveness Probe Failing under High CPU Load #4505

sreedharbukya · 2019-08-29T03:38:02Z

In my production cluster, the readiness and liveness probes fail constantly under high CPU resource utilization.

I set up ingress-nginx using the Helm chart:

ingress-nginx 3 Thu Aug 29 00:32:40 2019 DEPLOYED nginx-ingress-1.15.0 0.25.0 ingress-nginx

These are the event logs from the ingress-nginx namespace:

kubectl get events -n ingress-nginx
LAST SEEN   TYPE      REASON      KIND   MESSAGE
28m         Warning   Unhealthy   Pod    Readiness probe failed: Get http://100.96.166.13:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
42m         Warning   Unhealthy   Pod    Liveness probe failed: Get http://100.96.166.13:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2m46s       Warning   Unhealthy   Pod    Readiness probe failed: Get http://100.96.164.19:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
14m         Warning   Unhealthy   Pod    Liveness probe failed: Get http://100.96.164.19:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
24m         Warning   Unhealthy   Pod    Liveness probe failed: HTTP probe failed with statuscode: 500
9m52s       Warning   Unhealthy   Pod    Liveness probe failed: Get http://100.96.166.12:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
47m         Warning   Unhealthy   Pod    Readiness probe failed: Get http://100.96.166.12:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
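The events above mix three kinds of failure (client timeouts, an HTTP 500, and in later logs a refused connection), and they are spread across several pod IPs. As a rough aid for reading such output, the following sketch tallies probe failures by target, probe type, and failure kind. The function names and the category labels (`timeout`, `refused`, `http_<code>`) are hypothetical and chosen only for illustration; the parsing assumes the MESSAGE column looks like the lines shown in this report.

```python
import re
from collections import Counter

# Matches the MESSAGE text of probe-failure events as they appear in this report.
# Either "Get http://<ip>:<port>/healthz: <detail>" or a bare detail string.
PROBE_RE = re.compile(
    r"(?P<probe>Readiness|Liveness) probe failed: "
    r"(?:Get http://(?P<target>[\d.]+:\d+)/healthz: (?P<detail>.*)|(?P<other>.*))"
)


def classify(detail):
    """Map the free-text failure detail to a coarse category (labels are illustrative)."""
    if "Client.Timeout exceeded" in detail:
        return "timeout"
    if "connection refused" in detail:
        return "refused"
    match = re.search(r"statuscode: (\d+)", detail)
    if match:
        return "http_" + match.group(1)
    return "other"


def tally_probe_failures(lines):
    """Count probe failures keyed by (target, probe type, failure category)."""
    counts = Counter()
    for line in lines:
        match = PROBE_RE.search(line)
        if not match:
            continue  # not a probe-failure event (e.g. Pulled, Started, BackOff)
        target = match.group("target") or "unknown"
        detail = match.group("detail") or match.group("other") or ""
        counts[(target, match.group("probe"), classify(detail))] += 1
    return counts
```

Feeding it the saved output of `kubectl get events -n ingress-nginx` (for example, the lines of a text file) and printing the resulting counter shows at a glance whether the failures are dominated by timeouts on a few pods or spread evenly.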

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

NGINX Ingress controller version:
0.25.0

Kubernetes version (use kubectl version):
v1.12.8

Environment:

Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
Kernel (e.g. uname -a): 4.9.0-9-amd64
Install tools:
Others:

What happened:
The ingress pods keep restarting.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:


sreedharbukya · 2019-08-30T03:05:35Z

More event logs from the system:

41m         Normal    Scheduled          Pod          Successfully assigned ingress-nginx/ingress-nginx-nginx-ingress-controller-6d49959c4f-75blw to ip-10-0-33-231.ap-southeast-1.compute.internal
41m         Normal    Pulling            Pod          pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1"
40m         Normal    Pulled             Pod          Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1"
40m         Normal    Created            Pod          Created container
40m         Normal    Started            Pod          Started container
14m         Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.151.62:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
4m40s       Warning   Unhealthy          Pod          Readiness probe failed: Get http://100.96.151.62:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
13m         Warning   Unhealthy          Pod          Readiness probe failed: Get http://100.96.151.60:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
100s        Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.151.60:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
24m         Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.138.105:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
9m26s       Normal    Killing            Pod          Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
4m20s       Warning   BackOff            Pod          Back-off restarting failed container
34m         Warning   Unhealthy          Pod          Readiness probe failed: HTTP probe failed with statuscode: 500
2m22s       Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.138.108:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
57m         Warning   Unhealthy          Pod          Readiness probe failed: Get http://100.96.138.108:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
42m         Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.138.108:10254/healthz: dial tcp 100.96.138.108:10254: connect: connection refused
7m16s       Warning   BackOff            Pod          Back-off restarting failed container
44m         Normal    Pulled             Pod          Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1" already present on machine
49m         Warning   BackOff            Pod          Back-off restarting failed container
41m         Normal    SuccessfulCreate   ReplicaSet   Created pod: ingress-nginx-nginx-ingress-controller-6d49959c4f-75blw
58m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
58m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
56m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
51m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
50m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
49m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
43m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
40m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
35m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
34m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
32m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
27m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
25m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
24m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
20m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
17m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
12m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
10m         Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
3m38s       Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
2m40s       Normal    CREATE             ConfigMap    ConfigMap ingress-nginx/ingress-nginx-nginx-ingress-controller
35m         Warning   Unhealthy          Pod          Readiness probe failed: Get http://100.96.138.110:8080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
35m         Warning   Unhealthy          Pod          Readiness probe failed: Get http://100.96.138.106:8080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
60m         Warning   Unhealthy          Pod          Liveness probe failed: Get http://100.96.138.106:8080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Resource usage of the pods (kubectl top):

kubectl top pods -n ingress-nginx
NAME                                                          CPU(cores)   MEMORY(bytes)
ingress-nginx-nginx-ingress-controller-6d49959c4f-75blw       4m           143Mi
ingress-nginx-nginx-ingress-controller-6d49959c4f-b4j54       4m           147Mi
ingress-nginx-nginx-ingress-controller-6d49959c4f-cc8t2       11m          171Mi
ingress-nginx-nginx-ingress-controller-6d49959c4f-vlfkd       0m           0Mi
ingress-nginx-nginx-ingress-controller-6d49959c4f-vxdz2       4m           145Mi
ingress-nginx-nginx-ingress-default-backend-8df6c5b67-5hcsj   1m           3Mi
ingress-nginx-nginx-ingress-default-backend-8df6c5b67-bp5cl   1m           3Mi
ingress-nginx-nginx-ingress-default-backend-8df6c5b67-cqpzh   1m           3Mi

aledbf · 2019-08-30T13:25:54Z

@sreedharbukya what kind of node are you using? What's the load on the node where it is failing?

aledbf · 2019-08-30T13:53:39Z

@sreedharbukya please try to provide the required steps to reproduce this issue.
(Reading your report, there is nothing actionable we can do to reproduce it.)
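One way to collect more actionable data is to time the `/healthz` endpoint from a machine that can reach the pod, using a client-side timeout as an HTTP probe does. The sketch below is a minimal illustration, not the kubelet's implementation. It assumes a 1-second timeout (the Kubernetes default for `timeoutSeconds`; the chart in use may configure a different value) and treats any final status from 200 to 399 as success. The function name `probe_once` and the outcome labels are hypothetical.

```python
import socket
import time
import urllib.error
import urllib.request

# Bypass any proxy settings from the environment so the request goes
# straight to the pod address, as a node-local probe would.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_once(url, timeout=1.0):
    """Issue one GET against url; return (outcome, elapsed_seconds).

    timeout=1.0 is an assumption matching the Kubernetes default timeoutSeconds.
    """
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            outcome = "ok" if 200 <= resp.status < 400 else f"http_{resp.status}"
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, e.g. the "statuscode: 500" seen in the events.
        outcome = f"http_{err.code}"
    except (socket.timeout, TimeoutError):
        # No response headers arrived within the timeout.
        outcome = "timeout"
    except urllib.error.URLError as err:
        # Connect-phase failures are wrapped in URLError.
        is_timeout = isinstance(err.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if is_timeout else "error"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start
```

Calling `probe_once("http://100.96.166.13:10254/healthz")` repeatedly while the node is under load (the address is taken from the events in this report) and recording the elapsed times would show whether the endpoint is merely slow or not answering at all.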

aledbf · 2019-09-02T14:11:15Z

Closing. This is fixed in master #4487
If you want to test the fix, you can use the image quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev

sreedharbukya · 2019-09-02T14:15:02Z

Thank you @aledbf. When is the next release planned?

sreedharbukya changed the title ~~Readiness and Liveness Probe Failing under High CPU machine~~ Readiness and Liveness Probe Failing under High CPU Load Aug 29, 2019

aledbf closed this as completed Sep 2, 2019

devops-corgi mentioned this issue Jan 23, 2020

Unable to start up / Liveness probe failed #4898

Closed
