Readiness and Liveness probe failed: HTTP probe failed with statuscode: 500 #2171

Closed
max-rocket-internet opened this issue Mar 5, 2018 · 21 comments

@max-rocket-internet

NGINX Ingress controller version: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0, installed with Helm using the stable chart.
Kubernetes version (use kubectl version): 1.8.4
Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): kops version 1.8.1 (Debian I think)

What happened: The nginx-ingress-controller pod's readiness and liveness probes fail with "HTTP probe failed with statuscode: 500". The pod is terminated and restarted. This happens 2-5 times until it starts successfully.

What you expected to happen: The pod to start successfully without failing the readiness and liveness probes.

How to reproduce it (as minimally and precisely as possible): We are running the nginx-ingress-controller as a DaemonSet, so we see this problem whenever a new node is created.

Anything else we need to know: This issue has been opened before:

Here are the events from the nginx-ingress-controller pod:

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   SuccessfulMountVolume  2m                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "ingress1-nginx-ingress-token-jm48x"
  Warning  FailedSync             1m (x3 over 2m)    kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Error syncing pod
  Normal   Pulling                1m                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
  Normal   Pulled                 48s                kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Successfully pulled image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0"
  Warning  Unhealthy              13s (x3 over 33s)  kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy              4s (x4 over 34s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Created                0s (x2 over 48s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Created container
  Normal   Started                0s (x2 over 48s)   kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Started container
  Normal   Killing                0s                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled                 0s                 kubelet, ip-10-0-19-85.eu-central-1.compute.internal  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.11.0" already present on machine

Here is the default probe config:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1

Here is the helm chart values we use: https://gist.github.com/max-rocket-internet/ba6b368502f58bc7061d3062939b5dca
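
If the controller simply needs more time to come up on a new node, one possible workaround is to relax the probe timings through the chart values. A minimal sketch, assuming the stable chart exposes controller.livenessProbe.* and controller.readinessProbe.* overrides (key names not verified here) and using a hypothetical release name:

  # Hypothetical release name; the value keys are assumptions about the stable chart.
  helm upgrade ingress1 stable/nginx-ingress \
    --set controller.livenessProbe.initialDelaySeconds=30 \
    --set controller.livenessProbe.timeoutSeconds=5 \
    --set controller.readinessProbe.initialDelaySeconds=30 \
    --set controller.readinessProbe.timeoutSeconds=5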

I have logs from the pod with the --v=10 argument set, but there is a lot of output and some of it is sensitive. Here is an excerpt; let me know if you need more:

I0305 10:57:12.548693       7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app1-part1
I0305 10:57:15.587793       7 round_trippers.go:417] curl -k -v -XGET  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxxxx" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:15.590327       7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:15.590344       7 round_trippers.go:442] Response Headers:
I0305 10:57:15.590350       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:15.590355       7 round_trippers.go:445]     Content-Length: 437
I0305 10:57:15.590362       7 round_trippers.go:445]     Date: Mon, 05 Mar 2018 10:57:15 GMT
I0305 10:57:15.590397       7 request.go:871] Response Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
...
I0305 10:57:15.590459       7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:15.590467       7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:22.549142       7 main.go:47] annotation kubernetes.io/ingress.class is not present in ingress default/env1-app2-admin
I0305 10:57:26.091336       7 main.go:152] Received SIGTERM, shutting down
I0305 10:57:26.091359       7 nginx.go:359] shutting down controller queues
I0305 10:57:26.091376       7 nginx.go:367] stopping NGINX process...
2018/03/05 10:57:26 [notice] 29#29: signal process started
I0305 10:57:29.097347       7 nginx.go:380] NGINX process has stopped
I0305 10:57:29.097372       7 main.go:160] Handled quit, awaiting pod deletion
I0305 10:57:30.992643       7 round_trippers.go:417] curl -k -v -XGET  -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer xxxxxx" -H "Accept: application/vnd.kubernetes.protobuf, */*" https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx
I0305 10:57:30.994766       7 round_trippers.go:436] GET https://100.64.0.1:443/api/v1/namespaces/default/configmaps/ingress-controller-leader-nginx 200 OK in 2 milliseconds
I0305 10:57:30.994786       7 round_trippers.go:442] Response Headers:
I0305 10:57:30.994792       7 round_trippers.go:445]     Content-Length: 437
I0305 10:57:30.994818       7 round_trippers.go:445]     Date: Mon, 05 Mar 2018 10:57:30 GMT
I0305 10:57:30.994832       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf
I0305 10:57:30.994891       7 request.go:871] Response Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
....
000001b0  00 1a 00 22 00                                    |...".|
I0305 10:57:30.995001       7 leaderelection.go:243] lock is held by ingress1-nginx-ingress-controller-9jsqp and has not yet expired
I0305 10:57:30.995029       7 leaderelection.go:180] failed to acquire lease default/ingress-controller-leader-nginx
I0305 10:57:39.097529       7 main.go:163] Exiting with 0
@dennis-bell

dennis-bell commented Mar 15, 2018

Seeing the same problem as above.

However I also see this message in the log:
Error: exit status 1
2018/03/15 16:08:15 [emerg] 180#180: "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: [emerg] "client_max_body_size" directive invalid value in /tmp/nginx-cfg653645632:777
nginx: configuration file /tmp/nginx-cfg653645632 test failed

Tested with 0.10.2 and 0.11.0

@ElvinEfendi
Member

ElvinEfendi commented Mar 19, 2018

I'm seeing the same issue, here are the logs with v=10

I0319 18:21:58.035389       7 round_trippers.go:442] Response Headers:
I0319 18:21:58.035393       7 round_trippers.go:445]     Audit-Id: 977bee30-c94f-470a-8aa0-f36703b552d0
I0319 18:21:58.035397       7 round_trippers.go:445]     Content-Type: application/vnd.kubernetes.protobuf;stream=watch
I0319 18:21:58.035400       7 round_trippers.go:445]     Date: Mon, 19 Mar 2018 18:21:58 GMT

<notice it was stuck here for 5s, which is the livenessProbe.timeoutSeconds I configured>

I0319 18:22:32.514283       7 main.go:150] Received SIGTERM, shutting down
I0319 18:22:32.514349       7 nginx.go:321] shutting down controller queues
I0319 18:22:32.514371       7 nginx.go:329] stopping NGINX process...
2018/03/19 18:22:32 [notice] 48#48: signal process started
2018/03/19 18:22:32 [error] 48#48: open() "/run/nginx.pid" failed (2: No such file or directory)
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
I0319 18:22:32.587615       7 main.go:154] Error during shutdown exit status 1
I0319 18:22:32.587670       7 main.go:158] Handled quit, awaiting pod deletion
I0319 18:22:42.587856       7 main.go:161] Exiting with 1

Release: 0.10.2
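
One way to check whether /healthz is actually responding slower than the probe timeout is to query it from inside the controller pod. A rough sketch with a hypothetical pod name, assuming curl is available in the controller image:

  # Prints the HTTP status code and total response time of the health endpoint.
  kubectl exec nginx-ingress-controller-xxxxx -- \
    curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' http://localhost:10254/healthz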

@MrBlaise

MrBlaise commented May 7, 2018

I am seeing the same issue with 0.14.0 as well.

@vic3lord

Having the same issue with 0.15.0

@alexlokshin

Same issue with 0.14.0, 0.15.0, but not 0.9.0.

@anilreddyv

Having the same issue with 0.9.0, 0.10.0, 0.15.0. Using K8s version 1.8.11.

@kjackson87

Having the same issue with 0.14.0, K8s version 1.8.4.

@keslerm

keslerm commented Jun 20, 2018

Same issue with 0.15.0.
Attached is the log output:

v10.log

@aledbf
Member

aledbf commented Jun 20, 2018

@keslerm can you update your image to current master?

@keslerm

keslerm commented Jun 20, 2018

@aledbf I built the image from master and that did the trick; it looks good now.

Anything I can provide that might help?

@aledbf
Member

aledbf commented Jun 22, 2018

Closing. Please update to 0.16.0

@aledbf aledbf closed this as completed Jun 22, 2018
@michaelkunzmann-sap

michaelkunzmann-sap commented Aug 15, 2018

Hi! I am having the same issues with 0.24.0

$ kubectl describe pod nginx-ingress-controller-7846888d77-xlvwk
Events:
  Type     Reason                 Age                From                                               Message
  ----     ------                 ----               ----                                               -------
  Normal   Scheduled              1m                 default-scheduler                                  Successfully assigned nginx-ingress-controller-7846888d77-xlvwk to gke-qaas-test-default-pool-4b3a3303-h9xk
  Normal   SuccessfulMountVolume  1m                 kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  MountVolume.SetUp succeeded for volume "nginx-ingress-token-lrptw"
  Normal   Pulled                 24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
  Normal   Created                24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Created container
  Normal   Started                24s (x2 over 58s)  kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Started container
  Normal   Killing                24s                kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Killing container with id docker://nginx-ingress-controller:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy              5s (x4 over 45s)   kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Liveness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
  Warning  Unhealthy              2s (x4 over 42s)   kubelet, gke-qaas-test-default-pool-4b3a3303-h9xk  Readiness probe failed: Get http://10.12.1.12:10254/healthz: dial tcp 10.12.1.12:10254: getsockopt: connection refused
$ kubectl logs nginx-ingress-controller-7846888d77-xlvwk
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.17.1
  Build:      git-12f7966
  Repository: https://github.com/kubernetes/ingress-nginx.git
-------------------------------------------------------------------------------

I0815 22:21:46.579086       5 flags.go:180] Watching for Ingress class: nginx

@aledbf
Member

aledbf commented Dec 12, 2018

@michaelkunzmann-sap if the log ends there, it means the pod cannot reach the apiserver.
You can get more details by increasing the log level in the ingress controller deployment, i.e. by adding the flag --v=10.
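
For example, roughly (the namespace and deployment name are assumptions, the controller container is assumed to be the first in the pod spec, and it is assumed to already define an args list):

  # Append --v=10 to the controller container's arguments.
  kubectl -n ingress-nginx patch deployment nginx-ingress-controller --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=10"}]'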

@ccctask

ccctask commented Jul 16, 2019

I had the same problem and just solved it.

In my case, I deleted the Ingress resources that reference nginx-ingress, then deleted the nginx-ingress-controller and reinstalled it.

It finally came up successfully and no longer reports unhealthy.
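
Roughly, the equivalent commands look like this (a sketch only; the Ingress and Helm release names are hypothetical, Helm 2 syntax):

  # Hypothetical names; adjust to your own Ingress resources and release.
  kubectl delete ingress my-app-ingress
  helm delete --purge ingress1
  helm install --name ingress1 stable/nginx-ingress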

@spursy

spursy commented Jul 31, 2019

I'm having the same issue with version 0.25.

@sreedharbukya

I had the same problem and just solved it.

In my case, I deleted the Ingress resources that reference nginx-ingress, then deleted the nginx-ingress-controller and reinstalled it.

It finally came up successfully and no longer reports unhealthy.

I have a similar issue with ingress-nginx. Do you mind sharing your working configuration?

@AlcipPopa

I'm having the same issue on minikube with nginx-ingress-controller version 0.25; as the subject states, "describe pod" shows the probes failing with status code 500:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned ingress-nginx/nginx-ingress-controller-79f6884cf6-qj65t to minikube
  Normal   Started    17m (x2 over 17m)     kubelet, minikube  Started container nginx-ingress-controller
  Warning  Unhealthy  16m (x6 over 17m)     kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    16m (x2 over 17m)     kubelet, minikube  Container nginx-ingress-controller failed liveness probe, will be restarted
  Normal   Pulled     16m (x3 over 17m)     kubelet, minikube  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1" already present on machine
  Normal   Created    16m (x3 over 17m)     kubelet, minikube  Created container nginx-ingress-controller
  Warning  Unhealthy  7m40s (x35 over 17m)  kubelet, minikube  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    2m43s (x44 over 12m)  kubelet, minikube  Back-off restarting failed container

The nginx-ingress-controller pod also went into CrashLoopBackOff status (I guess from too many failures):

NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-79f6884cf6-qj65t   0/1     CrashLoopBackOff   11         28m

@jurrian

jurrian commented Oct 18, 2019

Any progress here? We have the same problem with 0.26.1. The nginx config looks good ("nginx: configuration file /etc/nginx/nginx.conf test is successful"). Any clues?

@jurrian

jurrian commented Oct 21, 2019

Possibly related to #3993. Eventually we fixed this by upgrading the nodes to 1.14.7-gke.10. After that, running for i in $(seq 1 200); do curl localhost:10254/healthz; done inside the ingress-nginx container finished in a few seconds, whereas before it took minutes. It could well be that the upgrade reset the root cause, which is still unknown to me, or maybe nginx-ingress-controller:0.26.1 simply works better with the newer Kubernetes version.

@pankajakhade

I am also getting this issue:

Events:
  Type     Reason     Age                     From                                      Message
  ----     ------     ----                    ----                                      -------
  Normal   Scheduled  13m                     default-scheduler                         Successfully assigned jenkins/nginx-ingress-controller-6d9c6d875b-8h98z to ip-192-168-150-176.ec2.internal
  Normal   Started    12m (x2 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Started container nginx-ingress-controller
  Warning  Unhealthy  11m (x6 over 12m)       kubelet, ip-192-168-150-176.ec2.internal  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    11m (x2 over 12m)       kubelet, ip-192-168-150-176.ec2.internal  Container nginx-ingress-controller failed liveness probe, will be restarted
  Warning  Unhealthy  11m (x9 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled     11m (x3 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0" already present on machine
  Normal   Created    11m (x3 over 13m)       kubelet, ip-192-168-150-176.ec2.internal  Created container nginx-ingress-controller
  Warning  BackOff    2m53s (x24 over 8m57s)  kubelet, ip-192-168-150-176.ec2.internal  Back-off restarting failed container

I am using the quay.io/kubernetes-ingress-controller/nginx-ingress-controller image.
Could you please help?

@lkh-smile

Delete the resources that reference the ingress, then delete the pod and reinstall it; that fixes it.
