nginx healthcheck error #3926

Closed
LoicMahieu opened this issue Mar 25, 2019 · 4 comments · Fixed by #4091

Comments

@LoicMahieu
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

Bug Report

NGINX Ingress controller version: 0.23.0

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.10", GitCommit:"d0686b9f0adfcf759cde9f1d2d80fd52ab01d58f", GitTreeState:"clean", BuildDate:"2019-02-22T20:02:13Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): Ubuntu
  • Kernel (e.g. uname -a): Linux gke-XXX-pool-2-XXX-lrdx 4.15.0-1026-gcp #27-Ubuntu SMP Thu Dec 6 18:27:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: -
  • Others: -

What happened:

ingress-nginx exited with status 0.

What you expected to happen:

I expect that ingress-nginx continues to handle requests properly ;)

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

Logs:

XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:28:44 +0000] "GET /XXXXX HTTP/1.0" 200 2751 "https://XXXXXX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" 704 0.009 [XXXX-app-3000] 10.36.4.47:3000 2751 0.008 200 762c3ddac86f8310df611a90387979ae
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:28:45 +0000] "GET /XXXXX HTTP/1.0" 200 3259 "https://XXXXXX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" 837 0.039 [XXXX-app-3000] 10.36.1.23:3000 3259 0.040 200 371238ba0ac8e3f412fff40f28a52a3d
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:28:45 +0000] "GET /XXXXXXX HTTP/1.0" 200 7295 "-" "rogerbot/1.2 (https://moz.com/help/guides/moz-procedures/what-is-rogerbot, [email protected])" 352 3.549 [XXXXXX-app-80] 10.36.5.25:80 7318 3.548 200 011ddec3134233d8d10458da79d7748d
2a02:a03f:3cac:6d00:55c1:ec8:c9b7:8022 - [2a02:a03f:3cac:6d00:55c1:ec8:c9b7:8022] - - [25/Mar/2019:14:28:46 +0000] "GET /XXXXXX HTTP/1.1" 304 0 "https://XXXXX/XXXXXX" "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1" 810 0.122 [XXXXX-app-3000] 10.36.5.5:3000 0 0.124 304 77f0f2348fd7c5f064cceef340b25f35
W0325 14:28:47.851014       5 controller.go:846] Service "XXXX-app/XXXX" does not have any active Endpoint.
[25/Mar/2019:14:28:47 +0000]TCP200000.001
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:28:58 +0000] "GET /XXXXXXX HTTP/1.0" 200 140 "https://XXXXXX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" 1056 0.261 [XXXX-app-3000] 10.36.1.23:3000 140 0.260 200 691801e258fa39e5d8d257f08084e573
2019/03/25 14:29:14 [error] 39#39: *373 upstream prematurely closed connection while reading response header from upstream, client: XX.XX.XX.XX, server: xxxx-apps.xxxx.be, request: "GET /XXX HTTP/1.0", upstream: "http://10.36.5.25:80/fr/XXXXXX", host: "XXXX-apps.xxxx.be"
E0325 14:29:20.274025       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0325 14:29:22.740989       5 checker.go:52] healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
I0325 14:29:28.864169       5 main.go:167] Received SIGTERM, shutting down
I0325 14:29:28.864457       5 nginx.go:358] Shutting down controller queues
I0325 14:29:28.864489       5 status.go:200] updating status of Ingress rules (remove)
E0325 14:29:36.457486       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0325 14:29:36.558370       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0325 14:29:39.442029       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0325 14:29:40.674022       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
I0325 14:29:43.394509       5 status.go:210] leaving status update for next leader (3)
I0325 14:29:43.394560       5 nginx.go:366] Stopping NGINX process
E0325 14:29:47.464789       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:14 +0000] "GET /XXXXXXX HTTP/1.0" 200 2179 "https://XXXXXX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" 1056 11.781 [XXXX-app-3000] 10.36.4.47:3000 2179 11.781 200 5507f9b466201726133b917796cd2720
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:14 +0000] "GET / HTTP/1.0" 200 3802 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 398 9.005 [XXXX-app-3000] 10.36.4.26:3000 7898 9.009 200 8810badbc3433c1240ce2b03b224b474
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:14 +0000] "GET /XXXXXXXXX HTTP/1.0" 200 2179 "https://XXXXXX" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134" 1056 9.005 [XXXX-app-3000] 10.36.4.47:3000 2179 9.009 200 719e3c9baf427d954973b10f4f1a5ada
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:50 +0000] "GET /wp-login.php HTTP/1.0" 404 292 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1" 458 0.073 [XXXX-ZZZZ-80] 10.36.5.39:80 292 0.072 404 6295fd1ab9ceaaf83b3b6b500ac0053e
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:50 +0000] "GET / HTTP/1.0" 302 52 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 983 0.128 [XXXXX-3000] 10.36.5.167:3000 52 0.128 302 1af8df9449d3d27cb3ee1d18c5f6e8de
2019/03/25 14:29:50 [notice] 106#106: ModSecurity-nginx v1.0.0
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:50 +0000] "GET /mXXXX HTTP/1.1" 304 0 "https://XXX.be/fr" "Mozilla/5.0 (Linux; Android 5.1; HUAWEI LYO-L01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Mobile Safari/537.36" 666 35.493 [xxxxx-3000] 10.36.5.5:3000 0 35.493 304 d056acc5b9468c5ab49d84c6acad4047
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:50 +0000] "GET /mXXXX HTTP/1.1" 304 0 "https://XXX.be/fr/470/chiot?gclid=EAIaIQobChMIj4eMj8Cd4QIVDvlRCh2%5FcALREAAYASAAEgJ83PD%5FBwE" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_3 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) GSA/69.1.238102067 Mobile/15A432 Safari/604.1" 718 0.108 [xxxxx-3000] 10.36.4.37:3000 0 0.108 304 afe0a5cf98fdd27b842e7ff7cded6056
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:50 +0000] "GET /siXXXXX HTTP/1.0" 404 103520 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" 621 0.250 [XXXX-app-4000] 10.36.4.48:4000 103520 0.252 404 8703e9273df8fefe2015cccede1d4fd3
2019/03/25 14:29:50 [notice] 106#106: signal process started
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:54 +0000] "GET /XXXX HTTP/1.0" 499 0 "-" "rogerbot/1.2 (https://moz.com/help/guides/moz-procedures/what-is-rogerbot, [email protected])" 353 46.857 [XXXXX-80] 10.36.5.25:80, 10.36.5.25:80 0, 0 7.181, 39.673 502, - 0db38bca9e11989f2df60bd038f2e8a6
E0325 14:29:54.842349       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:55 +0000] "GET /XXXX HTTP/1.0" 200 6819 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 691 5.066 [XXXXX-80] 10.36.5.25:80 6819 5.068 200 17cb3a30e12d20a8760d8d470070c768
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:55 +0000] "GET /XXXX HTTP/1.0" 200 7537 "-" "rogerbot/1.2 (https://moz.com/help/guides/moz-procedures/what-is-rogerbot, [email protected])" 353 5.114 [XXXXX-80] 10.36.5.25:80 7537 5.116 200 acedc420d107f4a3152ae2fa0f32d1a5
XX.XX.XX.XX - [XX.XX.XX.XX] - - [25/Mar/2019:14:29:56 +0000] "GET /XXXX HTTP/1.0" 200 6853 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 709 6.490 [XXXXX-80] 10.36.5.25:80 6876 6.492 200 29baf035c03670b6b92949cb2c6e3d78
I0325 14:29:57.818826       5 nginx.go:379] NGINX process has stopped
I0325 14:29:57.818865       5 main.go:175] Handled quit, awaiting Pod deletion
E0325 14:30:04.840498       5 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0325 14:30:07.826007       5 main.go:178] Exiting with 0
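
The log above contains two distinct failure messages for the same socket: `read unix ... i/o timeout` while NGINX is still running, and `dial unix ... connect: connection refused` after NGINX has been stopped. As a rough illustration only (this is not the controller's `checker.go`, and `probe`, `demo`, and the socket path are hypothetical names), the Python sketch below reproduces both conditions on Linux: a listener that exists but never answers within the deadline, and a socket file with no listener behind it.

```python
import os
import socket
import tempfile


def probe(path, timeout=1.0):
    """Send one HTTP GET over a Unix socket and classify the outcome."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        client.sendall(b"GET /healthz HTTP/1.0\r\n\r\n")
        data = client.recv(1024)  # blocks until a reply arrives or the deadline passes
        return "ok" if data else "closed"
    except socket.timeout:
        return "timeout"   # connected, but no reply in time (the "i/o timeout" case)
    except ConnectionRefusedError:
        return "refused"   # socket file exists, but nothing is listening
    finally:
        client.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "status.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(sock_path)
        server.listen(1)
        # The listener never calls accept() or replies, standing in for a
        # server that is too busy to answer the status request.
        busy = probe(sock_path, timeout=0.2)
        server.close()
        # The socket file is still on disk, but the listener is gone.
        stopped = probe(sock_path, timeout=0.2)
        return busy, stopped


if __name__ == "__main__":
    print(demo())
```

Read this way, the timeout lines suggest the status endpoint was reachable but slow to answer, while the refused lines are what one would expect once the NGINX process is shutting down.
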
@aledbf
Member

aledbf commented Mar 25, 2019

@LoicMahieu from the log, it seems you changed the liveness probe intervals: https://github.com/kubernetes/ingress-nginx/blob/master/deploy/mandatory.yaml#L251-L254
Please check the values.

@mcasperson

mcasperson commented Apr 3, 2019

I've noticed something similar performing large file uploads with nginx 0.23.0 (from the helm chart 1.4.0).

The health check fails and the connection is closed, making large uploads impossible. The logs below show the failure.

2019/04/03 03:24:46 [warn] 123#123: *62681 a client request body is buffered to a temporary file /tmp/client-body/0000000001, client: 10.1.0.4, server: example.org, request: "POST /api/Spaces-1/packages/raw?replace=False HTTP/1.1", host: "example.org"
E0403 03:37:12.784000       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:16.097232       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:22.783907       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
W0403 03:37:23.856238       6 controller.go:1108] Error getting SSL certificate "monitoring/tls-octopus-secret": local SSL certificate monitoring/tls-octopus-secret was not found. Using default certificate
E0403 03:37:26.097290       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
I0403 03:37:27.406613       6 main.go:167] Received SIGTERM, shutting down
I0403 03:37:27.406642       6 nginx.go:358] Shutting down controller queues
I0403 03:37:27.406662       6 status.go:200] updating status of Ingress rules (remove)
I0403 03:37:27.512300       6 status.go:210] leaving status update for next leader (4)
I0403 03:37:27.512321       6 nginx.go:366] Stopping NGINX process
2019/04/03 03:37:27 [notice] 156#156: ModSecurity-nginx v1.0.0
2019/04/03 03:37:27 [notice] 156#156: signal process started
E0403 03:37:32.812238       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:36.111928       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:42.812240       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:52.812299       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
E0403 03:37:56.312278       6 checker.go:52] healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: dial unix /tmp/nginx-status-server.sock: connect: connection refused
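
To put a number on how long the checker had been failing before the shutdown, the controller lines above can be parsed by timestamp. This is a minimal sketch, assuming the `E0403 03:37:12.784000 ...` lines follow the usual klog layout (severity letter, month and day, time, PID, source location); the year is not in the log, so it is passed in as an assumption, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Severity, MMDD, time with microseconds, PID, "file:line]", message.
KLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+) "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse(line, year=2019):
    """Return (timestamp, severity, message), or None for non-klog lines."""
    m = KLOG.match(line)
    if m is None:
        return None
    stamp = f"{year}-{m['month']}-{m['day']} {m['time']}"
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), m["sev"], m["msg"]


def seconds_from_first_error_to_sigterm(lines):
    """Seconds between the first healthcheck error and the SIGTERM line."""
    first_error = sigterm = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue  # NGINX access/error log lines use a different format
        ts, sev, msg = parsed
        if first_error is None and sev == "E" and msg.startswith("healthcheck error"):
            first_error = ts
        if sigterm is None and "Received SIGTERM" in msg:
            sigterm = ts
    if first_error is None or sigterm is None:
        return None
    return (sigterm - first_error).total_seconds()


EXCERPT = [
    "E0403 03:37:12.784000       6 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout",
    "2019/04/03 03:37:27 [notice] 156#156: signal process started",
    "I0403 03:37:27.406613       6 main.go:167] Received SIGTERM, shutting down",
]

if __name__ == "__main__":
    print(f"{seconds_from_first_error_to_sigterm(EXCERPT):.3f}")  # 14.623
```

For this excerpt, the first logged healthcheck error precedes the SIGTERM by roughly 14.6 seconds.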

I have made no changes to the liveness check. This is from kubectl describe pod:

Liveness:             http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3

@LoicMahieu
Contributor Author

Hi!

Thanks @aledbf. But the liveness probe is the same:

"livenessProbe": {
  "httpGet": {
    "path": "/healthz",
    "port": 10254,
    "scheme": "HTTP"
  },
  "initialDelaySeconds": 10,
  "timeoutSeconds": 1,
  "periodSeconds": 10,
  "successThreshold": 1,
  "failureThreshold": 3
},
"readinessProbe": {
  "httpGet": {
    "path": "/healthz",
    "port": 10254,
    "scheme": "HTTP"
  },
  "initialDelaySeconds": 10,
  "timeoutSeconds": 1,
  "periodSeconds": 10,
  "successThreshold": 1,
  "failureThreshold": 3
}

(Installation was done with helm stable chart.)

Indeed, the restart of the pod is certainly due to the liveness probe.
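
For a sense of scale, the probe values above can be turned into a rough restart window. The sketch below uses a simplified model of the kubelet (one attempt every `periodSeconds`, each attempt failing after `timeoutSeconds`, restart decided after `failureThreshold` consecutive failures) and ignores scheduling jitter; the `Probe` class and `restart_window` function are hypothetical helpers, not part of any Kubernetes client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_seconds: int
    timeout_seconds: int
    period_seconds: int
    success_threshold: int
    failure_threshold: int


def restart_window(probe):
    """Return (shortest, longest) seconds of continuous unresponsiveness
    before the restart decision, under the simplified model above."""
    # Time from the start of the first failing attempt to the start of the last one.
    first_to_last = (probe.failure_threshold - 1) * probe.period_seconds
    # Shortest case: the hang begins just before an attempt starts.
    shortest = first_to_last + probe.timeout_seconds
    # Longest case: the hang begins just after an attempt succeeded.
    longest = shortest + probe.period_seconds
    return shortest, longest


# Values copied from the livenessProbe in this comment.
liveness = Probe(
    initial_delay_seconds=10,
    timeout_seconds=1,
    period_seconds=10,
    success_threshold=1,
    failure_threshold=3,
)

if __name__ == "__main__":
    print(restart_window(liveness))  # (21, 31)
```

Under these assumptions, `/healthz` only has to stay unresponsive for about 21 to 31 seconds, with each attempt given just 1 second to answer, before the container is restarted.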

@aledbf
Member

aledbf commented Apr 9, 2019

> (Installation was done with helm stable chart.)

The chart is not maintained by this project. You should adjust the timeouts using the link I posted in my previous comment.
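
One possible way to adjust the timeouts on a running Deployment is a JSON Patch that replaces `timeoutSeconds` on both probes. The sketch below only builds the patch document. The value `5`, the container index `0`, and the helper name `probe_timeout_patch` are illustrative assumptions, not values recommended in this thread.

```python
import json


def probe_timeout_patch(timeout_seconds, container_index=0):
    """Build a JSON Patch that sets timeoutSeconds on both probes of one
    container in a Deployment's pod template."""
    base = f"/spec/template/spec/containers/{container_index}"
    return [
        {
            "op": "replace",
            "path": f"{base}/{probe}/timeoutSeconds",
            "value": timeout_seconds,
        }
        for probe in ("livenessProbe", "readinessProbe")
    ]


if __name__ == "__main__":
    # 5 is an arbitrary example value (an assumption, not a recommendation).
    print(json.dumps(probe_timeout_patch(5)))
```

The printed document could then be passed to `kubectl patch deployment <name> -n <namespace> --type=json -p '<patch>'`, where the name and namespace are placeholders for your own install. For a Helm-managed install, a manual patch may be overwritten on the next upgrade, so setting the probe values through the chart is likely the more durable route.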
