Version 10.x liveness and readiness fail with nginx-ingress-controller failed: reason withheld #1989

Closed
JorritSalverda opened this issue Jan 26, 2018 · 2 comments

@JorritSalverda
Contributor

BUG REPORT

Version 0.9.0 works fine in my version 1.8.6-gke.0 Kubernetes Engine clusters. However, when upgrading to 0.10.0, 0.10.1 or 0.10.2, the liveness and readiness probes fail. Curling the healthz endpoint returns the following error:

$ kubectl exec nginx-ingress-controller-7985b8c588-7755s -n ingress-nginx -- curl -v http://localhost:10254/healthz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 10254 (#0)
> GET /healthz HTTP/1.1
> Host: localhost:10254
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 26 Jan 2018 13:18:07 GMT
< Content-Length: 84
<
{ [84 bytes data]
* Curl_http_done: called premature == 0
100    84  100    84    0     0  15029      0 --:--:-- --:--:-- --:--:-- 16800
* Connection #0 to host localhost left intact
[+]ping ok
[-]nginx-ingress-controller failed: reason withheld
healthz check failed

The logs from the nginx ingress controller don't show anything out of the ordinary: just some regular lines, and then, after the liveness probe fails, some errors during shutdown.

nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.357693       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.358440       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.359204       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815170       7 main.go:150] Received SIGTERM, shutting down
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815382       7 nginx.go:321] shutting down controller queues
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815421       7 nginx.go:329] stopping NGINX process...
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller 2018/01/26 13:28:42 [notice] 34#34: signal process started
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller 2018/01/26 13:28:42 [error] 34#34: open() "/run/nginx.pid" failed (2: No such file or directory)
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.823090       7 main.go:154] Error during shutdown exit status 1
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.823122       7 main.go:158] Handled quit, awaiting pod deletion
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:52.823284       7 main.go:161] Exiting with 1

NGINX Ingress controller version:

0.10.2

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-16T03:15:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-gke.0", GitCommit:"ee9a97661f14ee0b1ca31d6edd30480c89347c79", GitTreeState:"clean", BuildDate:"2018-01-05T03:36:42Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
Google Cloud Platform Kubernetes Engine
  • OS (e.g. from /etc/os-release):
BUILD_ID=10032.71.0
NAME="Container-Optimized OS"
KERNEL_COMMIT_ID=c4c6234ae4f384ce00819c41b48ca8f6f1fa3ba8
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=63
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Container-Optimized OS from Google"
VERSION=63
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=cos
  • Kernel (e.g. uname -a):
Linux gke-development-euro-auto-scaling-pre-33198d65-13cc 4.4.86+ #1 SMP Thu Dec 7 20:11:11 PST 2017 x86_64 Intel(R) Xeon(R) CPU @ 2.50GHz GenuineIntel GNU/Linux

What happened:

Deploying version 0.10.0 fails to get the pods into a ready state. They are restarted whenever the liveness probe failure threshold is exceeded.
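For reference, the probe definitions in the stock deployment manifest look roughly like this (the timing values shown are assumptions for illustration, not copied from the 0.10.x manifests; only the `/healthz` path and port 10254 are confirmed by the curl output above):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 10   # assumed value
  timeoutSeconds: 1         # assumed value; a slow healthz response would miss this window
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
```

With a tight `timeoutSeconds`, a controller that is merely slow (e.g. under-resourced) fails the probe the same way a broken one does.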

What you expected to happen:

The NGINX ingress controller to pass its liveness and readiness checks.

How to reproduce it (as minimally and precisely as possible):

No idea; I have a vanilla deployment following the steps at https://github.com/kubernetes/ingress-nginx/tree/nginx-0.10.2/deploy

Anything else we need to know:

The only changes versus the deployment scripts are that I have the ingress service set to externalTrafficPolicy: Cluster, added Cloudflare's source IPs as loadBalancerSourceRanges, and have the settings below in the nginx ingress ConfigMap:

whitelist-source-range:  "****"
forwarded-for-header: "X-Forwarded-For"
# trust internal ranges and cloudflare to provide client ip (https://www.cloudflare.com/ips-v4)
proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,${CLOUDFLARE_IP_RANGES}"
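For context, those keys live in the ingress-nginx ConfigMap. A minimal sketch, assuming the ConfigMap name and namespace from the standard deploy manifests (the redacted value and the `${CLOUDFLARE_IP_RANGES}` placeholder are kept as-is):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # assumed name from the standard manifests
  namespace: ingress-nginx
data:
  whitelist-source-range: "****"
  forwarded-for-header: "X-Forwarded-For"
  # trust internal ranges and cloudflare to provide client ip (https://www.cloudflare.com/ips-v4)
  proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,${CLOUDFLARE_IP_RANGES}"
```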
@aledbf
Member

aledbf commented Feb 21, 2018

@JorritSalverda please add the flag --v=6 to the deployment. That will increase the verbosity of the logs and you will see the reason in the pod logs.
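The flag goes into the controller container's args in the Deployment. A sketch of where it fits, with the image tag and sibling args assumed from the standard 0.10.x manifests:

```yaml
containers:
  - name: nginx-ingress-controller
    image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2
    args:
      - /nginx-ingress-controller
      - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
      - --v=6   # raise log verbosity so the healthz failure reason is logged
```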

@JorritSalverda
Contributor Author

After upgrading to version 0.11.0 it's no longer an issue, either because there was a bug that's since been fixed or because I increased resources in the meantime.
