Version 10.x liveness and readiness fail with nginx-ingress-controller failed: reason withheld #1989

Closed
JorritSalverda opened this issue Jan 26, 2018 · 2 comments

@JorritSalverda
Contributor

BUG REPORT

Version 0.9.0 works fine in my version 1.8.6-gke.0 Kubernetes Engine clusters. However, when upgrading to 0.10.0, 0.10.1 or 0.10.2, the liveness and readiness probes fail. Curling the healthz endpoint returns the following error:

$ kubectl exec nginx-ingress-controller-7985b8c588-7755s -n ingress-nginx -- curl -v http://localhost:10254/healthz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 10254 (#0)
> GET /healthz HTTP/1.1
> Host: localhost:10254
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 26 Jan 2018 13:18:07 GMT
< Content-Length: 84
<
{ [84 bytes data]
* Curl_http_done: called premature == 0
100    84  100    84    0     0  15029      0 --:--:-- --:--:-- --:--:-- 16800
* Connection #0 to host localhost left intact
[+]ping ok
[-]nginx-ingress-controller failed: reason withheld
healthz check failed

The logs from the nginx ingress controller don't show anything out of the ordinary: just some regular lines, and then, after the liveness probe fails, some errors during shutdown.

nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.357693       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.358440       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:26:25.359204       7 backend_ssl.go:68] adding secret ***/*** to the local store
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815170       7 main.go:150] Received SIGTERM, shutting down
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815382       7 nginx.go:321] shutting down controller queues
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.815421       7 nginx.go:329] stopping NGINX process...
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller 2018/01/26 13:28:42 [notice] 34#34: signal process started
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller 2018/01/26 13:28:42 [error] 34#34: open() "/run/nginx.pid" failed (2: No such file or directory)
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.823090       7 main.go:154] Error during shutdown exit status 1
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:42.823122       7 main.go:158] Handled quit, awaiting pod deletion
nginx-ingress-controller-7985b8c588-7755s nginx-ingress-controller I0126 13:28:52.823284       7 main.go:161] Exiting with 1

NGINX Ingress controller version:

0.10.2

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-16T03:15:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-gke.0", GitCommit:"ee9a97661f14ee0b1ca31d6edd30480c89347c79", GitTreeState:"clean", BuildDate:"2018-01-05T03:36:42Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
Google Cloud Platform Kubernetes Engine
  • OS (e.g. from /etc/os-release):
BUILD_ID=10032.71.0
NAME="Container-Optimized OS"
KERNEL_COMMIT_ID=c4c6234ae4f384ce00819c41b48ca8f6f1fa3ba8
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=63
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Container-Optimized OS from Google"
VERSION=63
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=cos
  • Kernel (e.g. uname -a):
Linux gke-development-euro-auto-scaling-pre-33198d65-13cc 4.4.86+ #1 SMP Thu Dec 7 20:11:11 PST 2017 x86_64 Intel(R) Xeon(R) CPU @ 2.50GHz GenuineIntel GNU/Linux

What happened:

Deploying version 0.10.0 fails to get the pods into a ready state. They are restarted whenever the liveness probe failure threshold is exceeded.
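For reference, the probe definitions in the stock deployment manifest look roughly like this (the timing values shown are assumptions for illustration, not copied from the 0.10.x manifests; only the `/healthz` path and port 10254 are confirmed by the curl output above):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 10   # assumed value
  timeoutSeconds: 1         # assumed value; a slow healthz response would miss this window
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
```

With a tight `timeoutSeconds`, a controller that is merely slow (e.g. under-resourced) fails the probe the same way a broken one does.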

What you expected to happen:

The NGINX ingress controller to pass its liveness and readiness checks.

How to reproduce it (as minimally and precisely as possible):

No idea; I have a vanilla deployment following the steps at https://github.com/kubernetes/ingress-nginx/tree/nginx-0.10.2/deploy

Anything else we need to know:

The only changes versus the deployment scripts are that I have the ingress service set to externalTrafficPolicy: Cluster, added Cloudflare's source IPs as loadBalancerSourceRanges, and have the settings below in the nginx ingress ConfigMap:

whitelist-source-range:  "****"
forwarded-for-header: "X-Forwarded-For"
# trust internal ranges and cloudflare to provide client ip (https://www.cloudflare.com/ips-v4)
proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,${CLOUDFLARE_IP_RANGES}"
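For context, those keys live in the ingress-nginx ConfigMap. A minimal sketch, assuming the ConfigMap name and namespace from the standard deploy manifests (the redacted value and the `${CLOUDFLARE_IP_RANGES}` placeholder are kept as-is):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # assumed name from the standard manifests
  namespace: ingress-nginx
data:
  whitelist-source-range: "****"
  forwarded-for-header: "X-Forwarded-For"
  # trust internal ranges and cloudflare to provide client ip (https://www.cloudflare.com/ips-v4)
  proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,${CLOUDFLARE_IP_RANGES}"
```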
@aledbf
Member

aledbf commented Feb 21, 2018

@JorritSalverda please add the flag --v=6 to the deployment. That will increase the verbosity of the logs and you will see the reason in the pod logs.
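The flag goes into the controller container's args in the Deployment. A sketch of where it fits, with the image tag and sibling args assumed from the standard 0.10.x manifests:

```yaml
containers:
  - name: nginx-ingress-controller
    image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2
    args:
      - /nginx-ingress-controller
      - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
      - --v=6   # raise log verbosity so the healthz failure reason is logged
```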

@JorritSalverda
Contributor Author

After upgrading to version 0.11.0 it's no longer an issue, either because there was a bug that's since been fixed or because I increased resources in the meantime.
