ingress-nginx crashes on reload of configuration #4284

ac-hibbert · 2019-07-08T17:32:56Z

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):

This is a follow-on from #4041, which was closed following PR #4091. I have tested with the latest version, 0.25.0, and the problem still occurs.

I also found some related issues:

#3459
#3457
#3737
#3684

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

NGINX Ingress controller version:
0.25.0

Kubernetes version (use kubectl version):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T18:55:03Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-eks-c57ff8", GitCommit:"c57ff8e35590932c652433fab07988da79265d5b", GitTreeState:"clean", BuildDate:"2019-06-07T20:43:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Cloud provider or hardware configuration: AWS EKS
OS (e.g. from /etc/os-release): AL2
Kernel (e.g. uname -a):
Install tools:
Others:

What happened:

When the configuration is reloaded, nginx crashes:

I0708 17:23:27.231397       8 controller.go:133] Configuration changes detected, backend reload required.
I0708 17:23:27.231983       8 controller.go:133] Configuration changes detected, backend reload required.
W0708 17:23:27.231633       7 controller.go:309] Error getting Service "jenkins-andy/jenkins-jnlp": no object matching key "jenkins-andy/jenkins-jnlp" in local store
I0708 17:23:27.231695       7 controller.go:133] Configuration changes detected, backend reload required.
I0708 17:23:27.309345       8 controller.go:149] Backend successfully reloaded.
[08/Jul/2019:17:23:27 +0000]TCP200000.000
I0708 17:23:27.310684       7 controller.go:149] Backend successfully reloaded.
[08/Jul/2019:17:23:27 +0000]TCP200000.000
I0708 17:23:27.337758       8 controller.go:149] Backend successfully reloaded.
[08/Jul/2019:17:23:27 +0000]TCP200000.000
I0708 17:23:27.332694       8 controller.go:149] Backend successfully reloaded.
[08/Jul/2019:17:23:27 +0000]TCP200000.000
I0708 17:23:27.465950       7 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"86ae9cfe-458c-11e9-9bac-0afa1cd96c8a", APIVersion:"v1", ResourceVersion:"30052709", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/tcp-services
I0708 17:23:27.465645       8 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"86ae9cfe-458c-11e9-9bac-0afa1cd96c8a", APIVersion:"v1", ResourceVersion:"30052709", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/tcp-services
I0708 17:23:27.465569       8 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"86ae9cfe-458c-11e9-9bac-0afa1cd96c8a", APIVersion:"v1", ResourceVersion:"30052709", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/tcp-services
I0708 17:23:27.466535       8 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"86ae9cfe-458c-11e9-9bac-0afa1cd96c8a", APIVersion:"v1", ResourceVersion:"30052709", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/tcp-services
I0708 17:23:28.919548       7 main.go:154] Received SIGTERM, shutting down
I0708 17:23:28.919601       7 nginx.go:402] Shutting down controller queues
I0708 17:23:28.919617       7 status.go:117] updating status of Ingress rules (remove)
I0708 17:23:28.943567       7 nginx.go:418] Stopping NGINX process
2019/07/08 17:23:28 [notice] 455#455: signal process started
E0708 17:23:34.860702       7 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:34.976083       7 nginx.go:431] NGINX process has stopped
I0708 17:23:34.976100       7 main.go:162] Handled quit, awaiting Pod deletion
E0708 17:23:38.047482       7 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:38.252232       8 main.go:154] Received SIGTERM, shutting down
I0708 17:23:38.252264       8 nginx.go:402] Shutting down controller queues
I0708 17:23:38.252282       8 status.go:117] updating status of Ingress rules (remove)
I0708 17:23:38.263363       8 nginx.go:418] Stopping NGINX process
2019/07/08 17:23:38 [notice] 457#457: signal process started
E0708 17:23:38.912718       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
E0708 17:23:39.764432       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:39.788563       8 main.go:154] Received SIGTERM, shutting down
I0708 17:23:39.788594       8 nginx.go:402] Shutting down controller queues
I0708 17:23:39.788607       8 status.go:117] updating status of Ingress rules (remove)
I0708 17:23:39.807639       8 nginx.go:418] Stopping NGINX process
2019/07/08 17:23:39 [notice] 457#457: signal process started
I0708 17:23:40.304388       8 nginx.go:431] NGINX process has stopped
I0708 17:23:40.304406       8 main.go:162] Handled quit, awaiting Pod deletion
W0708 17:23:40.564490       8 controller.go:1129] SSL certificate for server "jenkins-marmccor.dev-cdaas.umbrella.com" is about to expire (2019-06-04 19:04:05 +0000 UTC)
E0708 17:23:40.574011       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:42.850962       8 nginx.go:431] NGINX process has stopped
I0708 17:23:42.850979       8 main.go:162] Handled quit, awaiting Pod deletion
W0708 17:23:43.897846       8 controller.go:1129] SSL certificate for server "jenkins-marmccor.dev-cdaas.umbrella.com" is about to expire (2019-06-04 19:04:05 +0000 UTC)
E0708 17:23:44.860781       7 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:44.976238       7 main.go:165] Exiting with 0
[08/Jul/2019:17:23:45 +0000]TCP2002352330.001
I0708 17:23:46.437513       8 main.go:154] Received SIGTERM, shutting down
I0708 17:23:46.437545       8 nginx.go:402] Shutting down controller queues
I0708 17:23:46.437571       8 status.go:117] updating status of Ingress rules (remove)
I0708 17:23:46.464289       8 nginx.go:418] Stopping NGINX process
2019/07/08 17:23:46 [notice] 457#457: signal process started
I0708 17:23:48.506632       8 nginx.go:431] NGINX process has stopped
I0708 17:23:48.506652       8 main.go:162] Handled quit, awaiting Pod deletion
E0708 17:23:48.572382       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
E0708 17:23:48.910265       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
E0708 17:23:49.764411       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
rpc error: code = Unknown desc = Error: No such container: 691b0eaffc35f7aac64854e9ed0330dac09c1bcdc370f76fdbb3622f12cfa5f8
I0708 17:23:50.304541       8 main.go:165] Exiting with 0
E0708 17:23:50.573989       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:52.851102       8 main.go:165] Exiting with 0
E0708 17:23:53.405254       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
rpc error: code = Unknown desc = Error: No such container: 780ee4713a3ec3f59eb26a0986e098c828925424e8549fde145558196affbe38
E0708 17:23:56.164227       8 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
rpc error: code = Unknown desc = Error: No such container: 3f23efe22e028944d66691dc468b1af2bf9f49f2a24a67d5967984f0c8b92fec
I0708 17:23:58.506779       8 main.go:165] Exiting with 0
rpc error: code = Unknown desc = Error: No such container: a8de6da4abda8a9fc42c098fa57eb310f4d9088bab2f37ff7b132445caa74532

What you expected to happen:

The pod stays up while its configuration is being reloaded.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:


aledbf · 2019-07-08T17:35:19Z

> main.go:154] Received SIGTERM, shutting down

This means the pod is not passing the probes (readiness/liveness).

ac-hibbert · 2019-07-08T17:44:25Z

Specifically, I am removing a namespace (along with its pods, ingress, etc.) and modifying tcp-services, nginx-ingress-controller (deployment), and ingress-nginx (service) to remove the TCP ports, which triggers the reload.

My setup is the same as mandatory.yaml and the other manifests from this GitHub repo. To me it seems the healthchecks are failing because the pods have already been terminated, and the pods were terminated because of the reload.
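One way to check that ordering against the log above is to compare timestamps. The following is only a sketch: it uses a handful of lines copied from the log, restricted to the ones whose id column is 7, and the helper name `first_event_time` is made up for illustration. The pasted output interleaves several containers, so the id column alone may not identify a pod in general.

```python
import re
from datetime import datetime

# A few lines copied from the log in this issue (id column 7 only).
SAMPLE_LOG = """\
I0708 17:23:27.310684       7 controller.go:149] Backend successfully reloaded.
I0708 17:23:28.919548       7 main.go:154] Received SIGTERM, shutting down
I0708 17:23:28.943567       7 nginx.go:418] Stopping NGINX process
E0708 17:23:34.860702       7 checker.go:41] healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I0708 17:23:34.976083       7 nginx.go:431] NGINX process has stopped
"""

# Log header: severity letter, MMDD, time, id column, "file:line]", message.
KLOG_LINE = re.compile(
    r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) \S+\] (.*)$"
)


def first_event_time(log_text, proc_id, needle):
    """Return the time of the first line from proc_id whose message contains needle."""
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line)
        if match and match.group(2) == proc_id and needle in match.group(3):
            return datetime.strptime(match.group(1), "%H:%M:%S.%f")
    return None


sigterm = first_event_time(SAMPLE_LOG, "7", "Received SIGTERM")
probe_failure = first_event_time(SAMPLE_LOG, "7", "healthcheck error")
gap = (probe_failure - sigterm).total_seconds()
print(f"first healthcheck error follows SIGTERM by {gap:.3f}s")
```

In this sample the first healthcheck error is logged several seconds after the SIGTERM, which is consistent with the probes failing because NGINX was already being stopped rather than the other way round.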

I was under the impression that this had been fixed.

aledbf · 2019-07-08T18:15:40Z

> nginx-ingress-controller (deployment) and

If you change the deployment, the running pod will be replaced. Why are you doing this? (This is not specific to ingress-nginx; it applies to any deployment in k8s.)
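To illustrate the point: Kubernetes starts a Deployment rollout only when the pod template (`.spec.template`) changes, and a container's port list is part of that template. The sketch below models this with plain dictionaries. The trimmed manifest, the `jnlp` port name, port 50000, the replica counts, and the helper `triggers_rollout` are all assumptions for illustration and are not taken from the reporter's cluster.

```python
import copy


def triggers_rollout(old_deployment, new_deployment):
    """A Deployment rollout starts only when .spec.template changes."""
    return old_deployment["spec"]["template"] != new_deployment["spec"]["template"]


# Hypothetical, heavily trimmed Deployment; port 50000 is an assumed JNLP port.
before = {
    "spec": {
        "replicas": 2,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "nginx-ingress-controller",
                        "ports": [
                            {"name": "http", "containerPort": 80},
                            {"name": "https", "containerPort": 443},
                            {"name": "jnlp", "containerPort": 50000},
                        ],
                    }
                ]
            }
        },
    }
}

# Patch 1: drop the TCP port from the pod template (what the thread describes).
without_port = copy.deepcopy(before)
container = without_port["spec"]["template"]["spec"]["containers"][0]
container["ports"] = [p for p in container["ports"] if p["name"] != "jnlp"]

# Patch 2: change only the replica count, leaving the template untouched.
scaled = copy.deepcopy(before)
scaled["spec"]["replicas"] = 3

print(triggers_rollout(before, without_port))
print(triggers_rollout(before, scaled))
```

Under these assumptions, removing the port changes the template and so replaces the running pods, while scaling does not. Editing only the tcp-services ConfigMap leaves the Deployment untouched and so would not start a rollout.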

ac-hibbert · 2019-07-08T18:24:38Z

Ah, good point. I patch the deployment to delete the port of the service I have removed.

aledbf · 2019-07-08T18:56:21Z

@Hibbert can we close this issue?

ac-hibbert · 2019-07-08T19:10:39Z

That bit I understand. However, the problem does not occur only when the deployment is reconfigured. It also occurs when I delete the app's namespace. I am using ingress-nginx to route JNLP traffic through to the Jenkins master when I am running this. The reload seems to cause a connectivity problem:

Cannot contact i-08cb100f0b67f1286: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from ip-10-207-56-171.ec2.internal/10.207.56.171:51970 failed. The channel is closing down or has closed down

aledbf · 2019-07-08T19:14:00Z

> It is also when I delete the apps namespace. I am using ingress-nginx to route JNLP traffic through to the jenkins master when I am running this.

That's expected. You are deleting the app being exposed. There is no pod running.

> The reload seems to cause connectivity problem:-

Which reload?

aledbf · 2019-09-03T00:32:59Z

Closing. This is fixed in master #4487
If you want to test the fix, you can use the image quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev

aledbf closed this as completed Sep 3, 2019
