controllers shutting down #937

Closed
zerkms opened this issue Jun 17, 2020 · 5 comments
zerkms (Contributor) commented Jun 17, 2020

This is for v1.0.0-rc4 and I'm going to upgrade to rc5 now, but this is what I observed just a few minutes ago.

It's not obvious why it started shutting down the controllers though.

time="2020-06-17T00:20:37Z" level=info msg="Peer Up" Key=10.50.9.1 State=BGP_FSM_OPENCONFIRM Topic=Peer
I0617 00:20:41.116904       1 network_routes_controller.go:414] Cleaning up old routes if there are any
I0617 00:20:41.119667       1 network_routes_controller.go:428] Cleaning up if there is any existing tunnel interface for the node
I0617 00:20:42.523367       1 network_routes_controller.go:414] Cleaning up old routes if there are any
I0617 00:20:42.523901       1 network_routes_controller.go:428] Cleaning up if there is any existing tunnel interface for the node
I0617 21:13:05.319743       1 kube-router.go:187] Shutting down the controllers
I0617 21:13:05.321534       1 health_controller.go:161] Shutting down health controller
I0617 21:13:05.321977       1 network_policy_controller.go:176] Shutting down network policies controller
I0617 21:13:05.413728       1 health_controller.go:178] Shutting down HealthController RunCheck
I0617 21:13:05.611899       1 network_routes_controller.go:309] Shutting down network routes controller
E0617 21:13:05.614464       1 health_controller.go:150] Health controller error: http: Server closed
I0617 21:13:08.111812       1 network_services_controller.go:368] Shutting down network services controller
E0617 21:13:08.125269       1 runtime.go:66] Observed a panic: "send on closed channel" (send on closed channel)
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/chan.go:185
/usr/local/go/src/runtime/chan.go:609
/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:415
/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:795
/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:2048
/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:2020
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/controller.go:202
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/shared_informer.go:540
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/shared_informer.go:383
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/shared_informer.go:383
/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: send on closed channel [recovered]
	panic: send on closed channel
goroutine 506 [running]:
github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x16a5bc0, 0x1abb840)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cloudnativelabs/kube-router/pkg/controllers/proxy.(*NetworkServicesController).sync(0xc420fe17c0, 0x16)
	/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:415 +0x4a
github.com/cloudnativelabs/kube-router/pkg/controllers/proxy.(*NetworkServicesController).OnEndpointsUpdate(0xc420fe17c0, 0xc420e2f0e0)
	/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:795 +0x30b
github.com/cloudnativelabs/kube-router/pkg/controllers/proxy.(*NetworkServicesController).handleEndpointsUpdate(0xc420fe17c0, 0x18f6da0, 0xc4201b6fc0, 0x18f6da0, 0xc420e2f0e0)
	/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:2048 +0x5a
github.com/cloudnativelabs/kube-router/pkg/controllers/proxy.(*NetworkServicesController).newEndpointsEventHandler.func2(0x18f6da0, 0xc4201b6fc0, 0x18f6da0, 0xc420e2f0e0)
	/go/src/github.com/cloudnativelabs/kube-router/pkg/controllers/proxy/network_services_controller.go:2020 +0x52
github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(0xc4203aaed0, 0xc4203aaee0, 0xc4203aaef0, 0x18f6da0, 0xc4201b6fc0, 0x18f6da0, 0xc420e2f0e0)
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/controller.go:202 +0x5d
github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc420d285a0)
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/shared_informer.go:540 +0x1b7
github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache.(*processorListener).(github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache.run)-fm()
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/tools/cache/shared_informer.go:383 +0x2a
github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc4205779c0, 0xc420343240)
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/src/github.com/cloudnativelabs/kube-router/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
murali-reddy (Member) commented:

> It's not obvious why it started shutting down the controllers though.

@zerkms Please enable verbose logging and share the logs. Also, were there any errors before the logs you shared above? Do you see this issue on all the kube-router pods every time, or was it a one-off incident?

The panic you see in the logs happens after the controllers are shut down. I created #939 to fix the panic.
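
For context, the stack trace above is the classic Go "send on closed channel" race: the endpoints informer callback ends up calling sync(), which sends a sync request on a channel that the shutdown path has already closed. A minimal, self-contained sketch of that failure mode (hypothetical names, not kube-router's actual types) looks like this:

```go
// Sketch of the race: the shutdown path closes the sync channel while
// informer event handlers may still fire and send on it.
package main

import "time"

type controller struct {
	syncCh chan struct{}
}

// onEndpointsUpdate stands in for the informer callback that calls sync().
func (c *controller) onEndpointsUpdate() {
	c.syncCh <- struct{}{} // panics if syncCh was already closed by Stop()
}

// Stop models the shutdown path closing the channel.
func (c *controller) Stop() {
	close(c.syncCh)
}

func main() {
	c := &controller{syncCh: make(chan struct{}, 1)}
	go func() {
		time.Sleep(10 * time.Millisecond)
		c.onEndpointsUpdate() // event arrives after shutdown -> panic
	}()
	c.Stop()
	time.Sleep(50 * time.Millisecond)
}
```

Running this reliably crashes with `panic: send on closed channel`, matching the trace above.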

murali-reddy changed the title from "Sending to a closed channel" to "controllers shutting down" on Jun 18, 2020
zerkms (Contributor, Author) commented Jun 18, 2020

"Please enable verbose logging and share the logs" --- I will.

"Also were there any errros before the logs you shared above?" --- no, any messages before were more than 10 minutes apart, I included all the logs in close time proximity.

"Do you see this issue on all the kube-router pods every time or its just one of the incident?" --- two of them (out of 10) did it at the same moment. The other one whose IP you can see in the first log line 10.50.9.1 had identical logs structure, with nothing suspicous.
I will now try to see more closely if they do it periodically now.

murali-reddy added a commit that referenced this issue on Jun 29, 2020:

> controller has already shutdown
>
> fixes panic seen in #937
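
The usual way to close this kind of shutdown race (a sketch with assumed names, not necessarily the exact change in the referenced commit or in #939) is to never close the work channel from the shutdown path; instead, signal shutdown on a separate stop channel and make every send select against it, so a late event handler simply drops its request:

```go
// Sketch of a shutdown-safe sync request, assuming hypothetical names.
package main

import "sync"

type controller struct {
	syncCh  chan struct{} // work channel: never closed
	stopCh  chan struct{} // closed exactly once to signal shutdown
	stopOne sync.Once
}

func newController() *controller {
	return &controller{
		syncCh: make(chan struct{}, 1),
		stopCh: make(chan struct{}),
	}
}

// requestSync is safe to call from event handlers even during shutdown.
func (c *controller) requestSync() {
	select {
	case <-c.stopCh:
		// controller has already shut down; drop the request
	case c.syncCh <- struct{}{}:
		// sync request queued
	default:
		// a sync is already pending; coalesce
	}
}

// Stop signals shutdown idempotently and never closes the work channel.
func (c *controller) Stop() {
	c.stopOne.Do(func() { close(c.stopCh) })
}

func main() {
	c := newController()
	c.Stop()
	c.requestSync() // no panic: the send is guarded and syncCh is never closed
}
```

Because syncCh is never closed, a send that races with Stop() can at worst enqueue a request nobody reads; it can no longer panic.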
aauren (Collaborator) commented Jul 10, 2020

@zerkms Were you able to enable more verbose logs and get more information?

zerkms (Contributor, Author) commented Jul 10, 2020

It was a one-time occurrence. I have not watched closely since then, but after checking just now on a 10-node cluster, none of the kube-router instances has shown any similar symptoms within the last several weeks.

aauren (Collaborator) commented Jul 10, 2020

Thanks for the response! Resolving.

aauren closed this as completed on Jul 10, 2020