Liveness probe makes debugging very hard without apparent benefit #3417
Comments
Could you please elaborate on what issue you are running into? Are you not able to get the logs of previous pods and check for errors explaining why the liveness check failed?
Probably not. Conceptually there are corner cases where the process can get wedged and re-running the whole startup will un-wedge it. But I can’t recall seeing one of those in the years this thing has been live. FWIW we had to raise the liveness timeout just last week to debug one install. We should remove it. Since we don’t have a Service I’m not aware of any practical benefit of a readiness probe, but you’re right it could be useful in telling an operator that something needs looking into.
Hey Brian. We got to talk at KubeCon in May this year. Regarding the issue I was seeing (@murali-reddy): it was caused by the Weave container being unable to resolve the API server via the
Also see my search: https://github.com/weaveworks/weave/search?q=10.96.0.1&type=Issues. These are all impacted by the liveness probe and can't really be debugged with it on.
Hello again! Sorry about that. Just one of these things you (re-)learn from experience.
Ah! We also stumbled upon this recently; details are in PR #3421. I like the idea of morphing the livenessProbe into a readiness probe so that it still surfaces that something is wrong. Will change the PR accordingly.
The Kubernetes DaemonSet uses a liveness probe to restart failed Weave pods, which makes debugging issues with Weave almost impossible. We've since switched it over to a readiness probe: the Weave pod is simply reported as not ready, and an operator can then go into the Weave container and see what's wrong (`weave --local status` and all of those). My question is: is there something I'm missing about why a liveness probe was chosen?
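
For illustration, the switch described above could look roughly like the fragment below. This is only a sketch, not the exact manifest: the probe endpoint (`127.0.0.1:6784/status`) and the `initialDelaySeconds` value are assumptions, so check them against the DaemonSet you actually deploy.

```yaml
# Hypothetical fragment of the weave container spec in the DaemonSet.
# Endpoint and timing values are assumptions, not copied from a real manifest.
containers:
  - name: weave
    # Before: when this check fails, the kubelet restarts the container,
    # so the failing process is gone before anyone can inspect it.
    # livenessProbe:
    #   httpGet:
    #     host: 127.0.0.1
    #     path: /status
    #     port: 6784
    #   initialDelaySeconds: 30
    #
    # After: when this check fails, the pod is only marked not ready;
    # the container keeps running and can be inspected in place.
    readinessProbe:
      httpGet:
        host: 127.0.0.1
        path: /status
        port: 6784
```

With a readiness probe only, the pod shows up as not ready in `kubectl get pods` while the container stays up, so it remains possible to exec into it and run `weave --local status`.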
What happened?
Failing Weave containers are continuously being reaped by the Kubelet, making it impossible to debug a problem.
Anything else we need to know?
No, this isn't a question specific to any cloud provider or hardware, just Weave on K8s using the DaemonSet.
Versions: