healthchecks? #266
Hi,
Is there a way to perform health checks on the controller using kubernetes probes?
Thanks
Comments
Something a bit more built-in would be ideal, but the following is a reasonable strategy:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - '! wget -q -O- localhost:3001/metrics | grep status=\"error\"'
  initialDelaySeconds: 30
  periodSeconds: 30

This checks the Prometheus metrics to see whether there have been any sync errors.
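For context, here is a minimal sketch of where such a probe sits in a Deployment spec. The Deployment name, container name, and image below are placeholders for illustration; only port 3001 is taken from the probe above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetes-external-secrets        # hypothetical name, for illustration only
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetes-external-secrets
  template:
    metadata:
      labels:
        app: kubernetes-external-secrets
    spec:
      containers:
      - name: kubernetes-external-secrets                  # assumed container name
        image: example/kubernetes-external-secrets:latest  # placeholder image
        ports:
        - name: prometheus
          containerPort: 3001                              # metrics port that the probe queries
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            # Fails (non-zero exit) when any metric line contains status="error"
            - '! wget -q -O- localhost:3001/metrics | grep status=\"error\"'
          initialDelaySeconds: 30
          periodSeconds: 30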
Thanks @timmyers.
It's definitely not ideal, and I can think of cases where it wouldn't be what you want. For our setup, though, there are generally zero sync errors unless something is wrong with the pod and all of its communication is failing. A restart seems to fix it, so this might be better than no health check.
Agreed with @timmyers -- we also have syncing stop without producing any errors in the logs. A livenessProbe like this works for us:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "cd ~ && wget -O es.json --no-check-certificate --header \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" \"https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/apis/kubernetes-client.io/v1/namespaces/default/externalsecrets\""
  periodSeconds: 30

My backup plan may be to restart it every 5 minutes / run it as a CronJob 😄
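As a side note, this exec probe authenticates with the pod's ServiceAccount token, so that ServiceAccount needs read access to the externalsecrets resources. A minimal RBAC sketch that would allow the GET; the role and ServiceAccount names are assumptions for illustration, while the API group and namespace come from the probe URL above:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: externalsecrets-reader          # hypothetical name
rules:
- apiGroups: ["kubernetes-client.io"]   # API group from the probe URL above
  resources: ["externalsecrets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: externalsecrets-reader          # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: externalsecrets-reader
subjects:
- kind: ServiceAccount
  name: kubernetes-external-secrets     # assumed ServiceAccount name
  namespace: default                    # namespace used in the probe URL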
Thanks guys, I'll implement the probe suggested by @timmyers until a proper healthcheck is added.
I ended up removing the livenessProbe right away, as it was restarting the container due to errors that are not related to the pod's communication. The container can function as expected and still throw errors for a variety of reasons, such as lacking permissions to a specific secret.
I would have the readiness probe just be the same check as the liveness probe in this case.

And that's unfortunate :( I have been pretty good with my probe so far (an API call to get externalsecrets).

It's just a GET against the Kubernetes API to retrieve externalsecrets. Again, not saying this is a comprehensive fix by any means, but in my case the controller gets into a state where it's unable to detect new ExternalSecrets.
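For completeness, a sketch of what mirroring that check as a readinessProbe could look like; this reuses the same wget call against the Kubernetes API from the comment above and is an illustration under those assumptions, not the project's shipped configuration:

readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # Same check as the liveness probe: succeed only if the API GET works
    - "wget -q -O /dev/null --no-check-certificate --header \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" \"https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/apis/kubernetes-client.io/v1/namespaces/default/externalsecrets\""
  periodSeconds: 30
  failureThreshold: 3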
I didn't test the wget liveness probe, and I don't think it actually solves the problem here: the open TCP connection from the watch request is being dropped on a different network hop, and checking whether we can open a new connection does not detect that particular issue. The livenessProbe should just work ™️. I'd propose to have a built-in healthcheck in the controller itself that reflects the state of the watch.
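If such a built-in endpoint were added, the probe configuration would get much simpler. Purely as a sketch, assuming a hypothetical /healthz path served on port 3001 (both the endpoint name and the port are assumptions, not something the project ships as of this thread):

livenessProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint exposed by the controller
    port: 3001            # assumed to be the same port as the metrics server
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /healthz        # hypothetical; could also be a separate /readyz
    port: 3001
  periodSeconds: 10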
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
This issue was closed because it has been stalled for 30 days with no activity. |