/readyz endpoint returns 200 OK when not all enabled services are running #43440
Labels: feature-request
What would you like Teleport to do?
Introduce a new health-check endpoint (or modify the existing /readyz endpoint) that returns 200 OK only if all services enabled in the configuration are up and running without errors.
What problem does this solve?
Currently, the /readyz endpoint returns 200 OK as soon as the instance successfully heartbeats with the cluster. This means that if one or more of the configured Teleport services (e.g., app_service) is not yet ready, or never starts up properly, /readyz still returns 200 OK, as long as the instance has completed a heartbeat of any kind.
A repeatable way to get a successful heartbeat alongside a broken service is to enable both ssh_service and app_service, and then join the cluster with a token that is valid for the app role only. The app service starts up and the instance heartbeats, but ssh_service never becomes healthy, all while /readyz returns 200 OK.
If a workaround exists, please include it.
I looked over the /metrics endpoint, hoping that health or status information for each service might be there, but it wasn't. There doesn't appear to be a good way to determine readiness from the status of the individual Teleport services, and /healthz always returns 200 OK as long as the process is running. If it is decided that the current behavior of /readyz should not change, an additional endpoint with the desired behavior would be great.
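To illustrate the requested semantics, here is a minimal sketch of an aggregated readiness handler, assuming each enabled service can report its own ready state. The names `Readiness`, `NewReadiness`, `Set`, and `probe` are hypothetical and are not part of Teleport's codebase; Teleport's actual process and heartbeat wiring is different. The handler answers 200 only when every enabled service is ready, and 503 otherwise, listing the services that are not ready.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// Readiness tracks the ready state of each enabled service.
// Hypothetical type for illustration only; not Teleport's API.
type Readiness struct {
	mu    sync.RWMutex
	ready map[string]bool // service name -> ready
}

// NewReadiness registers every enabled service, initially not ready.
func NewReadiness(enabled ...string) *Readiness {
	r := &Readiness{ready: make(map[string]bool, len(enabled))}
	for _, name := range enabled {
		r.ready[name] = false
	}
	return r
}

// Set records the state of a registered service; unknown names are ignored.
func (r *Readiness) Set(name string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, known := r.ready[name]; known {
		r.ready[name] = ok
	}
}

// ServeHTTP returns 200 only if every enabled service is ready, otherwise
// 503, with a JSON body listing the services that are not ready.
func (r *Readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	r.mu.RLock()
	var notReady []string
	for name, ok := range r.ready {
		if !ok {
			notReady = append(notReady, name)
		}
	}
	r.mu.RUnlock()
	sort.Strings(notReady) // stable output regardless of map order

	status := http.StatusOK
	if len(notReady) > 0 {
		status = http.StatusServiceUnavailable
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]any{"not_ready": notReady})
}

// probe performs an in-memory GET /readyz and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	// Mirrors the reproduction above: both services enabled,
	// but only app_service manages to start.
	r := NewReadiness("ssh_service", "app_service")
	r.Set("app_service", true)
	fmt.Println(probe(r)) // 503: ssh_service is not ready
	r.Set("ssh_service", true)
	fmt.Println(probe(r)) // 200: all enabled services are ready
}
```

Keying readiness on the set of enabled services, rather than on whether any heartbeat succeeded, is what makes the reproduction scenario report 503 instead of 200.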