Allow health groups to be configured at an additional path #25471
Comments
We discussed this today and have a few ideas that we'd like to explore. In the meantime, I don't think you need to worry about the warning in the documentation. If Istio is monitoring the success rates for requests hitting the service, it is mitigating the risk that the warning describes.
Hi, thank you for taking this into consideration.
Indeed, Istio's monitoring certainly mitigates the risk this warning describes, yet the lurking issue is that Istio lets real requests through while the application's main web infrastructure is not ready or live. A retry policy could help, but it is probably not what we want for non-GET/HEAD requests. Imagine a configuration change that leaves more requests waiting on I/O: this may lead to (Tomcat) connector saturation.
Irrespective of the management port that's being used, the readiness probe won't report that the application is ready to handle traffic until it really is ready. As long as there's no fundamental problem with your application endpoints, Istio's monitoring and the liveness and readiness probes should give you everything that you need.
Just to weigh in, the following statement may not hold up when the process allows frequent deployments, such as several times a day.
For instance, we got caught by an incorrect configuration change that was deployed too early (before the correct Docker image was deployed). It resulted in many unsatisfied requests waiting on the misconfigured third-party dependency, which led to saturation of the main application's connector. The problem would have been detected earlier if the liveness probe had failed at that time.
Reopening to remind us to update the release notes.
Motivation
Our setup is Kubernetes in production, with Istio and Prometheus metrics exposed at /actuator/prometheus.
We noticed that the ServiceMonitor starts trying to fetch Prometheus metrics very early, before the application is ready, which results in noticeable 503 responses during the rollout.
Since scraping metrics is a separate concern from serving application traffic, we wanted to expose the metrics on a different port so that this port can be excluded from Istio.
This is possible, as documented here: https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-customizing-management-server-port
Changing the management (actuator) port is always possible, but the documentation warns against relying on the health endpoints in this setup: https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes
Suggestion
Could it be worth distinguishing two classes of actuator endpoints?
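One possible reading of this suggestion, sketched with the JDK's built-in com.sun.net.httpserver package rather than Spring Boot: the liveness and readiness probes are served from the same listener as the application traffic, so they go through the same connector as real requests, while the Prometheus endpoint lives on a separate management port that could be excluded from Istio. This is a minimal illustration under stated assumptions, not Spring Boot's implementation; the class name SplitPortsSketch, the /livez, /readyz and /api/hello paths, the READY flag and the placeholder metrics body are all hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

public class SplitPortsSketch {

    // Hypothetical readiness flag; a real application would derive readiness from
    // its own availability state instead of a bare boolean.
    static final AtomicBoolean READY = new AtomicBoolean(false);

    // Plain HTTP/1.1 client used to probe the servers in the demo below.
    static final HttpClient CLIENT =
            HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

    // Writes a plain-text response with the given status and closes the exchange.
    static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Application server: business endpoints plus the probes, so the probes are
    // answered by the same listener as real traffic. Port 0 picks a free port.
    static HttpServer startMainServer(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/api/hello", ex -> respond(ex, 200, "hello"));
        server.createContext("/livez", ex -> respond(ex, 200, "UP"));
        server.createContext("/readyz", ex -> {
            boolean ready = READY.get();
            respond(ex, ready ? 200 : 503, ready ? "UP" : "OUT_OF_SERVICE");
        });
        server.start();
        return server;
    }

    // Management server: metrics only, on a separate port that could be excluded
    // from the mesh. The body is a placeholder, not real Prometheus output.
    static HttpServer startManagementServer(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/actuator/prometheus",
                ex -> respond(ex, 200, "# placeholder metrics\n"));
        server.start();
        return server;
    }

    // Sends a GET to the given server and path and returns the HTTP status code.
    static int status(HttpServer server, String path) throws Exception {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + path);
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpServer app = startMainServer(0);
        HttpServer management = startManagementServer(0);
        try {
            System.out.println("/readyz before ready: " + status(app, "/readyz"));
            READY.set(true);
            System.out.println("/readyz after ready: " + status(app, "/readyz"));
            System.out.println("/actuator/prometheus on management port: "
                    + status(management, "/actuator/prometheus"));
            System.out.println("/actuator/prometheus on main port: "
                    + status(app, "/actuator/prometheus"));
        } finally {
            app.stop(0);
            management.stop(0);
        }
    }
}
```

Running main prints the readiness status before and after the flag flips and shows the metrics endpoint answering only on the management port. Because the probes share the application's listener, a broken or saturated listener would likely affect them too, which is the behaviour the earlier comments want the liveness probe to reflect.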