Create health check API endpoint(s) that don't require auth #4020
Comments
I don't immediately love the idea, to be honest. If we did provide it, it would at minimum be a configurable option that defaults to requiring auth. Even then, I still lean towards an API key. It can be provided in query params, so nothing crazy should be required on the monitoring side.
It goes without saying that this feature (the health API endpoints) needs to exist first. My initial thoughts are captured in the metrics PR.
@bigmstone I totally understand about the auth part. Is there a way to lock down a user and/or API key so that it can only be used for health checks and not for executing actions (outside of RBAC, which isn't available in Open Source)? Also, I think this should apply to the
There sure is - with RBAC. :)
I personally also don't think no authentication is a good idea. It's quite easy to generate an API key and include it as part of the query string in the URL. I could perhaps be convinced otherwise with good arguments, but only as explicit opt-in behavior and not by default (auth enabled by default).
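To illustrate the API-key-in-query-string idea, a monitoring system's check could boil down to building and polling a URL like the one below. This is a sketch, not st2 code: the `st2-api-key` parameter name and the endpoint path are assumptions for illustration.

```python
# Sketch of the kind of up/down check a monitoring tool could run if the
# API key is passed in the query string. Parameter name and path are
# illustrative assumptions, not confirmed StackStorm behavior.
from urllib.parse import urlencode

def build_check_url(base_url: str, api_key: str, path: str = "/api/v1/") -> str:
    """Build a check URL with the API key as a query parameter,
    so the monitoring tool needs no custom headers."""
    return base_url.rstrip("/") + path + "?" + urlencode({"st2-api-key": api_key})
```

A tool like Nagios or Consul would then issue a plain GET against this URL and alert on any non-200 response.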
@nmaludy In essence, what you're asking for is a limited open-source RBAC, even if implemented with an auth/no-auth switch. We're all pretty -1 on the no-auth version (still willing to be convinced, but it's an uphill battle), but even if it were implemented that way, it would basically be a form of RBAC, which is an existing enterprise feature.
@Kami @lakshmi-kannan We'll finally need this. It's the case with running production-level StackStorm in K8s, but it would definitely be helpful for other environments too, like @nmaludy mentioned. In K8s we define liveness/readiness probes per container. For example, if a service is started and running but doesn't actually have a working Mongo/Rabbit connection and is stuck in a re-connect loop, we need to find that out via the aggregated checks. Because of the way the checks are defined in K8s objects, we'll need this working without an st2 token.
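For context, a K8s probe definition of the kind described above might look like the fragment below. The `/health` path and the port are hypothetical placeholders, since no such StackStorm endpoint exists yet.

```yaml
# Hypothetical probe block for an st2 API container; path and port are
# placeholders, not an existing StackStorm endpoint.
livenessProbe:
  httpGet:
    path: /health
    port: 9101
  initialDelaySeconds: 30
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health
    port: 9101
  initialDelaySeconds: 5
  periodSeconds: 5
```

Note that `httpGet` probes can only send static headers defined in the pod spec, which is why an expiring st2 auth token can't be attached to them.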
I was looking at it for a similar health check scenario for use with Consul service discovery.
There are two paths to consider, which were briefly discussed with @Kami:
Yes, after looking more into examples of how many other apps add this functionality (e.g. coreos/flannel and others),
To provide my short evaluation after looking at this last week (sorry for the wall of text and some missing context; it's from Slack): https://gist.github.com/Kami/3df2dcf33bce81a6b609e42a9eb0b6a1

In short: in the short term, we will likely go with "service exits on failure" (which is a good idea for distributed and Kubernetes environments anyway) plus a TCP liveness probe for non-HTTP services. Correctly implementing a "/health" endpoint would require quite some work (due to the way our service code is structured right now), and doing a quick "hack" would be worse than not doing it at all: it would give users a false sense of correctness and possibly make things less highly available.
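The "service exits on failure" approach can be sketched as a reconnect loop that gives up and exits so the orchestrator restarts the container. The function name and retry policy below are illustrative, not actual st2 code.

```python
# Fail-fast sketch: instead of re-connecting forever, exit the process
# after a few failures so an orchestrator (e.g. Kubernetes) restarts
# and reschedules the service. Names and policy are illustrative.
import sys
import time

def run_with_fail_fast(connect, max_retries=3, delay=1.0):
    """Try to (re)connect to a backing service (DB/MQ).
    Returns the connection on success; exits the process otherwise."""
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except ConnectionError:
            time.sleep(delay)
    sys.exit(1)  # non-zero exit => orchestrator restarts the service
```

Combined with a restart policy, a crashing service is visibly unhealthy, which is what the TCP liveness probe then picks up.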
Reassigning this. I would also like to note that failing a service fast on failure (like DB/MQ connection loss) is not something that fully solves the problem. One of the important parts of liveness/readiness checks is that we can identify the moment when a service can really start accepting requests after init time. In heavy environments this may take more than a few moments, and without such checks a new pod is added to the load balancer immediately, meaning we'll likely send requests to services that are not ready to serve them, which results in lost requests.
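The readiness concern above boils down to not reporting healthy until bootstrap has actually finished. A minimal sketch of such state tracking (not st2 code; a `/health` handler would return this status code):

```python
# Readiness gate sketch: report 503 until bootstrap completes, so the
# load balancer keeps the instance out of rotation during init.
import threading

class ReadinessState:
    """Tracks whether startup (migrations, MQ connections, ...) is done."""
    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        """Call once bootstrap has finished."""
        self._ready.set()

    def status_code(self):
        """HTTP status a health handler should return."""
        return 200 if self._ready.is_set() else 503
```

The point is that "process is up" and "ready for traffic" are distinct states, and only the service itself knows when the second one is reached.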
BTW, here is an example of a liveness HTTP endpoint for hubot. This module, when installed, adds a new `/health` endpoint. Simple as:

```coffeescript
module.exports = (robot) ->
  robot.router.get '/health', (req, res) -> res.status(200).end()
```

but it adds real value compared to a TCP connect or "process is running" check, ensuring we send requests to the service only when the bootstrap stage is finished (which can take time) and we don't lose requests into a black hole. We'll need at least something minimal like that on the StackStorm side, potentially improving the functionality with the simplest DB/MQ success-query checks, like https://www.ianlewis.org/en/using-kubernetes-health-checks, in the future.
I am also running into this need trying to use Consul with service checks for regular HA. |
Starting with the auth service, it would be simple to add a /auth/v1/health endpoint that returns a 200 once the service is up. The readiness probe would need to be implemented on the k8s side using secrets for an API key, then the liveness probe activated in the Helm chart only if said secret is provided. Please let me know if this solution is acceptable.
Setting the health check on an alternate port and endpoint doesn't really check anything unless that endpoint makes a GET request to the /auth endpoint, and at that point, why have two different endpoints?
This seems like a sensible staged approach. A liveness check would meet most basic requirements for monitoring (is it up at all) in lieu of having granular service endpoint status and health checks available. Once this is implemented we can revisit the discussion to see how we want to approach the latter in a way that serves most people's needs. |
Working with monitoring and service discovery, it would be nice if each "service" that has an API had a "health" endpoint that could respond without authentication. We really don't want to give an account and/or API key to our monitoring system just for simple up/down checks.
Thoughts?