Create health check API endpoint(s) that don't require auth #4020

Open · nmaludy opened this issue Mar 2, 2018 · 17 comments

@nmaludy (Member) commented Mar 2, 2018

I'm working with monitoring and service discovery, and it would be nice if each "service" that has an API had a "health" endpoint that could respond without authentication. I really don't want to give an account and/or API key to our monitoring system just for simple up/down checks.

Thoughts?

@bigmstone (Contributor) commented:

I don't immediately love the idea, to be honest. If we did provide it, it would at minimum be a configurable option that defaults to requiring auth. Even then, I still lean towards an API key. It can be provided in query params, so nothing crazy should be required on the monitoring side.

@bigmstone (Contributor) commented Mar 2, 2018

It goes without saying that this feature (the health API endpoints) needs to exist first. My initial thoughts are captured in the metrics PR.

@nmaludy (Member, Author) commented Mar 2, 2018

@bigmstone I totally understand about the auth part. Is there a way to lock down a user and/or API key so that it can only be used for health checks and not for executing actions (outside of RBAC, which isn't available in Open Source)?

Also, I think this should apply to the st2stream and st2auth services along with st2api, so each can be checked for its "health" status independently.

@bigmstone (Contributor) commented:

There sure is - with RBAC. :)

@Kami (Member) commented Mar 2, 2018

I personally also don't think skipping authentication is a good idea.

It's quite easy to generate an API key and include it as part of the query string in the URL.
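
For example, a monitoring-side check could be as small as this (a sketch assuming a key generated with st2 apikey create, st2api on its default port 9101, and the st2-api-key query parameter; adjust for your environment):

# Sketch: up/down check against st2api with an API key in the query string.
# The key, host and port below are placeholders.
import requests

API_KEY = "REPLACE_WITH_GENERATED_KEY"
resp = requests.get("http://127.0.0.1:9101/v1/actions",
                    params={"st2-api-key": API_KEY}, timeout=5)
print("up" if resp.ok else "down")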

I could perhaps be convinced otherwise with good arguments, but only as explicit opt-in behavior, not the default (auth enabled by default).

@bigmstone (Contributor) commented:

@nmaludy In essence, what you're asking for is limited open-source RBAC, even if implemented with an auth/no-auth switch. We're all pretty -1 on the no-auth version (still willing to be convinced, but it's an uphill battle), but even if it were implemented that way, it would basically be a form of RBAC, which is an existing enterprise feature.

@arm4b (Member) commented Jun 18, 2018

@Kami @lakshmi-kannan We'll finally need a /health endpoint for each st2 service, like auth, api and stream.

This is the case when running production-level StackStorm in K8s, but it would definitely be helpful in other environments too, as @nmaludy mentioned.

In K8s we define a livenessProbe and a readinessProbe to perform service-level checks. The objective is to know not just that the service started and the process exists, but that it actually works and can accept and handle connections properly. A failed probe marks the pod/container as not ready, so K8s stops sending traffic to it or restarts it. This is used in deployments (roll-out, roll-back, put-in & put-out if you want 😃), load balancing and traffic routing to guarantee service availability.
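
For illustration, probes against a hypothetical /health endpoint could look like this (a sketch; the path doesn't exist yet and the port is a placeholder):

# Sketch: K8s probes against a hypothetical /health endpoint.
livenessProbe:
  httpGet:
    path: /health
    port: 9101
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 9101
  periodSeconds: 5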

For example, if a service is started and running but doesn't actually have a working Mongo/Rabbit connection and is stuck in a reconnect loop, we need to find that out via an aggregated /health endpoint (which should return a non-2xx HTTP response code).
https://www.ianlewis.org/en/using-kubernetes-health-checks describes exactly what's needed and why.
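
A minimal sketch of what such an aggregated handler could check (illustrative only, not st2 code; Flask, pymongo and kombu stand in for the real service wiring, and the connection URLs are placeholders):

# Sketch: aggregated /health that returns non-2xx when Mongo or Rabbit
# are unreachable.
from flask import Flask
from kombu import Connection
from pymongo import MongoClient

app = Flask(__name__)

@app.route("/health")
def health():
    try:
        # Cheap round-trips to both backing services.
        MongoClient("mongodb://127.0.0.1:27017",
                    serverSelectionTimeoutMS=2000).admin.command("ping")
        Connection("amqp://guest:guest@127.0.0.1:5672//").ensure_connection(
            max_retries=1)
    except Exception:
        return "unhealthy", 503
    return "ok", 200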

Because of the way the checks are defined in K8s objects, we'll need this to work without an st2 token (otherwise the probe just gets a 401 Unauthorized).
https://github.com/StackStorm/k8s-st2/blob/b92e0f6d1bc8e5f93865516570e4968475838664/st2web-enterprise/deployments.yaml#L28-L45
I'm OK with an st2.conf setting for each respective service that explicitly disables auth for /health, if we have concerns about exposing these endpoints by default.

@arm4b reopened this Jun 18, 2018
@arm4b added the K8s label Jun 18, 2018
@Kami self-assigned this Jun 18, 2018
@nmaludy (Member, Author) commented Jun 18, 2018

I was looking at it for a similar health check scenario for use with Consul service discovery.

@arm4b (Member) commented Jun 20, 2018

There are 2 paths to consider, which were briefly discussed with @Kami:

  1. Add health to the existing API, like: /v1/stream/health, /v1/api/health, /v1/auth/health. This needs an option in st2.conf to disable auth for the health endpoints (auth on by default).
  2. Expose /health on a new dedicated port for every service. No auth is needed. Via st2.conf, the user can choose to expose it to the world, run it on localhost, change the port, or not run it at all (the default); see the hypothetical sketch below.
    2.1) Additional advantage: we can add it to other services and it'll look consistent (not just the 3 that already have an http/api interface: auth, stream and api).
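
Purely to illustrate option 2, the st2.conf knobs could look something like this (every option name below is hypothetical; nothing like it exists yet):

# Hypothetical st2.conf section for option 2; names are made up for illustration.
[health]
enable = True            # disabled by default
listen_host = 127.0.0.1  # or 0.0.0.0 to expose it to the world
listen_port = 5566       # dedicated health port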

@arm4b added this to the 2.9.0 milestone Jun 20, 2018
@arm4b (Member) commented Jul 17, 2018

Yes, after looking more into how many other apps add this functionality (e.g. coreos/flannel and others), /health on a dedicated port (option 2), disabled by default, makes sense and is more flexible, because this way we can bring a consistent health check to other StackStorm services beyond api, stream and auth.

@Kami (Member) commented Aug 20, 2018

To provide my short evaluation after looking at this last week (sorry for the wall of text and some missing context; it's from Slack): https://gist.github.com/Kami/3df2dcf33bce81a6b609e42a9eb0b6a1

In short: in the short term, we will likely go with "service exits on failure" (which is a good idea for distributed and Kubernetes environments anyway), plus a TCP liveness probe for non-HTTP services.

Correctly implementing a /health endpoint would require quite some work (due to the way our service code is structured right now), and doing a quick "hack" would be worse than not doing it at all: it would give users a false sense of correctness and could make things less highly available.
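
For the non-HTTP services, the kind of TCP probe meant here would look roughly like this (a sketch; the port is a placeholder for whatever socket the service exposes):

# Sketch: TCP liveness probe for a non-HTTP service; the port is a placeholder.
livenessProbe:
  tcpSocket:
    port: 5560
  initialDelaySeconds: 15
  periodSeconds: 20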

@arm4b (Member) commented Sep 6, 2018

Reassigning this to the 3.0 milestone, per @Kami's plans. Note that this was initially requested/proposed for 2.8.

I'd also like to note that failing the service fast on failure (like DB/MQ connection loss) does not fully solve the problem.

One of the important parts of liveness/readiness checks is that we can identify the moment when a service can really start accepting requests after its init time. In heavy environments this may take more than a few moments, and without such checks a new pod is added to the load balancer immediately, meaning we'll likely send requests to services that aren't ready to serve them, which results in lost requests.

@arm4b (Member) commented Apr 13, 2019

Discovered during the K8s st2chatops work StackStorm/stackstorm-k8s#55

BTW, here is an example of a liveness HTTP endpoint for hubot.

This module, https://github.com/joelwallis/hubot-health, adds a new /health endpoint when installed, which becomes available once hubot's internal bootstrap is finished and the app is ready to serve requests.

Simple as:

# Respond 200 on /health once hubot's router is up, i.e. bootstrap has finished.
module.exports = (robot) ->
  robot.router.get '/health', (req, res) -> res.status(200).end()

but it adds real value compared to a TCP-connect or process-is-running check, ensuring we send requests to the service only once the bootstrap stage is finished (which can take time), so we don't lose requests into a black hole.

We'll need at least something minimal like that on the StackStorm side, potentially improving it in the future with simple DB/MQ success-query checks like https://www.ianlewis.org/en/using-kubernetes-health-checks.

@punkrokk (Member) commented:

I am also running into this need trying to use Consul with service checks for regular HA.

@guzzijones (Contributor) commented May 15, 2023

Starting with the auth service, it would be simple to add a /auth/v1/health endpoint that returns a 200 once the service is up.
That would satisfy the liveness probe.
No authentication is needed on the auth endpoint, AFAIK.

The readiness probe would need to be implemented on the K8s side using secrets for an API key, with that probe activated in the Helm chart only if said secret is provided.
In my opinion, retrieving a valid token is the only way to validate that the auth service is working.
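
A sketch of what such a token-based readiness check could look like (the credentials and URL are placeholders that would come from the K8s secret; 9100 is st2auth's default port):

# Sketch: readiness check that validates st2auth end to end by requesting
# a token. Credentials and URL are placeholders sourced from a K8s secret.
import sys

import requests

def main():
    try:
        resp = requests.post("http://127.0.0.1:9100/tokens",
                             auth=("st2probe", "PASSWORD_FROM_SECRET"),
                             timeout=5)
    except requests.RequestException:
        sys.exit(1)
    # A 201 with a token document means auth works end to end.
    sys.exit(0 if resp.status_code == 201 else 1)

if __name__ == "__main__":
    main()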

Please let me know if this solution is acceptable.
We set up K8s and noticed these TODOs in the Helm chart. We would like to go to production by year end if possible.

@guzzijones (Contributor) commented:

Setting up the health check on an alternate port and endpoint doesn't really check anything unless that endpoint makes a GET request to the /auth endpoint, at which point why have 2 different endpoints?

@Stealthii (Contributor) commented:

Starting with the auth service, it would be simple to add a /auth/v1/health endpoint that returns a 200 once the service is up. That would satisfy the liveness probe. No authentication is needed on the auth endpoint, AFAIK.

This seems like a sensible staged approach. A liveness check would meet the most basic monitoring requirement (is it up at all) in lieu of granular per-endpoint status and health checks. Once this is implemented, we can revisit the discussion and see how we want to approach the latter in a way that serves most people's needs.
