Create health check API endpoint(s) that don't require auth #4020

Open · nmaludy opened this issue Mar 2, 2018 · 17 comments

@nmaludy (Member) commented Mar 2, 2018

I'm working with monitoring and service discovery, and it would be nice if each "service" that has an API had a "health" endpoint that could respond without authentication. I really don't want to give an account and/or API key to our monitoring system just for simple up/down checks.

Thoughts?

@bigmstone (Contributor) commented:

I don't immediately love the idea, to be honest. If we did provide it, it would at minimum be a configurable option that defaults to requiring auth. Even then, I still lean towards an API key. It can be provided in query params, so nothing crazy should be required on the monitoring side.

@bigmstone (Contributor) commented Mar 2, 2018

It goes without saying that this feature (the health API endpoints) needs to exist first. My initial thoughts are captured in the metrics PR.

@nmaludy (Member, Author) commented Mar 2, 2018

@bigmstone I totally understand about the auth part. Is there a way to lock down a user and/or API key so that it can only be used for health checks and not for executing actions (outside of RBAC, which isn't available in Open Source)?

Also, I think this should apply to the st2stream and st2auth services along with st2api, so each can be checked for its "health" status independently.

@bigmstone (Contributor) commented:

There sure is - with RBAC. :)

@Kami (Member) commented Mar 2, 2018

I personally also don't think skipping authentication is a good idea.

It's quite easy to generate an API key and include it as part of the query string in the URL.
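
For example, a monitoring-side check could be as small as this (a sketch assuming a key generated with st2 apikey create, st2api on its default port 9101, and the st2-api-key query parameter; adjust for your environment):

# Sketch: up/down check against st2api with an API key in the query string.
# The key, host and port below are placeholders.
import requests

API_KEY = "REPLACE_WITH_GENERATED_KEY"
resp = requests.get("http://127.0.0.1:9101/v1/actions",
                    params={"st2-api-key": API_KEY}, timeout=5)
print("up" if resp.ok else "down")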

I could perhaps be convinced otherwise with good arguments, but only as explicit opt-in behavior, not the default (auth enabled by default).

@bigmstone (Contributor) commented:

@nmaludy In essence, what you're asking for is limited open-source RBAC, even if implemented with an auth/no-auth switch. We're all pretty -1 on the no-auth version (still willing to be convinced, but it's an uphill battle), but even if it were implemented that way, it would basically be a form of RBAC, which is an existing enterprise feature.

@arm4b (Member) commented Jun 18, 2018

@Kami @lakshmi-kannan We'll finally need a /health endpoint for each st2 service, like auth, api and stream.

This is the case when running production-level StackStorm in K8s, but it would definitely be helpful in other environments too, as @nmaludy mentioned.

In K8s we define a livenessProbe and a readinessProbe to perform service-level checks. The objective is to know not just that the service started and the process exists, but that it actually works and can accept and handle connections properly. A failed probe marks the pod/container as not ready, so K8s stops sending traffic to it or restarts it. This is used in deployments (roll-out, roll-back, put-in & put-out if you want 😃), load balancing and traffic routing to guarantee service availability.
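
For illustration, probes against a hypothetical /health endpoint could look like this (a sketch; the path doesn't exist yet and the port is a placeholder):

# Sketch: K8s probes against a hypothetical /health endpoint.
livenessProbe:
  httpGet:
    path: /health
    port: 9101
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 9101
  periodSeconds: 5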

For example, if a service is started and running but doesn't actually have a working Mongo/Rabbit connection and is stuck in a reconnect loop, we need to find that out via an aggregated /health endpoint (which should return a non-2xx HTTP response code).
https://www.ianlewis.org/en/using-kubernetes-health-checks describes exactly what's needed and why.
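
A minimal sketch of what such an aggregated handler could check (illustrative only, not st2 code; Flask, pymongo and kombu stand in for the real service wiring, and the connection URLs are placeholders):

# Sketch: aggregated /health that returns non-2xx when Mongo or Rabbit
# are unreachable.
from flask import Flask
from kombu import Connection
from pymongo import MongoClient

app = Flask(__name__)

@app.route("/health")
def health():
    try:
        # Cheap round-trips to both backing services.
        MongoClient("mongodb://127.0.0.1:27017",
                    serverSelectionTimeoutMS=2000).admin.command("ping")
        Connection("amqp://guest:guest@127.0.0.1:5672//").ensure_connection(
            max_retries=1)
    except Exception:
        return "unhealthy", 503
    return "ok", 200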

Because of the way the checks are defined in K8s objects, we'll need this to work without an st2 token (otherwise the probe just gets a 401 Unauthorized).
https://github.com/StackStorm/k8s-st2/blob/b92e0f6d1bc8e5f93865516570e4968475838664/st2web-enterprise/deployments.yaml#L28-L45
I'm OK with an st2.conf setting for each respective service that explicitly disables auth for /health, if we have concerns about exposing these endpoints by default.

@arm4b reopened this Jun 18, 2018
@arm4b added the K8s label Jun 18, 2018
@Kami self-assigned this Jun 18, 2018
@nmaludy (Member, Author) commented Jun 18, 2018

I was looking at it for a similar health check scenario for use with Consul service discovery.

@arm4b (Member) commented Jun 20, 2018

There are 2 paths to consider, which were briefly discussed with @Kami:

  1. Add health to the existing API, like: /v1/stream/health, /v1/api/health, /v1/auth/health. This needs an option in st2.conf to disable auth for the health endpoints (auth on by default).
  2. Expose /health on a new dedicated port for every service. No auth is needed. Via st2.conf, the user can choose to expose it to the world, run it on localhost, change the port, or not run it at all (the default); see the hypothetical sketch below.
    2.1) Additional advantage: we can add it to other services and it'll look consistent (not just the 3 that already have an http/api interface: auth, stream and api).
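
Purely to illustrate option 2, the st2.conf knobs could look something like this (every option name below is hypothetical; nothing like it exists yet):

# Hypothetical st2.conf section for option 2; names are made up for illustration.
[health]
enable = True            # disabled by default
listen_host = 127.0.0.1  # or 0.0.0.0 to expose it to the world
listen_port = 5566       # dedicated health port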

@arm4b added this to the 2.9.0 milestone Jun 20, 2018
@arm4b (Member) commented Jul 17, 2018

Yes, after looking more into how many other apps add this functionality (e.g. coreos/flannel and others), /health on a dedicated port (option 2), disabled by default, makes sense and is more flexible, because this way we can bring a consistent health check to other StackStorm services beyond api, stream and auth.

@Kami (Member) commented Aug 20, 2018

To provide my short evaluation after looking at this last week (sorry for the wall of text and some missing context; it's from Slack): https://gist.github.com/Kami/3df2dcf33bce81a6b609e42a9eb0b6a1

In short: in the short term, we will likely go with "service exits on failure" (which is a good idea for distributed and Kubernetes environments anyway), plus a TCP liveness probe for non-HTTP services.

Correctly implementing a /health endpoint would require quite some work (due to the way our service code is structured right now), and doing a quick "hack" would be worse than not doing it at all: it would give users a false sense of correctness and could make things less highly available.
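
For the non-HTTP services, the kind of TCP probe meant here would look roughly like this (a sketch; the port is a placeholder for whatever socket the service exposes):

# Sketch: TCP liveness probe for a non-HTTP service; the port is a placeholder.
livenessProbe:
  tcpSocket:
    port: 5560
  initialDelaySeconds: 15
  periodSeconds: 20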

@arm4b (Member) commented Sep 6, 2018

Reassigning this to the 3.0 milestone, per @Kami's plans. Note that this was initially requested/proposed for 2.8.

I'd also like to note that failing the service fast on failure (like DB/MQ connection loss) does not fully solve the problem.

One of the important parts of liveness/readiness checks is that we can identify the moment when a service can really start accepting requests after its init time. In heavy environments this may take more than a few moments, and without such checks a new pod is added to the load balancer immediately, meaning we'll likely send requests to services that aren't ready to serve them, which results in lost requests.

@arm4b (Member) commented Apr 13, 2019

Discovered during the K8s st2chatops work StackStorm/stackstorm-k8s#55

BTW, here is an example of a liveness HTTP endpoint for hubot.

This module, https://github.com/joelwallis/hubot-health, adds a new /health endpoint when installed, which becomes available once hubot's internal bootstrap is finished and the app is ready to serve requests.

Simple as:

# Respond 200 on /health once hubot's router is up, i.e. bootstrap has finished.
module.exports = (robot) ->
  robot.router.get '/health', (req, res) -> res.status(200).end()

but it adds real value compared to a TCP-connect or process-is-running check, ensuring we send requests to the service only once the bootstrap stage is finished (which can take time), so we don't lose requests into a black hole.

We'll need at least something minimal like that on the StackStorm side, potentially improving it in the future with simple DB/MQ success-query checks like https://www.ianlewis.org/en/using-kubernetes-health-checks.

@punkrokk (Member) commented:

I am also running into this need trying to use Consul with service checks for regular HA.

@guzzijones (Contributor) commented May 15, 2023

Starting with the auth service, it would be simple to add a /auth/v1/health endpoint that returns a 200 once the service is up.
That would satisfy the liveness probe.
No authentication is needed on the auth endpoint, AFAIK.

The readiness probe would need to be implemented on the K8s side using secrets for an API key, with that probe activated in the Helm chart only if said secret is provided.
In my opinion, retrieving a valid token is the only way to validate that the auth service is working.
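
A sketch of what such a token-based readiness check could look like (the credentials and URL are placeholders that would come from the K8s secret; 9100 is st2auth's default port):

# Sketch: readiness check that validates st2auth end to end by requesting
# a token. Credentials and URL are placeholders sourced from a K8s secret.
import sys

import requests

def main():
    try:
        resp = requests.post("http://127.0.0.1:9100/tokens",
                             auth=("st2probe", "PASSWORD_FROM_SECRET"),
                             timeout=5)
    except requests.RequestException:
        sys.exit(1)
    # A 201 with a token document means auth works end to end.
    sys.exit(0 if resp.status_code == 201 else 1)

if __name__ == "__main__":
    main()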

Please let me know if this solution is acceptable.
We set up K8s and noticed these TODOs in the Helm chart. We would like to go to production by year end if possible.

@guzzijones (Contributor) commented:

Setting up the health check on an alternate port and endpoint doesn't really check anything unless that endpoint makes a GET request to the /auth endpoint, at which point why have 2 different endpoints?

@Stealthii (Contributor) commented:

Starting with the auth service, it would be simple to add a /auth/v1/health endpoint that returns a 200 once the service is up. That would satisfy the liveness probe. No authentication is needed on the auth endpoint, AFAIK.

This seems like a sensible staged approach. A liveness check would meet the most basic monitoring requirement (is it up at all) in lieu of granular per-endpoint status and health checks. Once this is implemented, we can revisit the discussion and see how we want to approach the latter in a way that serves most people's needs.
