Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Liveness probe support request #333

Closed
arturas-d opened this issue Dec 29, 2020 · 8 comments
Closed

Liveness probe support request #333

arturas-d opened this issue Dec 29, 2020 · 8 comments
Labels
good first issue Good for newcomers Status: Help Wanted Type: Enhancement New feature or request

Comments

@arturas-d
Copy link

We ensure compliance of Kubernetes Cluster with company policies using Gatekeeper. One of the main policies is ensuring we have a liveness probe to helps the kubelet know when it should restart a container. Without a defined liveliness probe we simply can't deploy aws-node-termination-handler on our cluster as it breaks required policies.
Is there any recommended command to define liveness probe atm? Will it be supported in the future?

@bwagner5
Copy link
Contributor

I think the liveness probe is a reasonable thing to add to aws-node-termination-handler especially when operating in queue-processor mode (running as a K8s Deployment).

IMDS mode operates as a DaemonSet and by default runs w/ HostNetworking, meaning it shares the root network namespace of the node it's running on. Exposing a liveness endpoint in the root network namespace might be problematic because of port conflicts and security concerns.

I would propose implementing a configuration option called --enable-health=8080 or ENABLE_HEALTH=8080 which would activate a health check HTTP endpoint on the specified port which can be used as the liveness probe. The helm chart could set enableHealth default as 8080 in queue-processor mode and disable it in IMDS mode (the user could still enable it in IMDS mode if they wanted).

$ curl localhost:8080/health
{
   "health": "OK"
}

As for what the health endpoint should check... I'm not quite sure right now. It might be enough the webserver is running, at least the first draft of it. aws-node-termination-handler is pretty liberal on exiting if a fatal error or persistent errors occur since it know kubernetes will restart it.

How does this sound to you @arturas-d ? Anyone else with an opinion on this proposal, let me know!

@bwagner5
Copy link
Contributor

@arturas-d you also asked if there was something existing that could be used for the liveness probe. You could use the /metrics endpoint. This endpoint is disabled by default (for the same reasons I mentioned above for IMDS mode operating w/ HostNetworking).

You can configure it with the following parameters in the helm chart (https://github.com/aws/aws-node-termination-handler/tree/main/config/helm/aws-node-termination-handler):

  • enablePrometheusServer
  • prometheusServerPort

And then modify the PodSpec to something like this:

livenessProbe:
  httpGet:
    path: /metrics
    port: 9092
  initialDelaySeconds: 10

@tomelliff
Copy link

@sliaptsou are you planning on raising that as a pull request back upstream?

@sliaptsou
Copy link
Contributor

@tomelliff what do you mean?

@sliaptsou
Copy link
Contributor

@bwagner5 Could you please review again?

@bwagner5
Copy link
Contributor

@sliaptsou Yes! Sorry for the delay!

@bwagner5
Copy link
Contributor

Merged and released!

@tomelliff
Copy link

@tomelliff what do you mean?

Arf, apologies. I only saw your PR on your own fork that you had merged and didn't see the PR link back upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers Status: Help Wanted Type: Enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants