Reschedule or restart a task on health check failure #2973

Closed
samart opened this issue Aug 5, 2017 · 4 comments

samart commented Aug 5, 2017

It would be nice if a failing check restarted a task or rescheduled it elsewhere.

Consul provides a service discovery health check, but when a service is unresponsive we'd like to restart it. Our Mesos clusters do this with Marathon keep-alive health checks, and it works well to keep applications responsive.

We should be able to specify at least:

  • gracePeriod to wait before starting to healthcheck, after the task has started
  • number of check failures before a restart/re-schedule
  • interval between checks
  • timeout on the check attempt

Including the current Nomad restart options (max restart attempts, mode, etc.) would be nice as well.
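
A minimal sketch of how the four settings requested above might interact, assuming a simple "consecutive failures" rule. This is not Nomad's or Marathon's implementation; `CheckPolicy`, `RestartTracker`, and `first_restart_time` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CheckPolicy:
    grace_period: float   # seconds after task start during which results are ignored
    failure_limit: int    # consecutive failures before a restart is requested
    interval: float       # seconds between check attempts
    timeout: float        # seconds after which a single attempt counts as failed


class RestartTracker:
    """Counts consecutive failed checks and reports when to restart."""

    def __init__(self, policy: CheckPolicy, started_at: float) -> None:
        self.policy = policy
        self.started_at = started_at
        self.consecutive_failures = 0

    def record(self, now: float, healthy: bool) -> bool:
        """Record one check result; return True if a restart should be triggered."""
        if now - self.started_at < self.policy.grace_period:
            return False  # still inside the grace period, ignore the result
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.policy.failure_limit


def first_restart_time(
    policy: CheckPolicy,
    durations: Sequence[Optional[float]],
    started_at: float = 0.0,
) -> Optional[float]:
    """Replay check results spaced `interval` apart.

    Each entry is the check's response time in seconds, or None if the
    attempt errored. Returns the time of the first restart, or None.
    """
    tracker = RestartTracker(policy, started_at)
    for i, duration in enumerate(durations):
        now = started_at + i * policy.interval
        healthy = duration is not None and duration <= policy.timeout
        if tracker.record(now, healthy):
            return now
    return None
```

A real scheduler would also apply its existing restart limits (max attempts, mode) after the tracker fires; that part is left out here.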

epetrovich commented Aug 7, 2017

#876
This is an essential feature, but I've done it using my own script that is launched in the task group of the main app. Restarting individual tasks/allocations is possible via Consul KV and the template engine. I just wrote a simple script that checks the status of the Consul health check and implements my own restart logic via Consul KV.
Here's my working script for that.
It's designed to be launched in the allocation with the main app. All necessary parameters should be passed from the job manifest.

watcher.py.gz
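
For readers who cannot open the attachment, here is a rough sketch of the general pattern described in this comment. It is not the attached script. It assumes a Consul agent at the default local address, and it assumes the main task renders a Nomad `template` that reads the KV key with `change_mode = "restart"`, so that writing a new value triggers a restart. The function names and the key are hypothetical; the HTTP paths are Consul's `/v1/health/checks/<service>` and `/v1/kv/<key>` endpoints.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local Consul agent, default port


def is_critical(checks):
    """Return True if any check in the decoded JSON list is 'critical'."""
    return any(check.get("Status") == "critical" for check in checks)


def fetch_checks(service):
    """Fetch the health checks registered for a service from Consul."""
    url = f"{CONSUL}/v1/health/checks/{service}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def request_restart(key, value):
    """Write a new value to a Consul KV key that the task's template watches."""
    req = urllib.request.Request(
        f"{CONSUL}/v1/kv/{key}", data=value.encode(), method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().strip() == b"true"
```

A watcher loop would call `fetch_checks` periodically and, once `is_critical` has been true for enough consecutive polls, call `request_restart` with a fresh value such as a timestamp.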

Contributor

dadgar commented Aug 7, 2017

Hey @samart, I'm going to close this as a duplicate of #876, but I captured your comment there. We will be working on this soon!

dadgar closed this as completed Aug 7, 2017
Author

samart commented Aug 7, 2017

I will watch the other issue, thank you!

github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2022