Nomad doesn't re-schedule jobs if Docker daemon is restarted #629

Closed
c4milo opened this issue Dec 24, 2015 · 12 comments
Comments

@c4milo
Contributor

c4milo commented Dec 24, 2015

I've been testing the ability of a cluster to recover its state, especially when the Docker daemon is restarted. Every time this happens, the allocations move to failed status and Nomad never restarts them, not even if I re-run the job. To fix it, I had to stop the job and start over.

@c4milo c4milo changed the title Nomad doesn't restart containers once docker daemon is restarted Nomad doesn't restart containers once Docker daemon is restarted Dec 24, 2015
@c4milo c4milo changed the title Nomad doesn't restart containers once Docker daemon is restarted Nomad doesn't restart containers if Docker daemon is restarted Dec 24, 2015
@c4milo c4milo changed the title Nomad doesn't restart containers if Docker daemon is restarted Nomad doesn't reschedule containers if Docker daemon is restarted Dec 24, 2015
@c4milo c4milo changed the title Nomad doesn't reschedule containers if Docker daemon is restarted Nomad doesn't re-schedule jobs if Docker daemon is restarted Dec 24, 2015
@dadgar
Contributor

dadgar commented Jan 4, 2016

Yeah, this will hopefully be fixed by Nomad 0.3, and in a generic way too. We currently don't migrate failed allocs to new nodes, and once we do that, this problem should be fixed.

@c4milo
Contributor Author

c4milo commented Feb 1, 2016

@dadgar is this still on track for 0.3?

@dadgar
Contributor

dadgar commented Feb 2, 2016

No, this will not be fixed in 0.3. We decided not to tackle server-side restarts in this release.

@c4milo
Contributor Author

c4milo commented Mar 14, 2016

@dadgar, making sure the number of instances is always honored is the only thing right now stopping me from using Nomad in production instead of Kubernetes. Are there any plans to fix this in the short or mid term?

@dadgar
Contributor

dadgar commented Mar 16, 2016

@c4milo So I just restarted a Docker daemon that was running containers started by Nomad. Nomad detected it and restarted the task locally.

@ketzacoatl
Contributor

There are two separate issues:

  1. If the Docker daemon restarts and the Docker container dies, Nomad should restart it. It does for me too, in my tests now.
  2. If the node fails, Nomad needs to move the task to a different node. In my tests, this has not worked.
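
To make the distinction between these two cases concrete, here is a minimal sketch in Python. It is not Nomad code: `Action` and `recovery_action` are hypothetical names, and the model assumes only two inputs, whether the node is still reachable and whether the task is still running.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical recovery actions for a single task (illustration only)."""

    NONE = "nothing to do"
    RESTART_LOCALLY = "restart on the same node"
    RESCHEDULE = "place on a different node"


def recovery_action(node_alive: bool, task_running: bool) -> Action:
    # Case 2: the whole node is gone, so the task has to move elsewhere.
    if not node_alive:
        return Action.RESCHEDULE
    # Case 1: the node is fine but the task died (for example, after a
    # Docker daemon restart), so it can be restarted in place.
    if not task_running:
        return Action.RESTART_LOCALLY
    return Action.NONE
```

For example, `recovery_action(node_alive=True, task_running=False)` models the Docker daemon restart this issue was opened about.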

@dadgar
Contributor

dadgar commented Mar 16, 2016

@ketzacoatl: Nomad actually does both of these. There is a node failure detection window, and once the node hasn't talked to the servers for a period of time, Nomad will reschedule what that node was running.
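
The detection window described above can be sketched roughly as follows. This is a toy model in Python, not Nomad's implementation: `HeartbeatTracker`, `ttl_seconds`, and `reschedule` are hypothetical names, the window length is an arbitrary assumption, and placement is plain round-robin rather than a real scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Toy model of server-side node failure detection (not Nomad's code)."""

    ttl_seconds: float  # hypothetical failure detection window
    last_seen: dict = field(default_factory=dict)  # node id -> last heartbeat time

    def heartbeat(self, node_id: str, now: float) -> None:
        # Record that the node talked to the servers at time `now`.
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> list:
        # A node counts as failed once it has been silent longer than the window.
        return sorted(
            n for n, t in self.last_seen.items() if now - t > self.ttl_seconds
        )


def reschedule(allocs: dict, failed: list, healthy: list) -> dict:
    """Move allocations off failed nodes, round-robin over healthy ones."""
    if not healthy:
        # Nowhere to place anything, so leave the allocations as they are.
        return dict(allocs)
    placed = {}
    i = 0
    for alloc_id, node_id in sorted(allocs.items()):
        if node_id in failed:
            placed[alloc_id] = healthy[i % len(healthy)]
            i += 1
        else:
            placed[alloc_id] = node_id
    return placed
```

In this model a node is only treated as failed after the window has elapsed, which would explain why a short test might not see anything rescheduled right away.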

@dadgar
Contributor

dadgar commented Mar 16, 2016

@c4milo can you reproduce the behavior you were experiencing? If not, we should probably close this.

@c4milo
Contributor Author

c4milo commented Mar 16, 2016

@dadgar I haven't tested, but feel free to close it. I will re-open it if I'm able to reproduce.

@ketzacoatl
Contributor

When I last tested, Nomad did not reschedule the job when a node failed. Maybe that has changed; I will squeeze in a test in the time to come. But I agree that is not what this issue is meant to be about.
+1 on closing

@dadgar
Contributor

dadgar commented Mar 16, 2016

Cool! Thanks guys!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 25, 2022