Nomad doesn't re-schedule jobs if Docker daemon is restarted #629
Yeah, this will hopefully be fixed by Nomad 0.3, and in a generic way too. We currently don't migrate failed allocs to new nodes, and once we do, this problem should be fixed.
@dadgar is this still on track for 0.3?
No, this will not be fixed in 0.3. We decided not to tackle server-side restarts in this release.
@dadgar, Nomad not guaranteeing that the requested number of instances is always running is the only thing stopping me from using Nomad in production instead of Kubernetes. Are there any plans to fix this in the short/mid term?
@c4milo So I just restarted a Docker daemon that was running containers started by Nomad. Nomad detected it and restarted the task locally.
There are two separate issues:
@ketzacoatl: Nomad actually does both of these. There is a node-failure detection window: once a node hasn't talked to the servers for a period of time, Nomad will reschedule what it was running.
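(Editor's note: in later Nomad versions, from roughly 0.8 onward, this server-side rescheduling became configurable per task group via a `reschedule` stanza. The job and group names below are illustrative, not from this thread; a sketch under that assumption:)

```hcl
job "example" {
  group "web" {
    # Reschedule failed allocations onto other eligible nodes.
    # These attribute names follow the Nomad 0.8+ job specification.
    reschedule {
      delay          = "30s"          # wait before the first reschedule attempt
      delay_function = "exponential"  # back off between attempts
      max_delay      = "1h"
      unlimited      = true           # keep trying rather than giving up
    }
  }
}
```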
@c4milo can you reproduce the behavior you were experiencing? If not, we should probably close this.
@dadgar I haven't tested but feel free to close it. I will re-open it if I'm able to reproduce.
When I last tested, Nomad did not reschedule the job from a node that failed. Maybe that has changed; I will squeeze in a test when I can. But I agree that is not what this issue is really about.
Cool! Thanks guys!
Original issue description: I've been testing the ability of a cluster to recover its state, especially when the Docker daemon is restarted. But every time this happens, the allocations move to "failed" status and Nomad never restarts them, not even if I re-run the job. To fix it, I had to stop the job and start over.