Clients try to rerun old allocations after machine reboot #1795
Comments
Hey @Gerrrr, This is actually expected behavior. What the client is doing is attempting to re-attach to anything that was already running. This can be useful if, for example, you kill the Nomad client to do an in-place upgrade: when you start it up again it finds all the running processes. In your case there is nothing to connect to anymore because the tasks are dead, so it is just cleaning up. Let me know if that made sense!
Hi @dadgar, Thanks for the explanation; it makes sense to me for in-place upgrades or a plain restart of the Nomad client. In our case the problem was that after the VM reboot, the Nomad client started allocations that had already been rescheduled elsewhere, so we ended up with jobs running multiple times. The rebooted Nomad client immediately sent […] However, […]
Ah, thanks for the clarification. Will re-open.
Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.
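The fix gates restored tasks on first server contact. Below is a minimal sketch of that idea in Go, assuming a channel that the client closes after its first successful sync with a server; the names (`restoreDecision`, `serverContacted`) are illustrative and are not Nomad's actual internals:

```go
package main

import (
	"fmt"
	"time"
)

// restoreDecision models the gate added by the fix: a restored task that
// fails to reattach must wait for first server contact before restarting,
// so the client can learn whether the allocation was rescheduled elsewhere.
func restoreDecision(reattached bool, serverContacted <-chan struct{}) string {
	if reattached {
		// The process survived (e.g. an in-place client upgrade);
		// nothing else to do.
		return "reattached"
	}
	// Reattach failed (e.g. the host rebooted and the process is gone):
	// block until the client has synced with a server.
	<-serverContacted
	return "restart-after-sync"
}

func main() {
	serverContacted := make(chan struct{})
	result := make(chan string)
	go func() { result <- restoreDecision(false, serverContacted) }()

	// Simulate the first successful server heartbeat arriving later.
	time.Sleep(10 * time.Millisecond)
	close(serverContacted)
	fmt.Println(<-result)
}
```

Closing the channel (rather than sending on it) lets any number of restored task runners unblock from the single server-contact event.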
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.3.2
Operating system and Environment details
Ubuntu 14.04
Issue
The Nomad client puts some of its temporary files in `/tmp`. When `alloc_dir` is not erased on reboot, the Nomad client tries to restart all the allocations that previously ran on this node and terminates them only after receiving a command from the server (note the `Started`, `Started`, `Killed` event sequence in the allocation status). Since the socket in `/tmp` is gone after the reboot, the client also produces an error log (see below). In our setup we worked around this by putting `alloc_dir` in `/tmp`.

Reproduction steps
Nomad job status
Nomad alloc-status
Nomad Client configuration
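The original client configuration was not captured in this extract. As a rough illustration only, the workaround described above (keeping `alloc_dir` on a tmpfs path so it does not survive a reboot) could look like this in the agent config; the paths are hypothetical:

```hcl
# Illustrative Nomad client stanza (not the reporter's actual config).
client {
  enabled = true

  # Place the allocation directory under /tmp so stale allocation
  # state is erased on reboot and the client does not try to
  # re-attach to allocations that were rescheduled elsewhere.
  alloc_dir = "/tmp/nomad/alloc"
}
```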
Nomad Client logs (if appropriate)