Nomad doesn't stop allocations sometimes #2779
Hi! Thanks for testing master. Sorry you hit a bug. Any chance you could post your client logs (you can attach a file as a comment or use gist.github.com if they're long)? You could even grep by that allocation ID if you wanted to post a minimal set, but that might miss some relevant lines.
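As a rough sketch of what that minimal set could look like, assuming the client logs to a file (the log path and the allocation ID below are placeholders, not taken from this issue):

```sh
# Pull only the lines for one allocation out of the client log.
# Log path and allocation ID are placeholders.
grep '<alloc-id>' /var/log/upstart/nomad.log > nomad-alloc.log
```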
As for how to recover: simply restarting the Nomad client daemon should get things back into a clean state. Worst case, this should get you back to a clean slate:
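A minimal sketch of that worst-case recovery, assuming the agent runs under Upstart and keeps its state under /var/lib/nomad (both are assumptions, not stated in this thread):

```sh
stop nomad              # stop the client agent
rm -rf /var/lib/nomad   # worst case: wipe the local state for a clean slate (assumed data_dir)
start nomad             # start the agent again so it re-registers with the servers
```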
As it turned out, the Nomad agent can't stop the allocations placed on it (it simply ignores the desired state and doesn't even try to do anything with the allocations placed on the node). So we tried to restart the Nomad agent. Before restarting Nomad we see the following in the logs:
After we restart the Nomad agent we see the following:
Then we simply cleaned up all Nomad state by removing Nomad's data directory.
I just noticed you're using an older version of master. Mind testing with the attached?
Sorry, but the problem is the same. We ran nomad node-drain to stop all allocations, and they were removed from the node; then we stopped Nomad, upgraded its binary to the one you attached to this issue, and started Nomad, and it can't start.
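Roughly, the sequence above looks like this (the node ID, binary locations and the Upstart job name are placeholders, not taken from the issue):

```sh
nomad node-drain -enable <node-id>   # drain the node so all allocations are stopped and removed
stop nomad                           # stop the agent via Upstart
cp ./nomad /usr/local/bin/nomad      # swap in the build attached to this issue (assumed paths)
start nomad                          # start the agent again -- this is where it fails for us
```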
It is the same as #2715. So for now the Nomad restart operation is very unstable.
Also, when it stops it doesn't unmount the secrets folders from the allocations.
@tantra35 I'm afraid there are incompatible changes between the two custom builds you're using. You'll have to remove your Nomad state.
@schmichael Sorry, but it seems I can't see any commits between 52ffc01 (our old version) and d80aee0 (the version you posted in this issue) that could improve the Nomad restart behaviour, and there are also no changes that would alter the binary format of Nomad's state file.
It also seems that if we simply kill Nomad with SIGKILL it can't start afterwards either, but to confirm this I need to check it on our test stand.
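The check would be something like the following sketch (binary and config paths are assumptions):

```sh
pkill -KILL nomad                                 # kill the agent without a clean shutdown
/usr/local/bin/nomad agent -config=/etc/nomad.d   # then see whether it can restore its state on start
```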
@schmichael Just tested on the test stand (3 virtual machines) with Nomad rc1, and got the restart error again.
What I did: called a restart of the Nomad agent multiple times. Here is our upstart script:
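Roughly, it is an Upstart job of this shape (the binary path and config directory below are placeholders, not the real ones):

```
# Placeholder sketch of the Upstart job for the Nomad agent;
# paths and options are assumptions.
description "Nomad agent"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

exec /usr/local/bin/nomad agent -config=/etc/nomad.d
```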
On the test stand we have just one simple job:
@schmichael We got this error on rc1:
@schmichael Maybe I was misleading; I meant that Nomad corrupts its state after the restart.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.6.0-dev (52ffc01+CHANGES)
We try to stop one job with:
nomad stop Grafana
and the job's allocations don't stop. Also we see the following running allocation status:
And we concluded that the Nomad client didn't even try to stop the allocation. What can we do now? How can we stop this buggy allocation? We tried simply killing the Nomad watchdog process for the allocation, but it gets created again.
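A rough sketch of how we check that the allocation keeps running (the allocation ID is a placeholder):

```sh
nomad status Grafana            # shows the job's allocations and their desired vs. client status
nomad alloc-status <alloc-id>   # shows the allocation still running on the client
```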