Restarting nomad agent restarts successfully completed allocs #5883
notnoop pushed a commit that referenced this issue on Jun 26, 2019:
This change fixes a bug where Nomad would avoid running alloc tasks if the alloc is client-terminal but the server copy on the client isn't marked as running. We fix this case by having the task runner use allocRunner.shouldRun() instead of only checking the server-updated alloc. We preserve much of the existing invariants, such that `tr.Run()` is always run, and don't change the overall alloc runner and task runner lifecycles. Fixes #5883
notnoop pushed a commit with the same change that referenced this issue on Jun 27, 2019.
chrisboulton pushed a commit with the same change to bigcommerce/nomad that referenced this issue on Jul 15, 2019 (fixes hashicorp#5883).
Nomad version
Operating system and Environment details
macOS, but OS is irrelevant
Issue
When a client restarts, it may unexpectedly restart successfully completed batch allocs. When an alloc is restarted, it will keep running until completion without being terminated. This seems to affect allocations that haven't been garbage collected.
This is a regression introduced in 0.9.0. Nomad 0.8.7 did not attempt to re-run such tasks. Nomad 0.9.0, 0.9.1, and 0.9.2 attempt to re-run tasks, though 0.9.0 had logmon reattachment bugs that complicate the matter a bit.
The cause seems to be an interaction of client state tracking, client storage, and garbage collection.
Reproduction steps
1. Start a Nomad agent: `nomad agent --config=./config.hcl`
2. Run the job: `nomad job run ./example-restored.hcl`
3. Wait for the alloc to complete; the job records its alloc ID in /tmp/debug-alloc-restore.
4. Restart the Nomad agent.
One would expect the alloc to only run once, but one will notice that the alloc re-runs and moves from a dead status into a running one, so /tmp/debug-alloc-restore shows it ran twice.
Job file (if appropriate)
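The referenced example-restored.hcl isn't reproduced here; the following is a minimal sketch of a batch job that appends its alloc ID to /tmp/debug-alloc-restore, assuming the raw_exec driver and hypothetical group/task names rather than the reporter's original file:

```hcl
# Hypothetical stand-in for example-restored.hcl: a batch job whose only task
# appends its allocation ID to /tmp/debug-alloc-restore and then exits.
job "example-restored" {
  datacenters = ["dc1"]
  type        = "batch"

  group "debug" {
    task "record-alloc" {
      driver = "raw_exec"

      config {
        command = "/bin/sh"
        # The shell expands $NOMAD_ALLOC_ID at runtime, so each run of the
        # alloc adds one line to the marker file.
        args = ["-c", "echo $NOMAD_ALLOC_ID >> /tmp/debug-alloc-restore"]
      }
    }
  }
}
```

With a job along these lines, a second line with the same alloc ID appearing after the agent restart would indicate the alloc was re-run.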
Nomad config file
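The config.hcl passed to the agent isn't shown either; below is a minimal single-node sketch with an assumed data_dir path and a raw_exec plugin block added so the job above can run. The persistent data_dir (as opposed to -dev mode) is the relevant part here, since the issue involves client state that survives the agent restart:

```hcl
# Hypothetical single-node agent config. A persistent data_dir lets client
# state survive the agent restart.
data_dir = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
}

# raw_exec is disabled by default; enable it for the sketch job above.
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```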
Reproduction script
A full reproduction script and its output show that, on restarting the Nomad agent, the alloc is re-run to full completion and /tmp/debug-alloc-restore contains the alloc id twice.