
#2563 fixed pending state for allocations with terminal status #2816

Merged
merged 3 commits into from
Aug 4, 2017


capone212
Contributor

@capone212 capone212 commented Jul 9, 2017

Fixes ticket #2563
What happens:

Prerequisites: allocations A and B are created for the same job; B is the successor of A and is created after the job spec is updated. The Nomad client was running allocation A, then the client went down, a job update occurred that created allocation B, and then the Nomad client came back up. Allocations A and B can only run on this client.

The issue: after the Nomad client starts, allocation B stays in the pending state for several hours because it waits for allocation A to terminate. Allocation A is in a terminating state (desired status = stop), but it reaches a terminal state (client status = complete) only after GC.
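A minimal sketch of why B stays pending, assuming a successor may only start once its predecessor's client status is terminal (a desired status of "stop" alone is not enough). The `Alloc` type and `canStart`/`terminalClientStatus` helpers are hypothetical simplifications, not Nomad's actual code; the status strings are assumed to mirror Nomad's client statuses.

```go
package main

import "fmt"

// Alloc is a hypothetical, reduced allocation type used only for illustration.
type Alloc struct {
	ID            string
	DesiredStatus string // e.g. "run" or "stop"
	ClientStatus  string // e.g. "pending", "running", "complete"
}

// terminalClientStatus reports whether a client status is final.
func terminalClientStatus(s string) bool {
	switch s {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// canStart (hypothetical) lets a successor start only when there is no
// predecessor or the predecessor's client status is terminal.
func canStart(prev *Alloc) bool {
	return prev == nil || terminalClientStatus(prev.ClientStatus)
}

func main() {
	a := &Alloc{ID: "A", DesiredStatus: "stop", ClientStatus: "running"}
	fmt.Println(canStart(a)) // false: B stays pending

	a.ClientStatus = "complete"
	fmt.Println(canStart(a)) // true: B may now start
}
```

Under this assumption, anything that keeps A's client status at "running" keeps B pending indefinitely.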

Why is allocation A slow to reach a terminal state (leaving B stuck in pending)? Here is what happens:

  • the client restores allocation A
  • it receives fresh allocations from the server and notices that A should be stopped
  • allocation A is stopped and GC-ed immediately; the complete status for the allocation is scheduled to be sent to the server in func (c *Client) allocSync()
  • the client receives fresh allocations again and treats A as a new allocation, because there is no allocation runner for it and the allocation is not in a terminal state
  • a new allocation runner is created for A, which blocks waiting to be destroyed
  • the new alloc_runner syncs its state from the server, which is "running", and starts reporting that state, replacing the (client status = complete) update that was scheduled to be sent to the server
  • net result: we have a zombie allocation that waits for a terminal state from the server.
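The overwrite in the sequence above can be modelled with a small sketch. It assumes the client batches pending status updates per allocation ID before flushing them (the flush happens in allocSync), so a later update for the same allocation replaces an earlier one. The `pendingUpdates` type and `queue` method are hypothetical, not Nomad's API.

```go
package main

import "fmt"

// pendingUpdates is a hypothetical model of the batch of client-status
// updates waiting to be flushed to the server. It is keyed by allocation
// ID, so a later update for the same allocation replaces an earlier one.
type pendingUpdates map[string]string

// queue records the latest status for an allocation.
func (p pendingUpdates) queue(allocID, status string) {
	p[allocID] = status
}

func main() {
	updates := pendingUpdates{}

	// A is stopped and GC-ed; "complete" is queued for the next flush.
	updates.queue("A", "complete")

	// Before the flush, a fresh runner for A syncs "running" from the
	// server and queues that status, replacing "complete".
	updates.queue("A", "running")

	// The flush would send "running": the server never learns A finished.
	fmt.Println(updates["A"]) // running
}
```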

Solution: report the complete status if the allocation runner is not going to run any tasks.
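A minimal sketch of the idea behind the fix, assuming "not going to run any tasks" means the server's desired status for the allocation is not "run". `Alloc` and `statusForNewRunner` are hypothetical names; the actual change lives in Nomad's client allocation runner and may differ in detail.

```go
package main

import "fmt"

// Alloc is a hypothetical, reduced allocation type used only for illustration.
type Alloc struct {
	ID            string
	DesiredStatus string // "run" or "stop"
	ClientStatus  string
}

// statusForNewRunner (hypothetical) reports "complete" when a freshly
// created runner will not run tasks, instead of echoing the stale
// client status it synced from the server.
func statusForNewRunner(a Alloc) string {
	if a.DesiredStatus != "run" {
		return "complete"
	}
	return a.ClientStatus
}

func main() {
	a := Alloc{ID: "A", DesiredStatus: "stop", ClientStatus: "running"}
	fmt.Println(statusForNewRunner(a)) // complete
}
```

Reporting "complete" here means the status the runner sends no longer replaces the pending complete update with "running".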

@schmichael
Member

Thanks for looking into this, but I think we've at least partially fixed this issue in #2852. There are still some deficiencies in Nomad's GC logic which I'm working on right now.

@schmichael schmichael closed this Jul 31, 2017
@schmichael schmichael reopened this Aug 4, 2017
@schmichael
Member

After looking into blocking-allocs/GC logic a bit more I noticed your PR does cover a case I had not considered! Great work, and I'm sorry for not taking it pre-0.6! I'll be adding some logic to only set the status if it's not already terminal, but otherwise this seems correct.
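A sketch of the refinement described above, assuming it means: set "complete" only when the allocation's client status is not already terminal, so an outcome such as "failed" is preserved. All names are hypothetical and the terminal set is assumed to mirror Nomad's client statuses; this is not the merged code.

```go
package main

import "fmt"

// Alloc is a hypothetical, reduced allocation type used only for illustration.
type Alloc struct {
	ID            string
	DesiredStatus string // "run" or "stop"
	ClientStatus  string // "pending", "running", "complete", "failed", "lost"
}

// terminalClientStatus reports whether a client status is final.
func terminalClientStatus(s string) bool {
	switch s {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// statusForNewRunner (hypothetical) reports "complete" for a runner that
// will not run tasks, unless the status is already terminal, in which
// case the existing outcome is kept.
func statusForNewRunner(a Alloc) string {
	if a.DesiredStatus != "run" && !terminalClientStatus(a.ClientStatus) {
		return "complete"
	}
	return a.ClientStatus
}

func main() {
	fmt.Println(statusForNewRunner(Alloc{ID: "A", DesiredStatus: "stop", ClientStatus: "running"})) // complete
	fmt.Println(statusForNewRunner(Alloc{ID: "A", DesiredStatus: "stop", ClientStatus: "failed"}))  // failed
}
```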

@schmichael schmichael merged commit 9692eef into hashicorp:master Aug 4, 2017
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 27, 2023
2 participants