client: fix waiting on preempted alloc #12779
Conversation
LGTM!
It looks like this is a very old bug (pre-0.9) but I spent a little time going through lingering open issues and didn't manage to find one for it. #10200 looks like it could be related but we kind of went off in the weeds of the cluster topology in that issue so I'm not sure.
Don't forget the changelog entry and backport labels!
Fixes #10200

**The bug**

https://github.com/hashicorp/nomad-enterprise/issues/707

A user reported receiving the following error when an alloc was placed that needed to preempt existing allocs:

```
[ERROR] client.alloc_watcher: error querying previous alloc: alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup failed: index error: UUID must be 36 characters"
```

The previous alloc (8e) was already complete on the client. This is possible if an alloc stops *after* the scheduling decision was made to preempt it, but *before* the node running both allocations was able to pull and start the preemptor. While that is hopefully a narrow window of time, you can expect it to occur in high-throughput, batch-scheduling-heavy systems.

However the RPC error made no sense! `previous_alloc` in the logs was a valid 36 character UUID!

**The fix**

The fix is:

```
- prevAllocID: c.Alloc.PreviousAllocation,
+ prevAllocID: watchedAllocID,
```

The alloc watcher constructor used for preemption improperly referenced `Alloc.PreviousAllocation` instead of the passed-in `watchedAllocID`. When multiple allocs are preempted, a watcher is created for each, with `watchedAllocID` set properly by the caller. In this case `Alloc.PreviousAllocation=""`, which is where the `UUID must be 36 characters` error was coming from! Sadly we *were* properly referencing `watchedAllocID` in the log line, which made the error all the more confusing.

**The repro**

I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl) and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl) for ease of repro.

First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad), then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad) that evicts 2 low priority jobs. Everything worked as expected.

However, if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147), it reproduces the bug because the watcher references `PreviousAllocation` instead of `watchedAllocID`.
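To make the failure mode concrete, here is a minimal Go sketch of the pattern described above. The types and names (`Alloc`, `remotePrevAlloc`, `newWatcher`) are hypothetical simplifications for illustration, not Nomad's actual `client/allocwatcher` API; it only shows why each per-preempted-alloc watcher must use the ID passed in by the caller rather than the new alloc's `PreviousAllocation` field.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only.
type Alloc struct {
	ID                 string
	PreviousAllocation string   // empty when the placement only preempts allocs
	PreemptedAllocs    []string // IDs of the allocs this placement preempts
}

type remotePrevAlloc struct {
	prevAllocID string // the alloc whose termination this watcher waits on
}

// newWatcher is called once per preempted alloc; watchedAllocID is the ID the
// caller wants this particular watcher to track.
func newWatcher(newAlloc *Alloc, watchedAllocID string) *remotePrevAlloc {
	return &remotePrevAlloc{
		// Pre-fix bug: this field was populated from newAlloc.PreviousAllocation,
		// which is "" for a preemption-only placement, so the server-side alloc
		// lookup failed with "UUID must be 36 characters".
		prevAllocID: watchedAllocID, // fixed: use the ID passed in by the caller
	}
}

func main() {
	preemptor := &Alloc{
		ID:              "preemptor-alloc-id", // placeholder, not a real UUID
		PreemptedAllocs: []string{"preempted-alloc-1", "preempted-alloc-2"},
	}
	// One watcher per preempted alloc, each tracking its own ID.
	for _, id := range preemptor.PreemptedAllocs {
		w := newWatcher(preemptor, id)
		fmt.Println("waiting on preempted alloc:", w.prevAllocID)
	}
}
```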
a606db0 to 5328f3c
@tgross Thanks for making me take another look at #10200. At first I assumed these were unrelated, but now I'm not so sure! At the very least #10200 exhibits the same error, so I'm actually going the ambitious route of closing that one with this. If there's another compounding bug in #10200, hopefully someone will reopen it and file a new issue. I just don't see us making progress on #10200 without a repro after this lands, and I didn't want folks to think there was no progress on it. (I'll leave a similar note on that issue when merging this.)
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.