Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomic eval insertion with job (de-)registration #8435
Atomic eval insertion with job (de-)registration #8435
Changes from 1 commit
97c69ee
6a082ad
a6a96c4
921a42b
37ad947
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that here we attempt to insert the eval even if the job deregistration fails. I find this behavior very odd but it's explicitly tested in
nomad/nomad/job_endpoint_test.go
Line 2969 in 97c69ee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this was intended to protect against the previous non-atomic behavior? Ex. if you deregister but the eval wasn't persisted, but then tried to deregister again, the job would be potentially gone but you'd still want an eval to clean it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually seems like a bug to me, but I think one that does not impact correctness (only waste some work doing nothing).
From peeking around
generic_sched.go
andreconcile.go
, I'm not seeing us check if the evaluation is a Deregister. We always seem to checked if Job.Stopped istrue
! This means if we fail to update the statestore with the stopped Job, the Deregister evaluation will be the same as any other evaluation and end up being a noop for the job as it is not stopped and presumably already scheduled/allocated.I suspect that @tgross is correct and that the test is merely asserting the non-atomic behavior existed: #981
Since we're making Job+Eval submissions atomic, I think we should at least trying to make this section atomic as well. I can't figure out where a Deregister eval for a non-Stopped job would have a desirable affect, but perhaps somebody else can find a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to let the current behavior stand as-is, as stopping making an eval is a user visible change. Will follow up in another PR and we can discuss this issue further there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find this reason sufficient to keep it since the eval in question is a
dereg
but the effect would be a noop. If that's the behavior I feel like we're emitting a useless and actively misleading eval which should be treated like a bug and removed. There's no benefit to leaving it in place as anyone observing it would only be confused by its lack of affect.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it should be changed - but I believe such a user-visible behavior (albeit a small one) change is outside the scope of this PR, so I will follow up in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this
args.Eval != nil
is always true; we're returning early from checkingargs.Eval == nil
right above it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, do we need to remove the returning early above for multi region deployment? Returning early means that periodic/dispatch jobs will not be handled by
multiregionStart
? This PR doesn't change the behavior, but just noticed the MRD call.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Periodic and dispatch jobs get kicked off with their normal dispatch mechanisms in MRD. We special case them in
schedule/reconcile.go
, rather than running them through the MRD in deploymentwatcher. (There might be future work there but that's a later phase of work.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'm tempted to remove the early return and somehow make it more
multiregionStart
that doesn't apply to paramterized/periodic jobs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad idea. There's a big blog of
return nil
at the top of the ENT functionality where we return early when we don't need it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll follow up in another PR - I'm a bit confused about the interaction and would love to do more testing.