-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multiple bugs with progress deadline handling #4842
Conversation
Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of if the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it can not be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.
@@ -419,7 +417,12 @@ FAIL: | |||
default: | |||
} | |||
} | |||
deadlineTimer.Reset(next.Sub(time.Now())) | |||
|
|||
// If the next deadline is zero, we should not reset the timer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this clarify what problem this !zero check is actually fixing. i'm concerned that this means under some conditions the deadlinetimer would never get reset
@@ -820,5 +871,10 @@ func (w *deploymentWatcher) jobEvalStatus() (latestIndex uint64, blocked bool, e | |||
} | |||
} | |||
|
|||
return max, false, nil | |||
if max == uint64(0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This index returns the max eval index across all jobs, but we care about a single job. We could miss an update if evals were gced before this ran. This should return zero instead.
copyAlloc.DeploymentStatus.Canary = true | ||
// The client can only set its deployment health and timestamp, so just take | ||
// those | ||
if copyAlloc.DeploymentStatus != nil && alloc.DeploymentStatus != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but there was a drive by fix in the state store that looked like it was about minimizing the amount of settable fields in an alloc update. That didn't seem related to the bugfix,
@preetapan The server was actually panicking if the client sent an update without the deployment status set, so the new code, is more correct and safer. |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
The PR also makes handling deployment status updates from the client more robust.