Fix multiple bugs with progress deadline handling #4842

dadgar · 2018-11-06T01:07:56Z

Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment, regardless of
whether the task group has finished.

The PR also makes handling deployment status updates from the client more robust.

Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment, regardless of whether the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment.

To reproduce this issue prior to this commit, create a job with two task groups. The first group has count 1 and resources such that it cannot be placed. The second group has count 3, max_parallel=1, and can be placed. Run this job first, then update the second group to trigger a deployment. It will place the first of three allocations but never progress, since there exists a blocked eval. However, the existence of a blocked eval doesn't capture the fact that there are two groups being deployed.

preetapan · 2018-11-06T17:39:32Z

nomad/deploymentwatcher/deployment_watcher.go

@@ -419,7 +417,12 @@ FAIL:
 					default:
 					}
 				}
-				deadlineTimer.Reset(next.Sub(time.Now()))
+
+				// If the next deadline is zero, we should not reset the timer


Could this comment clarify what problem the !zero check is actually fixing? I'm concerned that this means that under some conditions the deadline timer would never get reset.

preetapan · 2018-11-06T20:00:10Z

nomad/deploymentwatcher/deployment_watcher.go

@@ -820,5 +871,10 @@ func (w *deploymentWatcher) jobEvalStatus() (latestIndex uint64, blocked bool, e
 		}
 	}

-	return max, false, nil
+	if max == uint64(0) {


This index returns the max eval index across all jobs, but we care about a single job. We could miss an update if evals were GC'd before this ran. This should return zero instead.
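The suggestion in this comment can be sketched as follows. The `eval` type and `latestEvalIndex` function are hypothetical stand-ins, not the state store API or the body of `jobEvalStatus`: the function scans only the evals of one job and returns zero when none remain, for example after garbage collection.

```go
package main

import "fmt"

// eval is a hypothetical, minimal stand-in for an evaluation record.
type eval struct {
	JobID       string
	ModifyIndex uint64
}

// latestEvalIndex returns the highest modify index among the evals that
// belong to jobID, or zero when the job has no evals.
func latestEvalIndex(evals []eval, jobID string) uint64 {
	var latest uint64
	for _, e := range evals {
		if e.JobID == jobID && e.ModifyIndex > latest {
			latest = e.ModifyIndex
		}
	}
	return latest
}

func main() {
	evals := []eval{{"web", 10}, {"web", 42}, {"batch", 99}}
	fmt.Println(latestEvalIndex(evals, "web"))   // prints "42"
	fmt.Println(latestEvalIndex(evals, "cache")) // prints "0"
}
```

Returning zero lets a caller tell "no evals for this job" apart from an index that belongs to some other job.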

preetapan · 2018-11-08T19:24:05Z

nomad/state/state_store.go

-		copyAlloc.DeploymentStatus.Canary = true
+	// The client can only set its deployment health and timestamp, so just take
+	// those
+	if copyAlloc.DeploymentStatus != nil && alloc.DeploymentStatus != nil {


Why did this change?

preetapan

LGTM, but there was a drive-by fix in the state store that looked like it was about minimizing the number of settable fields in an alloc update. That didn't seem related to the bugfix.

dadgar · 2018-11-08T21:31:28Z

@preetapan The server was actually panicking if the client sent an update without the deployment status set, so the new code is more correct and safer.

github-actions · 2023-02-25T02:18:13Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

dadgar added 2 commits November 5, 2018 16:06

more robust merging of the deployment status when getting updates from the client

ccb7440

dadgar requested a review from preetapan November 6, 2018 01:08

preetapan changed the title ~~Fix multiple tgs with progress deadline handling~~ Fix multiple bugs with progress deadline handling Nov 6, 2018

preetapan suggested changes Nov 6, 2018

preetapan reviewed Nov 6, 2018

review fixes

ad15649

preetapan reviewed Nov 8, 2018

preetapan approved these changes Nov 8, 2018

dadgar merged commit 08b75d4 into master Nov 8, 2018

tantra35 mentioned this pull request Nov 18, 2018

Run job deregistering in a single transaction #4861

Merged

github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2023
