Update eval modify index as part of plan apply. #3669

preetapan · 2017-12-18T17:39:46Z

No description provided.

dadgar · 2017-12-18T18:10:57Z

nomad/structs/structs.go

+	// EvalID is the eval ID of the plan being applied. We also update the modify
+	// index of the eval ID as part of applying plan results. This is to ensure that
+	// other workers that are dequeuing evaluations don't miss updates that can affect
+	// scheduling decisions.


"This is to ensure..." -> The modify index of the evaluation is updated as part of applying the plan to ensure that subsequent scheduling events for the same job will wait for the index that last produced state changes. This is necessary for blocked evaluations since they can be processed many times, potentially making state updates, without the state of the evaluation itself being updated.

dadgar · 2017-12-18T18:12:21Z

nomad/eval_endpoint_test.go

@@ -286,6 +286,73 @@ func TestEvalEndpoint_Dequeue_WaitIndex(t *testing.T) {
 	}
 }

+func TestEvalEndpoint_Dequeue_UpdateWaitIndex(t *testing.T) {
+	// test enqueuing an eval, updating a plan result for the same eval, and dequeuing the eval


dadgar · 2017-12-18T18:16:41Z

nomad/state/state_store.go

+	}
+	if existing == nil {
+		// return if there isn't an eval with this ID.
+		// In some cases (like snapshot restores), we process evals that are not already in the state store.


Snapshot restores don't go through the normal upsert codepath. They have their own direct insertion path to avoid these problems.

dadgar · 2017-12-18T18:17:30Z

nomad/state/state_store.go

+	if existing == nil {
+		// return if there isn't an eval with this ID.
+		// In some cases (like snapshot restores), we process evals that are not already in the state store.
+		s.logger.Printf("[WARN] state_store: unable to find eval ID %v, cannot update modify index ", evalID)


I would use %q instead, and there is an extra trailing space at the end of the format string.
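To illustrate the suggestion, here is a minimal sketch comparing the two verbs. The helper names `withV` and `withQ` are hypothetical and not part of the PR; the message text follows the diff above with the trailing space dropped. The point is that `%q` keeps an empty or whitespace-padded ID visible in the log line.

```go
package main

import "fmt"

// withV formats the warning with %v (hypothetical helper, not Nomad code).
func withV(evalID string) string {
	return fmt.Sprintf("unable to find eval ID %v, cannot update modify index", evalID)
}

// withQ formats the same warning with %q, which wraps the ID in double quotes
// and escapes special characters.
func withQ(evalID string) string {
	return fmt.Sprintf("unable to find eval ID %q, cannot update modify index", evalID)
}

func main() {
	// With an empty ID, %v leaves a gap that is easy to misread,
	// while %q prints an explicit pair of quotes.
	fmt.Println(withV(""))
	fmt.Println(withQ(""))
}
```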

dadgar · 2017-12-18T18:18:04Z

nomad/state/state_store.go

+	}
+	eval := existing.(*structs.Evaluation).Copy()
+	// Update the indexes
+	eval.CreateIndex = existing.(*structs.Evaluation).CreateIndex


Shouldn't the copy capture this one?
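A small sketch of why the extra assignment looks redundant, assuming `Copy` duplicates every value field of the struct. The `Evaluation` type and `bumpModifyIndex` function below are simplified, hypothetical stand-ins and not the real `structs.Evaluation` or state store code.

```go
package main

import "fmt"

// Evaluation is a simplified, hypothetical stand-in for structs.Evaluation.
type Evaluation struct {
	ID          string
	CreateIndex uint64
	ModifyIndex uint64
}

// Copy returns a copy of the evaluation. Because the struct is copied by
// value, CreateIndex carries over without any extra assignment.
func (e *Evaluation) Copy() *Evaluation {
	if e == nil {
		return nil
	}
	c := *e
	return &c
}

// bumpModifyIndex copies the existing eval and only touches ModifyIndex.
func bumpModifyIndex(existing *Evaluation, index uint64) *Evaluation {
	eval := existing.Copy()
	eval.ModifyIndex = index
	return eval
}

func main() {
	existing := &Evaluation{ID: "e1", CreateIndex: 10, ModifyIndex: 10}
	updated := bumpModifyIndex(existing, 1000)
	fmt.Println(updated.CreateIndex, updated.ModifyIndex)
	fmt.Println(existing.ModifyIndex) // the original is left untouched
}
```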

dadgar · 2017-12-18T18:18:46Z

nomad/state/state_store_test.go

+	evalOut, err := state.EvalByID(ws, eval.ID)
+	assert.Nil(err)
+	assert.NotNil(evalOut)
+	assert.Equal(uint64(1000), evalOut.ModifyIndex)


assert.EqualValues(1000, evalOut.ModifyIndex)
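For context on why the `uint64(...)` conversion is needed with a strict comparison, here is a standard-library sketch of the difference between exact equality and value equality. `strictEqual` and `looseEqual` are hypothetical helpers written for illustration; they are a simplification and not testify's actual implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// strictEqual requires both the dynamic type and the value to match,
// so an untyped constant 1000 (an int) never equals a uint64.
func strictEqual(expected, actual interface{}) bool {
	return reflect.DeepEqual(expected, actual)
}

// looseEqual converts expected to the type of actual when Go allows the
// conversion, then compares. Simplified: it inherits Go's conversion rules,
// including ones that may be surprising (for example int to string).
func looseEqual(expected, actual interface{}) bool {
	if reflect.DeepEqual(expected, actual) {
		return true
	}
	ev := reflect.ValueOf(expected)
	at := reflect.TypeOf(actual)
	if !ev.IsValid() || at == nil || !ev.Type().ConvertibleTo(at) {
		return false
	}
	return reflect.DeepEqual(ev.Convert(at).Interface(), actual)
}

func main() {
	var modifyIndex uint64 = 1000
	fmt.Println(strictEqual(1000, modifyIndex)) // types differ: int vs uint64
	fmt.Println(looseEqual(1000, modifyIndex))  // compared after conversion
}
```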


dadgar · 2017-12-18T23:20:58Z

nomad/state/state_store.go

+		return fmt.Errorf("eval lookup failed: %v", err)
+	}
+	if existing == nil {
+		return fmt.Errorf("[ERR] state_store: unable to find eval id %q", evalID)


err := fmt.Errorf("unable to find eval id %q", evalID)
s.logger.Printf("[ERR] state_store: %v", err)
return err

For this one, do you mind if the second line was just s.logger.Printf(err) so it doesn't double print [ERR] state_store?
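The pattern under discussion can be sketched as follows: build the error once without a log prefix, log it with the prefix, and return the unprefixed error so callers that log it again do not repeat `[ERR] state_store`. The `store` type, `requireEval`, and `newBufferedStore` are hypothetical names for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// store is a hypothetical stand-in for the state store.
type store struct {
	logger *log.Logger
	evals  map[string]struct{}
}

// newBufferedStore returns an empty store whose log output is captured in a buffer.
func newBufferedStore() (*store, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return &store{logger: log.New(buf, "", 0), evals: map[string]struct{}{}}, buf
}

// requireEval logs with the "[ERR] state_store:" prefix but returns the
// plain error, so the prefix appears only once even if a caller logs it too.
func (s *store) requireEval(evalID string) error {
	if _, ok := s.evals[evalID]; !ok {
		err := fmt.Errorf("unable to find eval id %q", evalID)
		s.logger.Printf("[ERR] state_store: %v", err)
		return err
	}
	return nil
}

func main() {
	s, buf := newBufferedStore()
	err := s.requireEval("missing")
	fmt.Print(buf.String()) // the prefixed log line
	fmt.Println(err)        // the unprefixed error value
}
```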

jippi · 2017-12-19T14:27:54Z

What symptoms and problems does this actually fix? It would be nice to have some more context on the root cause and the fix for it.

preetapan · 2017-12-19T15:28:51Z

@jippi It fixes a rare race that shows up under high load and limited resources. It manifested as subtle, confusing bugs such as duplicate deployments being created, or the same alloc index being reused for a different allocation.

Bug: During high load and limited resources, if the scheduler made some placements but was blocked on creating more, and then got unblocked, it could miss the placements it already made.

Root cause: Scheduler workers processing an evaluation did not see state updates from previously applied plans associated with the same evaluation.

Fix: Update the eval's last modify index when plans are applied, so that scheduler workers can wait for the raft index to catch up before starting to process the eval.


Update eval modify index as part of plan apply.

f12255e

preetapan requested a review from dadgar December 18, 2017 17:39

dadgar requested changes Dec 18, 2017


dadgar mentioned this pull request Dec 18, 2017

Under high load, repeated use of allocation name #3593

Closed

Preetha Appan added 2 commits December 18, 2017 14:55

Return an error if evaluation doesn't exist in state store at plan apply time.

aa35b5b

Address some code review comments

a49db95

preetapan force-pushed the b-planapply-eval-modindex branch from 1223158 to a49db95 Compare December 18, 2017 21:22

dadgar reviewed Dec 18, 2017


dadgar and others added 2 commits December 18, 2017 15:51

Handle upgrade path

0401c22

Clean up error logging

039942f

dadgar merged commit 259ee6b into master Dec 19, 2017

dadgar deleted the b-planapply-eval-modindex branch December 19, 2017 00:10

github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2023
