Normalize plan to increase the plan apply throughput #5602

arshjohar · 2019-04-23T17:25:06Z

This PR normalizes the plan so that only the diff for stopped and preempted allocs is committed to the raft log, to enable better plan apply throughput. It also starts using omitempty on some of the structs so that empty fields are omitted during msgpack serialization.
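
As a rough illustration of the idea, not the PR's actual code: the sketch below uses a hypothetical, trimmed-down `Allocation` type, and the standard library's `encoding/json` stands in for msgpack purely to keep the example self-contained. The `normalize` helper and all field names are assumptions. It keeps only the ID and the fields that a stop or preemption changes, and `omitempty` drops the zeroed fields from the encoded entry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Allocation is a hypothetical, trimmed-down stand-in for a scheduler
// allocation; the field names are illustrative only.
type Allocation struct {
	ID                 string         `json:"id"`
	JobSpec            string         `json:"job_spec,omitempty"`
	Resources          map[string]int `json:"resources,omitempty"`
	DesiredDescription string         `json:"desired_description,omitempty"`
	PreemptedBy        string         `json:"preempted_by,omitempty"`
}

// normalize returns a copy that holds only the ID and the fields changed
// by stopping or preempting the allocation.
func normalize(a Allocation) Allocation {
	return Allocation{
		ID:                 a.ID,
		DesiredDescription: a.DesiredDescription,
		PreemptedBy:        a.PreemptedBy,
	}
}

// encode serializes an allocation; empty fields are omitted via omitempty.
func encode(a Allocation) []byte {
	b, err := json.Marshal(a)
	if err != nil {
		panic(err) // not expected for this simple struct
	}
	return b
}

func main() {
	full := Allocation{
		ID:                 "alloc-1",
		JobSpec:            "a large job specification ...",
		Resources:          map[string]int{"cpu": 500, "memory_mb": 256},
		DesiredDescription: "alloc was stopped",
	}
	norm := normalize(full)
	fmt.Printf("full:       %d bytes\n", len(encode(full)))
	fmt.Printf("normalized: %d bytes: %s\n", len(encode(norm)), encode(norm))
}
```

The normalized entry carries enough to identify the alloc and apply the change; everything else would have to be looked up again when the log entry is applied.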

arshjohar · 2019-04-23T17:40:44Z

nomad/state/state_store.go

+			return nil, fmt.Errorf("alloc lookup failed: %v", err)
+		}
+		if alloc == nil {
+			continue


Continuing the discussion from
https://github.com/hashicorp/nomad/pull/5407/files/c242adea8786e7541c8c108c342026730b028ff4#diff-ccbd515c67aa55098b48f1106de134aaR4134

There are a few reasons I don't think returning an error is the right approach here:

1. We do diffs for only stopped or preempted allocs. If they don't exist, it means they have already been stopped/preempted, so it's fine for the update not to reach the alloc.

2. The raft log has already been dispatched at this point, and this code just applies the log to the state store. I'm not sure it'd even be 'correct' to not apply this to the state store, given that we can't remove the raft log entry. Wouldn't the state store be stuck whenever it has to apply this log entry?

3. It looks like the existing code for preempted allocs dealt with this in the same way.

nomad/nomad/state/state_store.go

Line 236 in 987ed01

if existing == nil {

preetapan · 2019-04-24

Re: 1) This should not happen unless the plan applier is acting on stale state store data. We asked about returning an error so that it's very visible when this happens, i.e. this suggestion was for safety.

Re: 2) Returning the error will fail the optimistic apply to the leader's state store in

nomad/nomad/plan_apply.go

Line 226 in 95297c6

if snap != nil {

and propagate up to the worker in

nomad/nomad/worker.go

Line 324 in 95297c6

if err := w.srv.RPC("Plan.Submit", &req, &resp); err != nil {

Eventually that will fail the evaluation and trigger a retry, upon which the scheduler can make a new plan with updated information. The state store will always fail on that entry, but a future entry with correct information should succeed. Also see

nomad/nomad/plan_queue.go

Line 20 in 95297c6

type PlanFuture interface {

and how we block on the future from raft in

nomad/nomad/plan_endpoint.go

Line 43 in 95297c6

result, err := future.Wait()


Re: 3) That logic could be worth changing as well. I will address it in a future PR, since I am going to be touching the plan applier again after this PR is merged. You don't need to change it right now.
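
The error path described under Re: 2) can be pictured with a small future, sketched below under stated assumptions. `planFuture`, `applyPlan`, and `submitPlan` are hypothetical stand-ins and not Nomad's actual `PlanFuture` implementation. The applier responds once, the submitting side blocks in `Wait`, and a failed apply surfaces as an error the caller can react to by retrying the evaluation.

```go
package main

import (
	"errors"
	"fmt"
)

// planFuture is a hypothetical, minimal future: the applier responds once
// and the submitter blocks in Wait until it does.
type planFuture struct {
	done   chan struct{}
	result string
	err    error
}

func newPlanFuture() *planFuture {
	return &planFuture{done: make(chan struct{})}
}

// respond records the outcome and unblocks any waiter. Call exactly once.
func (f *planFuture) respond(result string, err error) {
	f.result, f.err = result, err
	close(f.done)
}

// Wait blocks until respond has been called, then returns the outcome.
func (f *planFuture) Wait() (string, error) {
	<-f.done
	return f.result, f.err
}

// applyPlan stands in for the plan applier: a missing alloc fails the plan.
func applyPlan(f *planFuture, allocExists bool) {
	if !allocExists {
		f.respond("", errors.New("stopped alloc does not exist"))
		return
	}
	f.respond("applied", nil)
}

// submitPlan stands in for the endpoint that enqueues a plan and waits.
func submitPlan(allocExists bool) (string, error) {
	f := newPlanFuture()
	go applyPlan(f, allocExists)
	return f.Wait()
}

func main() {
	if _, err := submitPlan(false); err != nil {
		fmt.Println("plan failed, evaluation would be retried:", err)
	}
	res, err := submitPlan(true)
	fmt.Println(res, err)
}
```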

schmichael · 2019-04-24T15:17:53Z

nomad/util.go

+// MinVersionPlanNormalization is the minimum version to support the
+// normalization of Plan in SubmitPlan, and the denormalization raft log entry committed
+// in ApplyPlanResultsRequest
+var MinVersionPlanNormalization = version.Must(version.NewVersion("0.9.1"))


Suggested change

-var MinVersionPlanNormalization = version.Must(version.NewVersion("0.9.1"))
+var MinVersionPlanNormalization = version.Must(version.NewVersion("0.9.2"))
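
One way such a minimum-version constant could gate a feature is sketched below. This is an assumption-laden illustration, not the PR's code: it uses only the standard library instead of the `version` package shown in the diff, handles plain `MAJOR.MINOR.PATCH` strings only, and `allServersSupport` is a hypothetical helper that assumes the feature is enabled only once every server meets the minimum.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH" into integers. Simplified on purpose:
// pre-release and build metadata are not handled.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v is greater than or equal to minimum.
func atLeast(v, minimum [3]int) bool {
	for i := range v {
		if v[i] != minimum[i] {
			return v[i] > minimum[i]
		}
	}
	return true
}

// allServersSupport reports whether every server version meets minVersion.
func allServersSupport(servers []string, minVersion string) (bool, error) {
	minimum, err := parse(minVersion)
	if err != nil {
		return false, err
	}
	for _, s := range servers {
		v, err := parse(s)
		if err != nil {
			return false, err
		}
		if !atLeast(v, minimum) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := allServersSupport([]string{"0.9.2", "0.9.1"}, "0.9.2")
	fmt.Println(ok, err)
}
```

With a mixed-version cluster like the one in `main`, the check fails and the older, unnormalized format would keep being used until the remaining server is upgraded.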


preetapan

LGTM, great work overall

github-actions · 2023-02-11T02:16:42Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

arshjohar added 5 commits April 23, 2019 09:18

Add code for plan normalization

4eedab1

Add tests for plan normalization

f75c6b4

Compat tags

02b832c

Remove allowPlanOptimization from schedulers

97686e3

Add comments to functions, and use require instead of assert

ee268a5

arshjohar commented Apr 23, 2019

arshjohar requested review from preetapan and schmichael April 23, 2019 17:43

schmichael approved these changes Apr 24, 2019

arshjohar force-pushed the normalized-plan branch from 31ec8d8 to 37f6757 Compare April 24, 2019 18:09

arshjohar added 2 commits April 24, 2019 12:36

Return error when preempted/stopped alloc doesn't exist during denormalization

ab2718c

Change min version required for plan optimization

23bc1f2

arshjohar force-pushed the normalized-plan branch from 37f6757 to 23bc1f2 Compare April 24, 2019 19:36

preetapan approved these changes Apr 24, 2019

arshjohar merged commit 8d48b77 into master Apr 24, 2019

endocrimes deleted the normalized-plan branch April 25, 2019 13:25

github-actions bot locked as resolved and limited conversation to collaborators Feb 11, 2023
