admission: introduce soft slots for speculative concurrency #78519
Conversation
---
pkg/util/admission/granter.go, line 481 at r1 (raw file):
```go
usedSoftSlots  int
totalSlots     int
totalSoftSlots int
```
don't look at this yet -- I added totalSoftSlots in a last iteration of this code and it is somewhat bolted on without any clear invariants relating totalSlots and totalSoftSlots.
(force-pushed e97d670 to 451f7c6)
---
pkg/util/admission/granter.go, line 481 at r1 (raw file):
> Previously, sumeerbhola wrote…
> don't look at this yet -- I added totalSoftSlots in a last iteration of this code and it is somewhat bolted on without any clear invariants relating totalSlots and totalSoftSlots.
cleaned this up, so it is ready for a look.
---
Leaving some questions here for my own understanding, which I felt was lacking as I was performing the benchmarking.
pkg/util/admission/granter.go, line 470 at r2 (raw file):
```go
// causes a delay in returning slots to admission control, and we don't want
// that delay to cause admission delay for normal work. Hence, we model slots
// granted to background activities as "soft-slots". Granting a soft-slot has
```
If I understand this correctly, then it is possible that at some point in time `totalSlots < usedSlots + usedSoftSlots`, right?

For example: we could have one total slot available, and we could have a get for a soft slot, which would succeed, followed by a get for a regular slot, which would also succeed.

It seems as if we try to enforce the invariant `usedSoftSlots + usedSlots <= totalSlots` at soft-slot grant time, but it doesn't hold throughout the lifetime of the program.
pkg/util/admission/granter.go, line 590 at r2 (raw file):
```go
	spareSoftSlots = spareSlots
}
if spareSoftSlots <= 0 {
```
`spareSlots < 0` could evaluate to true, right?
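(Continuing the earlier example: with `totalSlots = 1`, a successful soft-slot get followed by a successful regular-slot get gives `spareSlots = 1 - (1 + 1) = -1`.)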
pkg/util/admission/granter.go, line 1508 at r2 (raw file):
```go
// So the natural behavior of the slot adjustments does not guarantee
// totalSlots >= totalSoftSlots. We add logic to impose this on top of the
```
Is this the only invariant which needs to be maintained throughout the lifetime of the program? Let's say this invariant wasn't true; I'm curious about the bad behaviours which would occur.

`totalSoftSlots` is only used in `tryGetSoftSlots` for any decision making:

```go
spareSoftSlots := sg.totalSoftSlots - sg.usedSoftSlots
spareSlots := sg.totalSlots - (sg.usedSlots + sg.usedSoftSlots)
if spareSlots < spareSoftSlots {
	spareSoftSlots = spareSlots
}
```
Since `spareSoftSlots` is set to the minimum of `spareSoftSlots` and `spareSlots`, is the `totalSlots >= totalSoftSlots` invariant required?

I think we want to make sure that `totalSoftSlots` is actually decremented when it should be. Which leads me to the question: why do we have a `used > 0` check in `tryDecreaseSlots`?
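For concreteness, a hypothetical sketch of the shape of guard being asked about; the signature and body are illustrative assumptions, not the actual `tryDecreaseSlots` implementation:

```go
// Illustrative only: a guard like this would skip lowering the total while
// no slots are held, which is what the question above is probing.
func tryDecreaseSlots(used, total int) int {
	// The `used > 0` check under discussion: total is only decremented
	// while at least one slot is currently in use.
	if used > 0 && total > 1 {
		total--
	}
	return total
}
```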
---
pkg/util/admission/granter.go, line 470 at r2 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> If I understand this correctly, then it is possible that at some point in time `totalSlots < usedSlots + usedSoftSlots`, right? For example: we could have one total slot available, and we could have a get for a soft slot, which would succeed, followed by a get for a regular slot, which would also succeed. It seems as if we try to enforce the invariant `usedSoftSlots + usedSlots <= totalSlots` at soft-slot grant time, but it doesn't hold throughout the lifetime of the program.
Correct. We are willing to over-subscribe for the sake of not blocking regular work. What we tend to observe when there is a high amount of regular work is that the slot utilization stays high (i.e., usedSlots/totalSlots is close to 1). The hope is that in that case totalSoftSlots will be < totalSlots and no soft slots will be granted.
But I've realized there is some dubious thinking here, which I need to correct.
There ought to be two slot limits: totalHighLoadSlots, totalLowLoadSlots, with totalHighLoadSlots >= totalLowLoadSlots. The code tries to do something like this in the slot adjustment logic but isn't quite right.
Then there are two kinds of slot consumers: regular and optional background activities, which will have usedRegularSlots, usedSoftSlots.
The invariant at granting a regular slot: usedRegularSlots <= totalHighLoadSlots
The invariant at granting a soft slot: usedRegularSlots + usedSoftSlots <= totalLowLoadSlots
This isn't just a name change from what is here. We are incorrectly doing usedRegularSlots + usedSoftSlots <= totalHighLoadSlots and usedSoftSlots <= totalLowLoadSlots when granting soft slots. And we are incorrectly doing calls like try{Increase,Decrease}Slots(usedSoftSlots, totalLowLoadSlots).
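A minimal sketch of the two corrected invariants, with illustrative names (this is not the actual slotGranter code):

```go
// Sketch of the two grant checks described above. Field and method names
// are illustrative, not the real implementation.
type granterState struct {
	totalHighLoadSlots int // regular work may use up to this many
	totalLowLoadSlots  int // soft slots are granted only below this
	usedRegularSlots   int
	usedSoftSlots      int
}

// Invariant at granting a regular slot: usedRegularSlots <= totalHighLoadSlots.
func (g *granterState) tryGetRegularSlot() bool {
	if g.usedRegularSlots+1 > g.totalHighLoadSlots {
		return false
	}
	g.usedRegularSlots++
	return true
}

// Invariant at granting a soft slot:
// usedRegularSlots + usedSoftSlots <= totalLowLoadSlots.
func (g *granterState) tryGetSoftSlot() bool {
	if g.usedRegularSlots+g.usedSoftSlots+1 > g.totalLowLoadSlots {
		return false
	}
	g.usedSoftSlots++
	return true
}
```

Note that a regular grant never looks at soft-slot usage, so the over-subscription described above (regular work proceeding even when soft slots are held) is preserved by design.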
pkg/util/admission/granter.go, line 590 at r2 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> `spareSlots < 0` could evaluate to true, right?

Correct.
(force-pushed 451f7c6 to 1c6b211)
---
pkg/util/admission/granter.go, line 470 at r2 (raw file):
> Previously, sumeerbhola wrote…
> Correct. We are willing to over-subscribe for the sake of not blocking regular work. What we tend to observe when there is a high amount of regular work is that the slot utilization stays high (i.e., usedSlots/totalSlots is close to 1). The hope is that in that case totalSoftSlots will be < totalSlots and no soft slots will be granted.
> But I've realized there is some dubious thinking here, which I need to correct. There ought to be two slot limits: totalHighLoadSlots, totalLowLoadSlots, with totalHighLoadSlots >= totalLowLoadSlots. The code tries to do something like this in the slot adjustment logic but isn't quite right. Then there are two kinds of slot consumers: regular and optional background activities, which will have usedRegularSlots, usedSoftSlots.
> The invariant at granting a regular slot: usedRegularSlots <= totalHighLoadSlots. The invariant at granting a soft slot: usedRegularSlots + usedSoftSlots <= totalLowLoadSlots. This isn't just a name change from what is here. We are incorrectly doing usedRegularSlots + usedSoftSlots <= totalHighLoadSlots and usedSoftSlots <= totalLowLoadSlots when granting soft slots. And we are incorrectly doing calls like try{Increase,Decrease}Slots(usedSoftSlots, totalLowLoadSlots).
I've changed the code along these lines and added a comment that makes a distinction between the squishiness of soft slots and the idea of moderate load vs high load slots.
---
pkg/util/admission/granter.go, line 475 at r3 (raw file):
Thanks for the extra documentation. It answers some of my questions.
I'll think about this:
```go
// the slot is still "using" it, while a soft-slot is "squishy" and in some
// cases we can pretend that it is not being used. Say we are allowed
```
Recording what we discovered in experiments run by @bananabrick with kv50: the total-moderate-load-slots mechanism doesn't work well, in that even though CPU utilization is consistently high and runnable counts are usually on the high side, compactions are sometimes able to get slots for concurrent compression because the used (moderate-load and regular) slots can be low.

One observation here is that we can afford to be more conservative in calculating total-moderate-load slots, since we don't care about saturating CPU for the less important work that is controlled by these slots. So we could use a slow-reacting and conservative signal to decide on the value of total-moderate-load slots. Sketch of approach: keep the current total-moderate-load slot adjustment logic, but add an additional cap on the value using the runnable goroutine observations, since that observation accurately reflects that the CPU load is high even when slot usage is low.
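A rough sketch of that capping idea, with assumed names and a simplistic clamp derivation (the real signal and formula would differ):

```go
// Illustrative sketch: retain the additive adjustment of the
// total-moderate-load slots, but cap the result with a clamp derived from a
// slow-moving (smoothed) runnable-goroutine count, since that signal stays
// high under CPU load even when slot usage is low.
func adjustModerateSlots(totalModerateSlots, additiveDelta, procs int, runnableEWMA float64) int {
	adjusted := totalModerateSlots + additiveDelta // existing additive increase/decrease
	clamp := procs - int(runnableEWMA)             // assumed clamp: spare CPU capacity
	if clamp < 0 {
		clamp = 0
	}
	if adjusted > clamp {
		adjusted = clamp
	}
	return adjusted
}
```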
(force-pushed 1c6b211 to 9827839)
Just rebased this.
(force-pushed 821a044 to 0b5f6eb)
---
pkg/util/admission/granter.go, line 1558 at r5 (raw file):
```go
threshold := int(KVSlotAdjusterOverloadThreshold.Get(&kvsa.settings.SV))

// Moderate slots can be adjusted using two separate mechanisms. We tune the
```
`moderateSlotsClamp` is what I came up with to make the tuning of `totalModerateSlots` testable. I'm happy to learn about a better way to solve this.
pkg/util/admission/granter.go, line 394 at r6 (raw file):
```go
totalModerateLoadSlots int
moderateSlotsClamp     int
failedSoftSlotsGet     bool
```
I'm unsure that the optimization with `failedSoftSlotsGet` is necessary. It seems like it's handling a specific case, and it didn't seem to do much during benchmarking.
pkg/util/admission/granter.go, line 1600 at r6 (raw file):
```go
// also handle the case where the used slots are a bit less than total
// slots, since callers for soft slots don't block.
if (usedSlots >= total ||
	(usedSlots >= closeToTotalSlots && kvsa.granter.failedSoftSlotsGet)) &&
```
I'm not con
(force-pushed 0b5f6eb to 89ca827)
---
-- commits, line 4 at r7:
threads

-- commits, line 5 at r7:
and later may be used for ...

-- commits, line 8 at r7:
in a way

-- commits, line 15 at r7:
leftover comment. I am assuming the PR description will talk about the approach where this runnable stuff will come up.
pkg/util/admission/granter.go, line 1558 at r5 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> `moderateSlotsClamp` is what I came up with to make the tuning of `totalModerateSlots` testable. I'm happy to learn about a better way to solve this.
Is the issue here mainly about printing kvSlotAdjuster state for datadriven testing output?
I don't think we should delay the clamping. It is confusing, and it is possible that something that was clamped to 0 because moderateSlotsClamp was negative now rises to 1, which means we could use up 1 CPU on compression when we don't actually want to.
There's literally one of these structs in a whole CockroachDB process, so nothing here needs to be space efficient. We can keep whatever extra state we want (like ioLoadListener.adjustTokenResult does). So how about keeping the following in the kvSlotAdjuster state:

- totalModerateSlotsPreClamp
- runnableEWMA
- moderateSlotsClamp
and then write min(totalModerateSlotsPreClamp, moderateSlotsClamp) to granter.totalModerateSlots. The next time, the additive increase/decrease logic will of course start from granter.totalModerateSlots, as you have here.
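A minimal sketch of that bookkeeping, assuming the field names proposed above (illustrative, not the final code):

```go
// Sketch: keep the pre-clamp value and the clamp as separate kvSlotAdjuster
// state, and publish their minimum to the granter immediately, rather than
// delaying the clamping.
type kvSlotAdjusterSketch struct {
	totalModerateSlotsPreClamp int
	runnableEWMA               float64
	moderateSlotsClamp         int
}

func (s *kvSlotAdjusterSketch) totalModerateSlots() int {
	v := s.totalModerateSlotsPreClamp
	if s.moderateSlotsClamp < v {
		v = s.moderateSlotsClamp
	}
	if v < 0 {
		v = 0 // a negative clamp must not surface as grantable slots
	}
	return v // written to granter.totalModerateSlots
}
```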
pkg/util/admission/granter.go, line 394 at r6 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> I'm unsure that the optimization with `failedSoftSlotsGet` is necessary. It seems like it's handling a specific case, and it didn't seem to do much during benchmarking.
responded below
pkg/util/admission/granter.go, line 1600 at r6 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> I'm not con
?
I think it's fine to remove `failedSoftSlotsGet` and `closeToTotalSlots`, since you are not seeing them being useful in experiments.
pkg/util/admission/granter.go, line 393 at r7 (raw file):
```go
totalHighLoadSlots     int
totalModerateLoadSlots int
moderateSlotsClamp     int
```
Can you add code comments for `moderateSlotsClamp`, `failedSoftSlotsGet`, `runnableEWMA`, and `runnableAlpha`?

Also move all these into `kvSlotAdjuster`. The only reason `kvSlotAdjuster` was reaching into `slotGranter` for the total and used slots is that those were sufficient for the slot adjustment logic, and those were needed by the slotGranter for its granting logic. These new fields are only needed for adjustment, so they belong in kvSlotAdjuster.
pkg/util/admission/granter.go, line 457 at r7 (raw file):
```go
}
sg.usedSoftSlots += allocatedSlots
sg.usedSlotsMetric.Update(int64(sg.usedSoftSlots))
```
usedSoftSlotsMetric
pkg/util/admission/granter.go, line 465 at r7 (raw file):
```go
defer sg.coord.mu.Unlock()
sg.usedSoftSlots -= count
sg.usedSlotsMetric.Update(int64(sg.usedSoftSlots))
```
usedSoftSlotsMetric
pkg/util/admission/granter.go, line 741 at r7 (raw file):
```go
SQLStatementLeafStartWorkSlots: 100,   /* arbitrary, and unused */
SQLStatementRootStartWorkSlots: 100,   /* arbitrary, and unused */
RunnableAverageAlpha:           0.991, /* Gives enough weight to at least a few hundred samples. */
```
I think the convention is for alpha to be the multiplier for the latest sample, so `ewma = sample*alpha + ewma*(1-alpha)`.
There's a bit of a problem here in that the CPULoad can sometimes be reported at 250ms intervals and then we don't want this 0.991 value.
We could have something like:

```go
// RunnableAverageAlphaPerMillisecond (R) is used to compute the alpha for
// the exponentially weighted runnable count. The actual alpha is
// R * sampleDurationInMillis, clamped to [0.001, 0.5]. Note that this value
// is made configurable via options only for unit testing purposes.
RunnableAverageAlphaPerMillisecond: 0.009

func (kvsa *kvSlotAdjuster) CPULoad(runnable int, procs int, samplePeriod time.Duration) {
	alpha := RunnableAverageAlphaPerMillisecond * float64(samplePeriod/time.Millisecond)
	if alpha > 0.5 {
		alpha = 0.5
	} else if alpha < 0.001 {
		alpha = 0.001
	}
	...
}
```
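With the suggested R = 0.009, a 1ms sample period yields alpha = 0.009, while a 250ms period yields 0.009 * 250 = 2.25, which the clamp reduces to 0.5, so coarse-grained samples get a bounded weight.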
pkg/util/admission/granter.go, line 1568 at r7 (raw file):
```go
// the next time CPULoad is called. Since CPULoad is called at least 1000 times
// a second, we're only delaying the adjustment of totalModerateSlots by 1ms
// which is fine.
```
Can you clean up #78519 (comment) and include most of its substance, including the justification for this design, in a code comment here?
(force-pushed 89ca827 to c711888)
---
> Previously, sumeerbhola wrote…
> leftover comment. I am assuming the PR description will talk about the approach where this runnable stuff will come up.
It was from a commit squash. Updated the commit message.
(force-pushed c711888 to 7507986)
(force-pushed 7507986 to 870072a)
This is for using additional threads for Pebble compaction compression and later may be used for adjusting the number of concurrent compactions. Since these slots are not acquired or released in a very fine grained manner, we've chosen to model these in a way that allows for over-commitment. The SoftSlotGranter should be used for acquiring and releasing such slots, instead of WorkQueue, since there is no queueing for such slots.

Moderate slots can be adjusted using two separate mechanisms. We tune the moderate slots in an additive increase/additive decrease manner based on the current CPU load, and we also use a historic average of the runnable count to clamp down on the moderate slots. We use the runnable count average because work not accounted for by admission control can still use up CPU resources and cause queueing of runnable goroutines.

Release note: None
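As a hedged illustration of the acquire/release pattern this describes (the interface and helper below are assumptions based on this message, not a confirmed API):

```go
// Assumed minimal surface of the granter; the real type lives in
// pkg/util/admission.
type softSlotGranter interface {
	TryGetSlots(count int) int // returns 0..count granted slots; never blocks
	ReturnSlots(count int)
}

// Sketch: acquire soft slots for extra compression threads. There is no
// queueing: a partial or failed grant just means less speculative
// concurrency, since one thread is always available without a slot.
func compressWithSoftSlots(ssg softSlotGranter, extraWanted int, run func(concurrency int)) {
	granted := ssg.TryGetSlots(extraWanted)
	defer ssg.ReturnSlots(granted)
	run(1 + granted)
}
```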
(force-pushed 870072a to 3f1b12b)
---
pkg/util/admission/granter.go, line 1299 at r10 (raw file):
```go
switch g := coord.granters[i].(type) {
case *slotGranter:
	kvsa := coord.cpuLoadListener.(*kvSlotAdjuster)
```
Accessing the `kvSlotAdjuster` like this doesn't seem great.
---
-- commits, line 18 at r10:
Can you change this phrase to the following, since the SQL work is accounted for by admission control -- it's just that the accounting is indirect and isn't reflected in the KV slot usage:

... not accounted for by KV admission control slot usage can still ...
pkg/util/admission/granter.go, line 1568 at r7 (raw file):
> Previously, sumeerbhola wrote…
> Can you clean up #78519 (comment) and include most of its substance, including the justification for this design, in a code comment here?
For the future, please add "Done" to the comments that have been addressed so it's clearer when doing review passes (I'm assuming you are using Reviewable).
pkg/util/admission/granter.go, line 1299 at r10 (raw file):
> Previously, bananabrick (Arjun Nair) wrote…
> Accessing the `kvSlotAdjuster` like this doesn't seem great.
Just add a comment like:

```go
// kvSlotAdjuster is the only production implementation of CPULoadListener
// for nodes with a KV layer.
```
Added @irfansharif for awareness of this change -- moderate load slots may be something we end up enhancing and using for less important KV work too, if we find that scheduling latency is to blame for higher latency of more important user-facing traffic.

@bananabrick GitHub will not let me approve this, because I opened the PR and did the initial moderate load/soft slots changes. But I am confident that the spirit of code reviewing has been fully met, since we've reviewed each other's changes. So if GitHub lets you stamp an approval, I don't think there is a need to wait.
---
pkg/util/admission/granter.go, line 1568 at r7 (raw file):

> Previously, sumeerbhola wrote…
> For the future, please add "Done" to the comments that have been addressed so it's clearer when doing review passes (I'm assuming you are using Reviewable).
Yeah, Reviewable used to do that automatically. I presume it's because this was your PR, which I was modifying, so the mechanics were different.
bors r+

Build failed (retrying...)

Build succeeded
We originally introduced these notions in admission control (cockroachdb#78519) for additional threads for Pebble compaction compression. We envisioned granting these "squishy" slots to background activities, permitting work only under periods of low load. In working through cockroachdb#86638 (as part of cockroachdb#75066), we observed experimentally that the moderate-slots count was not sensitive enough to scheduling latency, and consequently to the latency observed by foreground traffic. Elastic CPU tokens, the kind now being used for backups, offer an alternative to soft slots. We've since replaced uses of soft slots with elastic CPU tokens. This PR just removes the now-dead code around soft/moderate load slots (it's better to minimize the number of mechanisms in the admission package).

Fixes cockroachdb#95590.

Release note: None
95590: admission: remove soft/moderate load slots r=irfansharif a=irfansharif

We originally introduced these notions in admission control (#78519) for additional threads for Pebble compaction compression. We envisioned granting these "squishy" slots to background activities, permitting work only under periods of low load. In working through #86638 (as part of #75066), we observed experimentally that the moderate-slots count was not sensitive enough to scheduling latency, and consequently to the latency observed by foreground traffic. Elastic CPU tokens, the kind now being used for backups, offer an alternative to soft slots. We've since replaced uses of soft slots with elastic CPU tokens. This PR just removes the now-dead code around soft/moderate load slots (it's better to minimize the number of mechanisms in the admission package).

Fixes #88032.

Release note: None

First commit is from #95007.

Co-authored-by: irfan sharif <[email protected]>
This is for using additional threads for Pebble compaction compression
and later may be used for adjusting the number of concurrent compactions.
Since these slots are not acquired or released in a very fine grained
manner, we've chosen to model these in a way that allows for
over-commitment. The SoftSlotGranter should be used for acquiring
and releasing such slots, instead of WorkQueue, since there is no
queueing for such slots.

Release note: None