// Copyright 2021 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

// The admission package contains abstractions for admission control for
// CockroachDB nodes, both for single-tenant and multi-tenant (aka serverless)
// clusters. In the latter, both KV and SQL nodes are expected to use these
// abstractions.
//
// Admission control has the goal of
// - Limiting node overload, so that bad things don't happen due to
//   starvation of resources.
// - Providing performance isolation between low and high importance
//   activities, so that overload caused by the latter does not impact the
//   latency of the former. Additionally, for multi-tenant KV nodes, the
//   isolation should extend to inter-tenant performance isolation.
//   Isolation is strictly harder than limiting node overload, and the
//   abstractions here are likely to be average quality in doing so.
//
// At a high-level we are trying to shift queueing from system-provided
// resource allocation abstractions that we do not control, like the goroutine
// scheduler, to queueing in admission control, where we can reorder. This
// needs to be done while maintaining high utilization of the resource.
//
// Note that everything here operates at a single node level, and not at a
// cluster level. Cluster level admission control is insufficient for limiting
// node overload or for providing performance isolation in a distributed
// system with strong work affinity (which is true for a stateful system like
// CockroachDB, since rebalancing operates at time scales that are slower than
// what we need). Cluster level admission control can complement node level
// admission control, in that it can prevent severe abuse, or provide cost
// controls to tenants.
//
// It is possible to also have intermediate mechanisms that gate admission of
// work on load signals of all the nodes in the raft group of the range. This
// could be especially useful for writes where non-leaseholder nodes could be
// suffering from cpu or disk IO overload. This is not considered in the
// following interfaces.
//
// TODO(sumeer): describe more of the design thinking documented in
// https://github.com/sumeerbhola/cockroach/blob/27ab4062ad1b036ab1e686a66a04723bd9f2b5a0/pkg/util/cpupool/cpu_pool.go
// either in a comment here or a separate RFC.
//
// TODO(sumeer): update with all the recent changes.
//
// Internal organization:
//
// The package is mostly structured as a set of interfaces that are meant to
// provide a general framework, and specific implementations that are
// initially quite simple in their heuristics but may become more
// sophisticated over time. The concrete abstractions:
// - Tokens and slots are the two ways admission is granted (see grantKind).
// - Categorization of kinds of work (see WorkKind), and a priority ordering
//   across WorkKinds that is used to reflect their shared need for underlying
//   resources.
// - The top-level GrantCoordinator which coordinates grants across these
//   WorkKinds. The WorkKinds handled by an instantiation of GrantCoordinator
//   will differ for single-tenant clusters, and multi-tenant clusters
//   consisting of (multi-tenant) KV nodes and (single-tenant) SQL nodes.
//
// The interfaces involved:
// - requester: handles all requests for a particular WorkKind. Implemented by
//   WorkQueue. The requester implementation is responsible for controlling
//   the admission order within a WorkKind based on tenant fairness,
//   importance of work etc.
// - granter: the counterpart to requester which grants admission tokens or
//   slots. The implementations are slotGranter, tokenGranter,
//   kvStoreTokenGranter. The implementation of requester interacts with the
//   granter interface.
// - granterWithLockedCalls: this is an extension of granter that is used as
//   part of the implementation of GrantCoordinator. This arrangement is
//   partly to centralize locking in the GrantCoordinator (except for the
//   lock in WorkQueue).
// - cpuOverloadIndicator: this serves as an optional additional gate on
//   granting, by providing an (ideally) instantaneous signal of cpu overload.
//   The kvSlotAdjuster is the concrete implementation, except for SQL nodes,
//   where this will be implemented by sqlNodeCPUOverloadIndicator.
//   CPULoadListener is also implemented by these structs, to listen to the
//   latest CPU load information from the scheduler.
//
// Load observation and slot count or token burst adjustment: Dynamic
// adjustment is performed by kvSlotAdjuster for KVWork slots. This is because
// KVWork is expected to usually be CPU bound (due to good caching), and
// unlike SQLKVResponseWork and SQLSQLResponseWork (which are even more CPU
// bound), we have a completion indicator -- so we can expect to have a
// somewhat stable KVWork slot count even if the work sizes are extremely
// heterogeneous.
//
// There isn't token burst adjustment (except for each store -- see below),
// and the burst limits should be chosen to err on the side of fully
// saturating CPU, since we have the fallback of the cpuOverloadIndicator to
// stop granting even if tokens are available. If we figure out a way to
// dynamically tune the token burst count, or (even more ambitious) figure out
// a way to come up with a token rate, it should fit in the general framework
// that is set up here.
//
// Partial usage example (regular cluster):
//
//	var registry *metric.Registry = ...
//	coord, metrics := admission.NewGrantCoordinator(admission.Options{...})
//	for i := range metrics {
//		registry.AddMetricStruct(metrics[i])
//	}
//	kvQueue := coord.GetWorkQueue(admission.KVWork)
//	// Pass kvQueue to server.Node that implements kvpb.InternalServer.
//	...
//	// Do similar things with the other WorkQueues.
//
// Usage of WorkQueue for KV:
//
//	// Before starting some work:
//	if enabled, err := kvQueue.Admit(ctx, WorkInfo{TenantID: tid, ...}); err != nil {
//		return err
//	}
//	doWork()
//	if enabled { kvQueue.AdmittedWorkDone(tid) }
//
// Additionally, each store has a single StoreWorkQueue and GrantCoordinator
// for writes. See kvStoreTokenGranter and how its tokens are dynamically
// adjusted based on Pebble metrics.
package admission

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
	"github.com/cockroachdb/errors"
	"github.com/cockroachdb/pebble"
)

// requester is an interface implemented by an object that orders admission
// work for a particular WorkKind. See WorkQueue for the implementation of
// requester.
type requester interface {
	// hasWaitingRequests returns whether there are any waiting/queued requests
	// of this WorkKind.
	hasWaitingRequests() bool
	// granted is called by a granter to grant admission to a single queued
	// request. It returns > 0 if the grant was accepted, else returns 0. A
	// grant may not be accepted if the grant raced with request cancellation
	// and there are now no waiting requests. The grantChainID is used when
	// calling continueGrantChain -- see the comment with that method below.
	// When accepted, the return value indicates the number of slots/tokens that
	// were used.
	// REQUIRES: count <= 1 for slots.
	granted(grantChainID grantChainID) int64
	close()
}

// granter is paired with a requester in that a requester for a particular
// WorkKind will interact with a granter. See admission.go for an overview of
// how this fits into the overall structure.
type granter interface {
	grantKind() grantKind
	// tryGet is used by a requester to get slots/tokens for a piece of work
	// that has encountered no waiting/queued work. This is the fast path that
	// avoids queueing in the requester.
	//
	// REQUIRES: count > 0. count == 1 for slots.
	tryGet(count int64) (granted bool)
	// returnGrant is called for:
	// - returning slots after use.
	// - returning either slots or tokens when the grant raced with the work
	//   being canceled, and the grantee did not end up doing any work.
	//
	// The last case occurs despite the return value on the requester.granted
	// method -- it is possible that the work was not canceled at the time when
	// requester.granted was called, and hence returned a count > 0, but that
	// it later raced with cancellation by the time the goroutine doing the
	// work noticed it had been granted.
	//
	// Do not use this for doing store IO-related token adjustments when work
	// is done -- that should be done via
	// granterWithStoreReplicatedWorkAdmitted.storeWriteDone.
	//
	// REQUIRES: count > 0. count == 1 for slots.
	returnGrant(count int64)
	// tookWithoutPermission informs the granter that a slot or tokens were
	// taken unilaterally, without permission. This is useful:
	// - Slots: this is useful since KVWork is allowed to bypass admission
	//   control for high priority internal activities (e.g. node liveness) and
	//   for KVWork that generates other KVWork (like intent resolution of
	//   discovered intents). Not bypassing for the latter could result in
	//   single node or distributed deadlock, and since such work is typically
	//   not a major (on average) consumer of resources, we consider bypassing
	//   to be acceptable.
	// - Tokens: this is useful when the initial estimated tokens for a unit of
	//   work turned out to be an underestimate.
	//
	// Do not use this for doing store IO-related token adjustments when work
	// is done -- that should be done via
	// granterWithStoreReplicatedWorkAdmitted.storeWriteDone.
	//
	// REQUIRES: count > 0. count == 1 for slots.
	tookWithoutPermission(count int64)
	// continueGrantChain is called by the requester at some point after grant
	// was called on the requester. The expectation is that this is called by
	// the grantee after its goroutine runs and notices that it has been
	// granted a slot/tokens. This provides a natural throttling that reduces
	// grant bursts by taking into immediate account the capability of the
	// goroutine scheduler to schedule such work.
	//
	// In an experiment, using such grant chains reduced burstiness of grants
	// by 5x and shifted ~2s of latency (at p99) from the scheduler into
	// admission control (which is desirable since the latter is where we can
	// differentiate between work).
	//
	// TODO(sumeer): the "grant chain" concept is subtle and under-documented.
	// It's easy to go through most of this package thinking it has something
	// to do with dependent requests (e.g. intent resolution chains on an end
	// txn). It would help for a top-level comment on grantChainID or
	// continueGrantChain to spell out what grant chains are, their purpose,
	// and how they work with an example.
	continueGrantChain(grantChainID grantChainID)
}
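
// The following is an illustrative sketch (hypothetical code, not part of
// this package's API) of how a requester-granter pair cooperates, based on
// the contracts documented above:
//
//	// Fast path: no queued work for this WorkKind, so try to acquire a
//	// slot/token directly from the granter g.
//	if g.tryGet(1 /* count */) {
//		doWork()
//		if g.grantKind() == slot {
//			g.returnGrant(1) // slots are returned after use; tokens are not
//		}
//		return
//	}
//	// Slow path: queue in the requester and wait. When capacity frees up,
//	// the granter calls requester.granted(chainID). After the grantee's
//	// goroutine actually runs, it calls g.continueGrantChain(chainID),
//	// pacing further grants by the scheduler's ability to run granted work.
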
// granterWithLockedCalls is an encapsulation of typically one
// granter-requester pair, and for kvStoreTokenGranter of two
// granter-requester pairs (one for each workClass). It is used as an internal
// implementation detail of the GrantCoordinator. An implementer of
// granterWithLockedCalls responds to calls from its granter(s) by calling
// into the GrantCoordinator, which then calls the various *Locked() methods.
// The demuxHandle is meant to be opaque to the GrantCoordinator, and is used
// when this interface encapsulates multiple granter-requester pairs -- it is
// currently used only by kvStoreTokenGranter, where it is a workClass. The
// *Locked() methods are where the differences in slots and various kinds of
// tokens are handled.
type granterWithLockedCalls interface {
	// tryGetLocked is the real implementation of tryGet from the granter
	// interface. demuxHandle is an opaque handle that was passed into the
	// GrantCoordinator.
	tryGetLocked(count int64, demuxHandle int8) grantResult
	// returnGrantLocked is the real implementation of returnGrant from the
	// granter interface. demuxHandle is an opaque handle that was passed into
	// the GrantCoordinator.
	returnGrantLocked(count int64, demuxHandle int8)
	// tookWithoutPermissionLocked is the real implementation of
	// tookWithoutPermission from the granter interface. demuxHandle is an
	// opaque handle that was passed into the GrantCoordinator.
	tookWithoutPermissionLocked(count int64, demuxHandle int8)

	// The following methods are for direct use by GrantCoordinator.

	// requesterHasWaitingRequests returns whether some requester associated
	// with the granter has waiting requests.
	requesterHasWaitingRequests() bool
	// tryGrantLocked is used to attempt to grant to waiting requests.
	tryGrantLocked(grantChainID grantChainID) grantResult
}
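
// For example (a hedged sketch, not the exact package code), a granter
// implementation does no locking itself and instead routes through the
// GrantCoordinator, which acquires its mutex and then invokes the
// corresponding *Locked() method:
//
//	func (sg *slotGranter) tryGet(count int64) bool {
//		// coord.tryGet acquires coord.mu and then calls
//		// granterWithLockedCalls.tryGetLocked(count, demuxHandle).
//		return sg.coord.tryGet(sg.workKind, count, 0 /* demuxHandle */)
//	}
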
// granterWithIOTokens is used to abstract kvStoreTokenGranter for testing.
// The interface is used by the entity that periodically looks at load and
// computes the tokens to grant (ioLoadListener).
type granterWithIOTokens interface {
	// setAvailableTokens bounds the available {io,elastic disk bandwidth}
	// tokens that can be granted to the value provided in the
	// {io,elasticDiskBandwidth}Tokens parameter. elasticDiskBandwidthTokens
	// bounds what can be granted to elastic work, and is based on disk
	// bandwidth being a bottleneck resource. These are not tight bounds when
	// the callee has negative available tokens, due to the use of
	// granter.tookWithoutPermission, since in that case the callee increments
	// that negative value with the value provided by tokens. This method needs
	// to be called periodically. The return value is the number of used tokens
	// in the interval since the prior call to this method. Note that
	// tokensUsed can be negative, though that will be rare, since it is
	// possible for tokens to be returned.
	setAvailableTokens(ioTokens int64, elasticDiskBandwidthTokens int64) (tokensUsed int64)
	// getDiskTokensUsedAndReset returns the disk bandwidth tokens used
	// since the last such call.
	getDiskTokensUsedAndReset() [admissionpb.NumWorkClasses]int64
	// setLinearModels supplies the models to use when storeWriteDone or
	// storeReplicatedWorkAdmittedLocked is called, to adjust token consumption.
	// Note that these models are not used for token adjustment at admission
	// time -- that is handled by StoreWorkQueue and is not in scope of this
	// granter. This asymmetry is due to the need to use all the functionality
	// of WorkQueue at admission time. See the long explanatory comment at the
	// beginning of store_token_estimation.go, regarding token estimation.
	setLinearModels(l0WriteLM, l0IngestLM, ingestLM tokensLinearModel)
}
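
// A minimal sketch of the periodic caller (the role played by
// ioLoadListener); the one-second tick, the token values, and the
// computeTokensFromLoad helper are illustrative assumptions, not the actual
// tuning:
//
//	for range time.Tick(time.Second) {
//		ioTokens, elasticDiskBWTokens := computeTokensFromLoad() // hypothetical
//		tokensUsed := g.setAvailableTokens(ioTokens, elasticDiskBWTokens)
//		// tokensUsed feeds back into the next interval's token computation.
//		_ = tokensUsed
//	}
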
// granterWithStoreReplicatedWorkAdmitted is used to abstract
// kvStoreTokenGranter for testing. The interface is used by StoreWorkQueue to
// pass on sizing information provided when the work is either done (for
// legacy, above-raft IO admission) or admitted (for below-raft, asynchronous
// admission control).
type granterWithStoreReplicatedWorkAdmitted interface {
	granter
	// storeWriteDone is used by legacy, above-raft IO admission control to
	// inform granters of when the write was actually done, post-admission. At
	// admit-time we did not have sizing info for these writes, so by
	// intercepting these writes at completion time we're able to make any
	// outstanding token adjustments in the granter.
	storeWriteDone(originalTokens int64, doneInfo StoreWorkDoneInfo) (additionalTokens int64)
	// storeReplicatedWorkAdmittedLocked is used by below-raft admission control
	// to inform granters of work being admitted, in order for them to make any
	// outstanding token adjustments. It's invoked with coord.mu held.
	storeReplicatedWorkAdmittedLocked(originalTokens int64, admittedInfo storeReplicatedWorkAdmittedInfo) (additionalTokens int64)
}
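
// Hedged sketch of the two call paths described above (the variable names
// originalTokens, doneInfo, and admittedInfo are placeholders):
//
//	// Legacy above-raft path: sizing is known only once the write is done,
//	// so tokens are reconciled post-completion.
//	additionalTokens := g.storeWriteDone(originalTokens, doneInfo)
//	// Below-raft path: sizing is known at admission time; the caller
//	// already holds coord.mu.
//	additionalTokens = g.storeReplicatedWorkAdmittedLocked(originalTokens, admittedInfo)
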
// cpuOverloadIndicator is meant to be an instantaneous indicator of cpu
// availability. Since actual scheduler stats are periodic, we prefer to use
// the KV slot availability, since it is instantaneous. The
// cpuOverloadIndicator is used to gate admission of work other than KVWork
// (KVWork only looks at slot availability). An instantaneous indicator limits
// over-admission and queueing in the scheduler, and thereby provides better
// isolation, especially in multi-tenant environments where tenants not
// responsible for a load spike expect to suffer no increase in latency.
//
// In multi-tenant settings, for single-tenant SQL nodes, which do not do KV
// work, we do not have an instantaneous indicator and instead use
// sqlNodeCPUOverloadIndicator.
type cpuOverloadIndicator interface {
	isOverloaded() bool
}

// CPULoadListener listens to the latest CPU load information. Currently we
// expect this to be called every 1ms, unless the cpu is extremely
// underloaded. If the samplePeriod is > 1ms, admission control enforcement
// for CPU is disabled.
type CPULoadListener interface {
	CPULoad(runnable int, procs int, samplePeriod time.Duration)
}
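
// For instance, a listener might react as follows (a hedged sketch;
// kvSlotAdjuster's actual logic differs, and loadListener and
// overloadThreshold are hypothetical names):
//
//	func (l *loadListener) CPULoad(runnable int, procs int, samplePeriod time.Duration) {
//		if samplePeriod > time.Millisecond {
//			// Samples are too coarse for a reliable instantaneous signal,
//			// so CPU admission control enforcement is disabled.
//			return
//		}
//		l.overloaded = runnable > l.overloadThreshold*procs
//	}
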
// storeRequester is used to abstract *StoreWorkQueue for testing.
type storeRequester interface {
	requesterClose
	getRequesters() [admissionpb.NumWorkClasses]requester
	getStoreAdmissionStats() storeAdmissionStats
	setStoreRequestEstimates(estimates storeRequestEstimates)
}

// elasticCPULimiter is used to set the CPU utilization limit for elastic work
// (defined as a % of available system CPU).
type elasticCPULimiter interface {
	getUtilizationLimit() float64
	setUtilizationLimit(limit float64)
	hasWaitingRequests() bool
	computeUtilizationMetric()
}

// SchedulerLatencyListener listens to the latest scheduler latency data. We
// expect this to be called every scheduler_latency.sample_period.
type SchedulerLatencyListener interface {
	SchedulerLatency(p99, period time.Duration)
}

// grantKind represents the two kinds of ways we grant admission: using a slot
// or a token. The slot terminology is akin to a scheduler, where a scheduling
// slot must be free for a thread to run. But unlike a scheduler, we don't
// have visibility into the fact that work execution may be blocked on IO. So
// a slot can also be viewed as a limit on concurrency of ongoing work. The
// token terminology is inspired by token buckets. In this case the token is
// handed out for admission but it is not returned (unlike a slot). Unlike a
// token bucket, which shapes the rate, the current implementation (see
// tokenGranter) limits burstiness and does not do rate shaping -- this is
// because it is hard to predict what rate is appropriate given the difference
// in sizes of the work. This lack of rate shaping may change in the future
// and is not a limitation of the interfaces. Similarly, there is no rate
// shaping applied when granting slots and that may also change in the future.
// The main difference between a slot and a token is that a slot is used when
// we can know when the work is complete. Having this extra completion
// information can be advantageous in admission control decisions, so
// WorkKinds where this information is easily available use slots.
//
// StoreGrantCoordinators and its corresponding StoreWorkQueues are a hybrid
// -- they use tokens (as explained later). However, there is useful
// completion information such as how many tokens were actually used, which
// can differ from the up front information, and is utilized to adjust the
// available tokens.
type grantKind int8

const (
	slot grantKind = iota
	token
)

type grantResult int8

const (
	grantSuccess grantResult = iota
	// grantFailDueToSharedResource is returned when the granter is unable to
	// grant because a shared resource (CPU or memory) is overloaded. For grant
	// chains, this is a signal to terminate.
	grantFailDueToSharedResource
	// grantFailLocal is returned when the granter is unable to grant due to (a)
	// a local constraint -- insufficient tokens or slots, or (b) no work is
	// waiting.
	grantFailLocal
)

// grantChainID is the ID for a grant chain. See continueGrantChain for
// details.
type grantChainID uint64

// WorkKind represents various types of work that are subject to admission
// control.
type WorkKind int8

// The list of WorkKinds are ordered from lower level to higher level, and
// also serves as a hard-wired ordering from most important to least important
// (for details on how this ordering is enacted, see the GrantCoordinator
// code).
//
// KVWork, SQLKVResponseWork, SQLSQLResponseWork are the lower-level work
// units that are expected to be primarily CPU bound (with disk IO for KVWork,
// but cache hit rates are typically high), and expected to be where most of
// the CPU consumption happens. These are prioritized in the order
//
// KVWork > SQLKVResponseWork > SQLSQLResponseWork
//
// The high prioritization of KVWork reduces the likelihood that non-SQL KV
// work will be starved. SQLKVResponseWork is prioritized over
// SQLSQLResponseWork since the former includes leaf DistSQL processing and we
// would like to release memory used up in RPC responses at lower layers of
// RPC tree. We expect that if SQLSQLResponseWork is delayed, it will
// eventually reduce new work being issued, which is a desirable form of
// natural backpressure.
//
// Furthermore, SQLStatementLeafStartWork and SQLStatementRootStartWork are
// prioritized lowest with
//
// SQLStatementLeafStartWork > SQLStatementRootStartWork
//
// This follows the same idea of prioritizing lower layers above higher layers
// since it releases memory caught up in lower layers, and exerts natural
// backpressure on the higher layer.
//
// Consider the example of a less important long-running single statement OLAP
// query competing with more important small OLTP queries in a single node
// setting. Say the OLAP query starts first and uses up all the KVWork slots,
// and the OLTP queries queue up for the KVWork slots. As the OLAP query
// KVWork completes, it will queue up for SQLKVResponseWork, which will not
// start because the OLTP queries are using up all available KVWork slots. As
// the OLTP KVWork completes, the corresponding SQLKVResponseWork will queue up. The
// WorkQueue for SQLKVResponseWork, when granting tokens, will first admit
// those for the more important OLTP queries. This will prevent or slow down
// admission of further work by the OLAP query.
//
// In an ideal world with the only shared resource (across WorkKinds) being
// CPU, and control over the CPU scheduler, we could pool all work, regardless
// of WorkKind into a single queue, and would not need to rely on this
// indirect backpressure and hard-wired ordering. However, we do not have
// control over the CPU scheduler, so we cannot preempt work with widely
// different cpu consumption. Additionally, (non-preemptible) memory is also a
// shared resource, and we wouldn't want to have partially done KVWork not
// finish, due to preemption in the CPU scheduler, since it can be holding
// significant amounts of memory (e.g. in scans).
//
// The aforementioned prioritization also enables us to get instantaneous
// feedback on CPU resource overload. This instantaneous feedback for a grant
// chain (mentioned earlier) happens in two ways:
// - the chain requires the grantee's goroutine to run.
// - the cpuOverloadIndicator (see later), specifically the implementation
//   provided by kvSlotAdjuster, provides instantaneous feedback (which is
//   viable only because KVWork is the highest priority).
//
// Weaknesses of this strict prioritization across WorkKinds:
// - Priority inversion: Lower importance KVWork, not derived from SQL, like
//   GC of MVCC versions, will happen before user-facing SQLKVResponseWork.
//   This is because the backpressure, described in the example above, does
//   not apply to work generated from within the KV layer.
//   TODO(sumeer): introduce a KVLowPriWork and put it last in this ordering,
//   to get over this limitation.
// - Insufficient competition leading to poor isolation: Putting
//   SQLStatementLeafStartWork, SQLStatementRootStartWork in this list, within
//   the same GrantCoordinator, does provide node overload protection, but not
//   necessarily performance isolation when we have WorkKinds of different
//   importance. Consider the same OLAP example above: if the KVWork slots
//   being full due to the OLAP query prevents SQLStatementRootStartWork for
//   the OLTP queries, the competition is starved out before it has an
//   opportunity to submit any KVWork. Given that control over admitting
//   SQLStatement{Leaf,Root}StartWork is not primarily about CPU control (the
//   lower-level work items are where cpu is consumed), we could decouple
//   these two into a separate GrantCoordinator and only gate them with (high)
//   fixed slot counts that allow for enough competition, plus a memory
//   overload indicator.
//   TODO(sumeer): experiment with this approach.
// - Continuing the previous bullet, low priority long-lived
//   {SQLStatementLeafStartWork, SQLStatementRootStartWork} could use up all
//   the slots, if there was no high priority work for some period of time,
//   and therefore starve admission of the high priority work when it does
//   appear. The typical solution to this is to put a max on the number of
//   slots low priority can use. This would be viable if we did not allow
//   arbitrary int8 values to be set for Priority.
const (
	// KVWork represents requests submitted to the KV layer, from the same node
	// or a different node. They may originate from the SQL layer or the KV
	// layer.
	KVWork WorkKind = iota
	// SQLKVResponseWork is response processing in SQL for a KV response from a
	// local or remote node. This can be either leaf or root DistSQL work, i.e.,
	// this is inter-layer and not necessarily inter-node.
	SQLKVResponseWork
	// SQLSQLResponseWork is response processing in SQL, for DistSQL RPC
	// responses. This is root work happening in response to leaf SQL work,
	// i.e., it is inter-node.
	SQLSQLResponseWork
	// SQLStatementLeafStartWork represents the start of leaf-level processing
	// for a SQL statement.
	SQLStatementLeafStartWork
	// SQLStatementRootStartWork represents the start of root-level processing
	// for a SQL statement.
	SQLStatementRootStartWork
	numWorkKinds
)

func workKindString(workKind WorkKind) string {
	switch workKind {
	case KVWork:
		return "kv"
	case SQLKVResponseWork:
		return "sql-kv-response"
	case SQLSQLResponseWork:
		return "sql-sql-response"
	case SQLStatementLeafStartWork:
		return "sql-leaf-start"
	case SQLStatementRootStartWork:
		return "sql-root-start"
	default:
		panic(errors.AssertionFailedf("unknown WorkKind"))
	}
}

// storeAdmissionStats are stats maintained by a storeRequester. The non-test
// implementation of storeRequester is StoreWorkQueue. StoreWorkQueue updates
// all of these when StoreWorkQueue.AdmittedWorkDone is called, so that these
// cumulative values are mutually consistent.
type storeAdmissionStats struct {
	// Total requests that called {Admitted,Bypassed}WorkDone, or in the case
	// of replicated writes, the requests that called Admit.
	workCount uint64
	// Sum of StoreWorkDoneInfo.WriteBytes.
	//
	// TODO(sumeer): writeAccountedBytes and ingestedAccountedBytes are not
	// actually comparable, since the former is uncompressed. We may need to
	// fix this inaccuracy if it turns out to be an issue.
	writeAccountedBytes uint64
	// Sum of StoreWorkDoneInfo.IngestedBytes.
	ingestedAccountedBytes uint64
	// statsToIgnore represents stats that we should exclude from token
	// consumption, and estimation of per-work-tokens. Currently, this is
	// limited to range snapshot ingestion. These are likely to usually land
	// in levels lower than L0, so may not fit the existing per-work-tokens
	// model well. Additionally, we do not want large range snapshots to
	// consume a huge number of tokens (see
	// https://github.com/cockroachdb/cockroach/pull/80914 for justification --
	// that PR is closer to the final solution, and this is a step in that
	// direction).
	statsToIgnore struct {
		pebble.IngestOperationStats
	}
	// aux represents additional information carried for informational
	// purposes (e.g. for logging).
	aux struct {
		// These bypassed numbers are already included in the corresponding
		// {workCount, writeAccountedBytes, ingestedAccountedBytes}.
		bypassedCount                  uint64
		writeBypassedAccountedBytes    uint64
		ingestedBypassedAccountedBytes uint64
	}
}

// storeRequestEstimates are estimates that the storeRequester should use for
// its future requests.
type storeRequestEstimates struct {
	// writeTokens is the tokens to request at admission time. Must be > 0.
	writeTokens int64
}

// PacerFactory is used to construct a new admission.Pacer.
type PacerFactory interface {
	NewPacer(unit time.Duration, wi WorkInfo) *Pacer
}

var _ PacerFactory = &ElasticCPUGrantCoordinator{}
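
// Hedged usage sketch for a Pacer obtained from a PacerFactory, assuming the
// Pacer's exported Pace and Close methods; the factory variable, the 100ms
// unit, and workRemains are illustrative assumptions:
//
//	pacer := factory.NewPacer(100*time.Millisecond, WorkInfo{TenantID: tid})
//	defer pacer.Close()
//	for workRemains() { // hypothetical loop condition
//		if err := pacer.Pace(ctx); err != nil {
//			return err
//		}
//		doSomeWork()
//	}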