
admission,kvadmission: use tenant cpu consumption for inter-tenant fairness #108364

Merged

merged 1 commit into cockroachdb:master on Aug 25, 2023

Conversation

sumeerbhola
Collaborator

admission,kvadmission: use tenant cpu consumption for inter-tenant fairness

Previously, we were using the instantaneous slots consumed, since that code predated the grunning instrumentation for cpu consumption. The reset logic for tenantInfo.used is now the same for WorkQueues that use slots and tokens. Additionally, there was a bug in WorkQueue.adjustTenantTokens in that it forgot to fix the heap -- this is fixed and tested.

Fixes #91533

Epic: none

Release note: None

@sumeerbhola sumeerbhola requested a review from irfansharif August 8, 2023 16:07
@sumeerbhola sumeerbhola requested a review from a team as a code owner August 8, 2023 16:07
@cockroach-teamcity
Member

This change is Reviewable


@irfansharif irfansharif left a comment


LGTM. Did you happen to run the admission_control_multitenant_fairness tests with these changes? Would be interesting to see this CPU-time-based fair sharing in play. Would be more interesting with mixed workloads (kv + tpcc).

@@ -383,7 +390,8 @@ func (n *controllerImpl) AdmitKVWork(
func (n *controllerImpl) AdmittedKVWorkDone(ah Handle, writeBytes *StoreWriteBytes) {
n.elasticCPUGrantCoordinator.ElasticCPUWorkQueue.AdmittedWorkDone(ah.elasticCPUWorkHandle)
if ah.callAdmittedWorkDoneOnKVAdmissionQ {
n.kvAdmissionQ.AdmittedWorkDone(ah.tenantID)
cpuTime := grunning.Time() - ah.cpuStart
irfansharif (Contributor) commented:

Use grunning.Difference() instead to paper over #95529.
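For context, the concern is that the grunning clock can occasionally appear to run backwards (the Go-runtime issue tracked as #95529), so a naive `end - start` subtraction can produce a negative duration. As a hedged illustration of the idea only, not grunning's actual implementation, a guard along these lines avoids the negative result:

```go
package main

import "fmt"

// difference sketches why a plain subtraction of two grunning readings is
// unsafe: if the clock regresses, end-start goes negative, so clamp at zero.
// This is an illustrative helper, not the grunning package's API.
func difference(startNanos, endNanos int64) int64 {
	d := endNanos - startNanos
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	fmt.Println(difference(100, 150)) // normal case: 50
	fmt.Println(difference(150, 100)) // apparent clock regression: 0
}
```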

@@ -105,25 +105,103 @@ closed epoch: 0 tenantHeap len: 2 top tenant: 53
tenant-id: 53 used: 0, w: 1, fifo: -128 waiting work heap: [0: pri: normal-pri, ct: 3, epoch: 0, qt: 100]
tenant-id: 71 used: 1, w: 1, fifo: -128 waiting work heap: [0: pri: low-pri, ct: 4, epoch: 0, qt: 100]

# The system tenant work is done and consumed 10 cpu.
irfansharif (Contributor) commented:

"10 cpu" reads oddly, perhaps "10 cpu nanos" instead? Ditto further below.

@@ -63,7 +63,7 @@ handle: 50ms
admitted-work-done running=10ms allotted=50ms
----
granter: return-grant=40ms
work-queue:
work-queue: adjustTenantUsed: tenantID=system additionalUsed=-40000000
irfansharif (Contributor) commented:

[minor nit] Could we format this in the time.Duration units like we do for the other parameters? (running=, allotted=, return-grant=). Perhaps avoid the snake casing too, formatting using something like:

work-queue: adjust-used: tenant=system [-/+]40ms

@sumeerbhola sumeerbhola requested a review from a team as a code owner August 24, 2023 22:17
@sumeerbhola sumeerbhola requested review from smg260 and renatolabs and removed request for a team August 24, 2023 22:17
Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTR!

Did you happen to run the admission_control_multitenant_fairness tests with these changes? Would be interesting to see this CPU-time-based fair sharing in play. Would be more interesting with mixed workloads (kv + tpcc).

I didn't. I've added a TODO to alter that test to have different sized work in the two workloads.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @renatolabs, and @smg260)


pkg/kv/kvserver/kvadmission/kvadmission.go line 393 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Use grunning.Difference() instead to paper over #95529.

I forgot about that -- thanks for the reminder. I've changed this to 1 nano instead.


pkg/util/admission/testdata/elastic_cpu_work_queue line 66 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[minor nit] Could we format this in the time.Duration units like we do for the other parameters? (running=, allotted=, return-grant=). Perhaps avoid the snake casing too, formatting using something like:

work-queue: adjust-used: tenant=system [-/+]40ms

Done


pkg/util/admission/testdata/work_queue line 108 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

"10 cpu" reads oddly, perhaps "10 cpu nanos" instead? Ditto further below.

Done

@sumeerbhola
Collaborator Author

bors r=irfansharif

@craig
Contributor

craig bot commented Aug 25, 2023

Build succeeded.

@craig craig bot merged commit 1b8b7c8 into cockroachdb:master Aug 25, 2023
Successfully merging this pull request may close these issues.

admission: improve inter-tenant fair sharing of CPU
3 participants