
admission,kvadmission: use tenant cpu consumption for inter-tenant fairness #108364

Merged

merged 1 commit into cockroachdb:master on Aug 25, 2023

Conversation

sumeerbhola
Collaborator

admission,kvadmission: use tenant cpu consumption for inter-tenant fairness

Previously, we were using the instantaneous slots consumed, since that code predated the grunning instrumentation for cpu consumption. The reset logic for tenantInfo.used is now the same for WorkQueues that use slots and tokens. Additionally, there was a bug in WorkQueue.adjustTenantTokens in that it forgot to fix the heap -- this is fixed and tested.

Fixes #91533

Epic: none

Release note: None

@sumeerbhola sumeerbhola requested a review from irfansharif August 8, 2023 16:07
@sumeerbhola sumeerbhola requested a review from a team as a code owner August 8, 2023 16:07
@cockroach-teamcity
Member

This change is Reviewable


@irfansharif irfansharif left a comment


LGTM. Did you happen to run the admission_control_multitenant_fairness tests with these changes? Would be interesting to see this CPU-time-based fair sharing in play. Would be more interesting with mixed workloads (kv + tpcc).

@@ -383,7 +390,8 @@ func (n *controllerImpl) AdmitKVWork(
func (n *controllerImpl) AdmittedKVWorkDone(ah Handle, writeBytes *StoreWriteBytes) {
n.elasticCPUGrantCoordinator.ElasticCPUWorkQueue.AdmittedWorkDone(ah.elasticCPUWorkHandle)
if ah.callAdmittedWorkDoneOnKVAdmissionQ {
n.kvAdmissionQ.AdmittedWorkDone(ah.tenantID)
cpuTime := grunning.Time() - ah.cpuStart
irfansharif (Contributor) commented:

Use grunning.Difference() instead to paper over #95529.
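For context, the concern is that the grunning clock can occasionally appear to run backwards (the Go-runtime issue tracked as #95529), so a naive `end - start` subtraction can produce a negative duration. As a hedged illustration of the idea only, not grunning's actual implementation, a guard along these lines avoids the negative result:

```go
package main

import "fmt"

// difference sketches why a plain subtraction of two grunning readings is
// unsafe: if the clock regresses, end-start goes negative, so clamp at zero.
// This is an illustrative helper, not the grunning package's API.
func difference(startNanos, endNanos int64) int64 {
	d := endNanos - startNanos
	if d < 0 {
		return 0
	}
	return d
}

func main() {
	fmt.Println(difference(100, 150)) // normal case: 50
	fmt.Println(difference(150, 100)) // apparent clock regression: 0
}
```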

@@ -105,25 +105,103 @@ closed epoch: 0 tenantHeap len: 2 top tenant: 53
tenant-id: 53 used: 0, w: 1, fifo: -128 waiting work heap: [0: pri: normal-pri, ct: 3, epoch: 0, qt: 100]
tenant-id: 71 used: 1, w: 1, fifo: -128 waiting work heap: [0: pri: low-pri, ct: 4, epoch: 0, qt: 100]

# The system tenant work is done and consumed 10 cpu.
irfansharif (Contributor) commented:

"10 cpu" reads oddly, perhaps "10 cpu nanos" instead? Ditto further below.

@@ -63,7 +63,7 @@ handle: 50ms
admitted-work-done running=10ms allotted=50ms
----
granter: return-grant=40ms
work-queue:
work-queue: adjustTenantUsed: tenantID=system additionalUsed=-40000000
irfansharif (Contributor) commented:

[minor nit] Could we format this in the time.Duration units like we do for the other parameters? (running=, allotted=, return-grant=). Perhaps avoid the snake casing too, formatting using something like:

work-queue: adjust-used: tenant=system [-/+]40ms

@sumeerbhola sumeerbhola requested a review from a team as a code owner August 24, 2023 22:17
@sumeerbhola sumeerbhola requested review from smg260 and renatolabs and removed request for a team August 24, 2023 22:17
Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTR!

Did you happen to run the admission_control_multitenant_fairness tests with these changes? Would be interesting to see this CPU-time-based fair sharing in play. Would be more interesting with mixed workloads (kv + tpcc).

I didn't. I've added a TODO to alter that test to have different sized work in the two workloads.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @renatolabs, and @smg260)


pkg/kv/kvserver/kvadmission/kvadmission.go line 393 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Use grunning.Difference() instead to paper over #95529.

I forgot about that -- thanks for the reminder. I've changed this to 1 nano instead.


pkg/util/admission/testdata/elastic_cpu_work_queue line 66 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[minor nit] Could we format this in the time.Duration units like we do for the other parameters? (running=, allotted=, return-grant=). Perhaps avoid the snake casing too, formatting using something like:

work-queue: adjust-used: tenant=system [-/+]40ms

Done


pkg/util/admission/testdata/work_queue line 108 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

"10 cpu" reads oddly, perhaps "10 cpu nanos" instead? Ditto further below.

Done

@sumeerbhola
Collaborator Author

bors r=irfansharif

@craig
Contributor

craig bot commented Aug 25, 2023

Build succeeded.

@craig craig bot merged commit 1b8b7c8 into cockroachdb:master Aug 25, 2023
Successfully merging this pull request may close these issues.

admission: improve inter-tenant fair sharing of CPU
3 participants