
kvserver: add average cpu nanos per replica #92858

Merged: 1 commit merged from 221201.cputime-acct on Dec 14, 2022

Conversation

@kvoli kvoli (Collaborator) commented Dec 1, 2022

This patch adds partial accounting for the time spent processing requests on a per-replica basis. The time spent is recorded in `ReplicaStats`, allowing the aggregate CPU time spent on replicas to be observed via the exported metric `rebalancing.cpunanospersecond`.

Release note: None

@kvoli kvoli self-assigned this Dec 1, 2022
@cockroach-teamcity (Member) commented:

This change is Reviewable

@kvoli kvoli marked this pull request as ready for review December 2, 2022 09:26
@kvoli kvoli requested a review from a team December 2, 2022 09:26
@kvoli kvoli requested a review from a team as a code owner December 2, 2022 09:26
@kvoli kvoli (Collaborator, Author) commented Dec 2, 2022

Will link the issue. The extended CI failure appears to be a flake.

@andrewbaptist andrewbaptist (Collaborator) left a comment

Is there a reason you didn't add timing to store_raft.processRequestQueue? It seems like some work happens there as well.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @kvoli, and @nvanbenschoten)


pkg/kv/kvserver/metrics.go line 357 at r1 (raw file):

		Unit:        metric.Unit_BYTES,
	}
	metaAverageNanosPerSecond = metric.Metadata{

Nit: I don't really like this variable name (nanos/second is always 1 billion). Maybe something like metaNanosProcessingPerSecond or metaNanosRunningPerSecond or metaCPUNanosPerSecond.


pkg/kv/kvserver/replica_load.go line 27 at r1 (raw file):

	writeBytes    *replicastats.ReplicaStats
	readBytes     *replicastats.ReplicaStats
	nanos         *replicastats.ReplicaStats

Nit: Don't necessarily change in this PR, but the fact that the same operations need to be repeated in 4 different places (construction, split counts, merge counts, and reset) for 6 different stats feels a little like a code smell. Maybe structure these into an array so they can just be looped over once?
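The refactor suggested above can be sketched as follows, with hypothetical names standing in for the real `replicastats.ReplicaStats` fields: grouping the stats into a slice lets construction, split, and reset each become a single loop instead of being repeated once per field.

```go
package main

import "fmt"

// stat is a hypothetical, minimal stand-in for *replicastats.ReplicaStats.
type stat struct{ count float64 }

// splitInto halves this stat's count, giving the other half to other,
// mimicking what happens to load stats on a range split.
func (s *stat) splitInto(other *stat) {
	half := s.count / 2
	s.count, other.count = half, half
}

func (s *stat) reset() { s.count = 0 }

// loadStats groups the per-replica stats into a slice so that the
// repeated per-field operations collapse into loops.
type loadStats struct {
	stats []*stat
}

func newLoadStats(n int) *loadStats {
	ls := &loadStats{stats: make([]*stat, n)}
	for i := range ls.stats {
		ls.stats[i] = &stat{}
	}
	return ls
}

func (ls *loadStats) splitInto(other *loadStats) {
	for i := range ls.stats {
		ls.stats[i].splitInto(other.stats[i])
	}
}

func (ls *loadStats) reset() {
	for _, s := range ls.stats {
		s.reset()
	}
}

func main() {
	a, b := newLoadStats(6), newLoadStats(6)
	a.stats[0].count = 100
	a.splitInto(b)
	fmt.Println(a.stats[0].count, b.stats[0].count) // 50 50
}
```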


pkg/kv/kvserver/replica_metrics.go line 339 at r1 (raw file):

func (r *Replica) RequestLocality() *replicastats.RatedSummary {
	return r.loadStats.batchRequests.SnapshotRatedSummary()
}

What is this method for - it doesn't seem to be called or used anywhere. Should this be in a different PR?


pkg/kv/kvserver/store.go line 3215 at r1 (raw file):

		averageReadBytesPerSecond     float64
		averageWriteBytesPerSecond    float64
		averageNanosPerSecond         float64

Nit: Similar naming issue as above.

@kvoli kvoli force-pushed the 221201.cputime-acct branch from 7588441 to 2dca625 Compare December 5, 2022 08:40
@kvoli kvoli (Collaborator, Author) left a comment

TYFTR (thank you for the review) - replied to comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/metrics.go line 357 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: I don't really like this variable name (nanos/second is always 1 billion). Maybe something like metaNanosProcessingPerSecond or metaNanosRunningPerSecond or metaCPUNanosPerSecond.

Updated to be consistently ...CPUNanosPerSecond across the patch.


pkg/kv/kvserver/replica_load.go line 27 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: Don't necessarily change in this PR, but the fact that the same operations need to be repeated in 4 different places (construction, split counts, merge counts, and reset) for 6 different stats feels a little like a code smell. Maybe structure these into an array so they can just be looped over once?

I agree that this structure isn't ideal. Overall this area needs a bit of a refactor. A similar vector structure to #91593 is the likely outcome eventually.

This is tracked in #87187.


pkg/kv/kvserver/replica_metrics.go line 339 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

What is this method for - it doesn't seem to be called or used anywhere. Should this be in a different PR?

Removed.


pkg/kv/kvserver/store.go line 3215 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: Similar naming issue as above.

Updated.

@kvoli kvoli requested a review from andrewbaptist December 5, 2022 08:41
@kvoli kvoli force-pushed the 221201.cputime-acct branch 2 times, most recently from abfdded to 2063297 Compare December 5, 2022 09:14
@kvoli kvoli (Collaborator, Author) left a comment

Previously, andrewbaptist (Andrew Baptist) wrote…

Is there a reason you didn't add timing to store_raft.processRequestQueue? It seems like some work happens there as well.

Initially it wasn't included simply because we didn't include it in the cputime prototype. I've added it, good suggestion.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)

@nvanbenschoten nvanbenschoten (Member) left a comment

It's cool to see this.

What is the plan for tracking and balancing on load in cases where grunning is disabled?

Also, do we intend to split the CPU attribution across work that a replica performs in the leaseholder role vs. work that it performs in the follower replica role? Now that we can attribute work with precision, it seems valuable to measure exactly the amount of work a lease transfer can shed, and then also exactly the amount of work a replica rebalance can shed.

Reviewed 4 of 10 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @kvoli)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

	// Record the CPU time processing the request for this replica. This is
	// recorded regardless of errors that are encountered.
	grunningStart := grunning.Time()

We have this same logic in a number of places. Consider pushing more of it into a library so that it's easier to use (harder to misuse) and less redundant.

For instance, we use the following pattern elsewhere. As long as it doesn't heap allocate (please check if you go with it), it feels like an improvement:

func (r *Replica) MeasureNanosRunning() func() {
	start := grunning.Time()
	return func() {
	end := grunning.Time()
		dur := grunning.Difference(start, end).Nanoseconds()
		r.loadStats.cpuNanos.RecordCount(float64(dur), 0 /* nodeID */)
	}
}

...

// to use
defer r.MeasureNanosRunning()()

pkg/kv/kvserver/store_raft.go line 643 at r2 (raw file):

	stats, err := r.handleRaftReady(ctx, noSnap)
	maybeFatalOnRaftReadyErr(ctx, err)
	grunningStart := grunning.Time()

This is starting the timer after the handleRaftReady call, so we're missing most of the CPU time inside of Raft. Fixing this should improve the accuracy of the CPU attribution.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

	startGrunning := grunning.Time()
	defer func() {
		repl, err := s.GetReplica(ba.RangeID)

Can we find a way to record this without passing back through the replicasByRangeID sync map? We have the Replica down below and it's probably safe to assume that all of the work was done by the last Replica touched in that loop.


pkg/ts/catalog/chart_catalog.go line 689 at r2 (raw file):

			},
			{

nit: stray line

@kvoli kvoli force-pushed the 221201.cputime-acct branch 2 times, most recently from 2e6a3dd to bbc219e Compare December 8, 2022 03:28
@kvoli kvoli (Collaborator, Author) left a comment

What is the plan for tracking and balancing on load in cases where grunning is disabled?

tracking: it will record 0.

balancing on load: call https://github.com/kvoli/cockroach/blob/bbc219e2b6863aa82b27751deba67b6aea0d27b8/pkg/util/grunning/grunning.go#L43-L43 and, if Supported returns false, fall back by setting the cluster setting to QPS rebalancing or "legacy".
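The fallback gate described above can be sketched as follows; the enum names and function shape here are hypothetical, standing in for the real cluster setting and the `grunning.Supported` check:

```go
package main

import "fmt"

// LoadDimension is a hypothetical enum for the rebalancing objective.
type LoadDimension int

const (
	QPS LoadDimension = iota
	CPU
)

// rebalanceObjective picks the load dimension to balance on: CPU when
// fine-grained running-time instrumentation is supported on this
// platform/build, otherwise the legacy QPS objective.
func rebalanceObjective(grunningSupported bool, setting LoadDimension) LoadDimension {
	if setting == CPU && !grunningSupported {
		return QPS
	}
	return setting
}

func main() {
	fmt.Println(rebalanceObjective(false, CPU) == QPS) // falls back to QPS
	fmt.Println(rebalanceObjective(true, CPU) == CPU)
}
```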

do we intend to split the CPU attribution across work that a replica performs in the leaseholder role vs. work that it performs in the follower replica role?

That is the intention. I was concerned about the memory usage of adding two different replica stats objects at once. They need to be refactored, but I intended to do that after merging #91593. I don't see why that should block this, so I've added both.

I have opted not to include two separate metrics yet. Keeping the same metric as before.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We have this same logic in a number of places. Consider pushing more of it into a library so that it's easier to use (harder to misuse) and less redundant.

For instance, we use the following pattern elsewhere. As long as it doesn't heap allocate (please check if you go with it), it feels like an improvement:

func (r *Replica) MeasureNanosRunning() func() {
	start := grunning.Time()
	return func() {
	end := grunning.Time()
		dur := grunning.Difference(start, end).Nanoseconds()
		r.loadStats.cpuNanos.RecordCount(float64(dur), 0 /* nodeID */)
	}
}

...

// to use
defer r.MeasureNanosRunning()()

Adopted this method instead.


pkg/kv/kvserver/store_raft.go line 643 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is starting the timer after the handleRaftReady call, so we're missing most of the CPU time inside of Raft. Fixing this should improve the accuracy of the CPU attribution.

That was an error. Updated to start the timer above the handleRaftReady call.


pkg/ts/catalog/chart_catalog.go line 689 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: stray line

Updated.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we find a way to record this without passing back through the replicasByRangeID sync map? We have the Replica down below and it's probably safe to assume that all of the work was done by the last Replica touched in that loop.

Moved to repl.SendWithWriteBytes() for now.

@kvoli kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Adopted this method instead.

It looks like it does heap allocate with that approach.

[screenshot: heap allocation profile]

This shows recent allocations running kv with max-rate=10k.

@kvoli kvoli requested a review from nvanbenschoten December 8, 2022 08:50
@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 7 of 8 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @kvoli)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It looks like it does heap alloc w/ that approach.

[screenshot: heap allocation profile]

Albeit a relatively tiny amount (kv max-rate=1k running).

I took a look at this and was also unable to find a way to avoid the heap allocation. It's unfortunate, but the heap allocation is probably not worth paying for. A slightly worse version ergonomically that avoids the closure allocation is:

// MeasureReqCPUNanos measures the cpu time spent on this replica processing
// requests.
func (r *Replica) MeasureReqCPUNanos(start time.Duration) {
	r.measureNanosRunning(start, func(dur float64) {
		r.loadStats.reqCPUNanos.RecordCount(dur, 0 /* nodeID */)
	})
}

// measureNanosRunning measures the difference in cpu time from when this
// method is called, to when the returned function is called. This difference
// is recorded against the replica's cpu time attribution.
func (r *Replica) measureNanosRunning(start time.Duration, f func(float64)) {
	end := grunning.Time()
	dur := grunning.Difference(start, end).Nanoseconds()
	f(float64(dur))
}

...

func (r *Replica) SendWithWriteBytes(
	ctx context.Context, ba *roachpb.BatchRequest,
) (*roachpb.BatchResponse, *kvadmission.StoreWriteBytes, *roachpb.Error) {
   
    ...

    defer r.MeasureReqCPUNanos(grunning.Time())

    ...

}

Consider switching to an approach along these lines.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Moved to repl.SendWithWriteBytes() for now.

That seems like a better place for it. Thanks.

This patch adds partial accounting for the time spent on processing
requests, for a specific replica. The time spent is recorded in
`ReplicaStats` and allows for observing the aggregate cpu time spent on
replicas via an exported metric `rebalancing.cpunanospersecond`.

resolves cockroachdb#90589

Release note: None
@kvoli kvoli force-pushed the 221201.cputime-acct branch from bbc219e to 88c86eb Compare December 9, 2022 05:36
@kvoli kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I took a look at this and was also unable to find a way to avoid the heap allocation. It's unfortunate, but the heap allocation is probably not worth paying for. A slightly worse version ergonomically that avoids the closure allocation is:

// MeasureReqCPUNanos measures the cpu time spent on this replica processing
// requests.
func (r *Replica) MeasureReqCPUNanos(start time.Duration) {
	r.measureNanosRunning(start, func(dur float64) {
		r.loadStats.reqCPUNanos.RecordCount(dur, 0 /* nodeID */)
	})
}

// measureNanosRunning measures the difference in cpu time from when this
// method is called, to when the returned function is called. This difference
// is recorded against the replica's cpu time attribution.
func (r *Replica) measureNanosRunning(start time.Duration, f func(float64)) {
	end := grunning.Time()
	dur := grunning.Difference(start, end).Nanoseconds()
	f(float64(dur))
}

...

func (r *Replica) SendWithWriteBytes(
	ctx context.Context, ba *roachpb.BatchRequest,
) (*roachpb.BatchResponse, *kvadmission.StoreWriteBytes, *roachpb.Error) {
   
    ...

    defer r.MeasureReqCPUNanos(grunning.Time())

    ...

}

Consider switching to an approach along these lines.

Looks good, adopted.

@kvoli kvoli requested a review from nvanbenschoten December 10, 2022 00:27
@nvanbenschoten nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 1 of 6 files at r3, 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @irfansharif)

@kvoli kvoli (Collaborator, Author) commented Dec 13, 2022

bors r=nvanbenschoten

@craig craig bot (Contributor) commented Dec 13, 2022

👎 Rejected by code reviews

@kvoli kvoli (Collaborator, Author) commented Dec 13, 2022

bors r=nvanbenschoten

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 14, 2022

Build succeeded:

4 participants