
kvserver: add average cpu nanos per replica #92858

Merged: 1 commit merged from 221201.cputime-acct on Dec 14, 2022

Conversation

@kvoli kvoli (Collaborator) commented Dec 1, 2022

This patch adds partial accounting for the time spent processing requests on a per-replica basis. The time spent is recorded in `ReplicaStats`, allowing the aggregate CPU time spent on replicas to be observed via the exported metric `rebalancing.cpunanospersecond`.

Release note: None

@kvoli kvoli self-assigned this Dec 1, 2022
@cockroach-teamcity (Member) commented:

This change is Reviewable

@kvoli kvoli marked this pull request as ready for review December 2, 2022 09:26
@kvoli kvoli requested a review from a team December 2, 2022 09:26
@kvoli kvoli requested a review from a team as a code owner December 2, 2022 09:26
@kvoli kvoli (Collaborator, Author) commented Dec 2, 2022

Will link the issue. The extended CI failure appears to be a flake.

@andrewbaptist andrewbaptist (Collaborator) left a comment

Is there a reason you didn't add timing to store_raft.processRequestQueue? It seems like some work happens there as well.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @kvoli, and @nvanbenschoten)


pkg/kv/kvserver/metrics.go line 357 at r1 (raw file):

		Unit:        metric.Unit_BYTES,
	}
	metaAverageNanosPerSecond = metric.Metadata{

Nit: I don't really like this variable name (nanos/second is always 1 billion). Maybe something like metaNanosProcessingPerSecond or metaNanosRunningPerSecond or metaCPUNanosPerSecond.


pkg/kv/kvserver/replica_load.go line 27 at r1 (raw file):

	writeBytes    *replicastats.ReplicaStats
	readBytes     *replicastats.ReplicaStats
	nanos         *replicastats.ReplicaStats

Nit: Don't necessarily change in this PR, but the fact that the same operations need to be repeated in 4 different places (construction, split counts, merge counts, and reset) for 6 different stats feels a little like a code smell. Maybe structure these into an array so they can just be looped over once?
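The refactor suggested above can be sketched as follows, with hypothetical names standing in for the real `replicastats.ReplicaStats` fields: grouping the stats into a slice lets construction, split, and reset each become a single loop instead of being repeated once per field.

```go
package main

import "fmt"

// stat is a hypothetical, minimal stand-in for *replicastats.ReplicaStats.
type stat struct{ count float64 }

// splitInto halves this stat's count, giving the other half to other,
// mimicking what happens to load stats on a range split.
func (s *stat) splitInto(other *stat) {
	half := s.count / 2
	s.count, other.count = half, half
}

func (s *stat) reset() { s.count = 0 }

// loadStats groups the per-replica stats into a slice so that the
// repeated per-field operations collapse into loops.
type loadStats struct {
	stats []*stat
}

func newLoadStats(n int) *loadStats {
	ls := &loadStats{stats: make([]*stat, n)}
	for i := range ls.stats {
		ls.stats[i] = &stat{}
	}
	return ls
}

func (ls *loadStats) splitInto(other *loadStats) {
	for i := range ls.stats {
		ls.stats[i].splitInto(other.stats[i])
	}
}

func (ls *loadStats) reset() {
	for _, s := range ls.stats {
		s.reset()
	}
}

func main() {
	a, b := newLoadStats(6), newLoadStats(6)
	a.stats[0].count = 100
	a.splitInto(b)
	fmt.Println(a.stats[0].count, b.stats[0].count) // 50 50
}
```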


pkg/kv/kvserver/replica_metrics.go line 339 at r1 (raw file):

func (r *Replica) RequestLocality() *replicastats.RatedSummary {
	return r.loadStats.batchRequests.SnapshotRatedSummary()
}

What is this method for - it doesn't seem to be called or used anywhere. Should this be in a different PR?


pkg/kv/kvserver/store.go line 3215 at r1 (raw file):

		averageReadBytesPerSecond     float64
		averageWriteBytesPerSecond    float64
		averageNanosPerSecond         float64

Nit: Similar naming issue as above.

@kvoli kvoli force-pushed the 221201.cputime-acct branch from 7588441 to 2dca625 Compare December 5, 2022 08:40
@kvoli kvoli (Collaborator, Author) left a comment

TYFTR (thank you for the review) - replied to comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/metrics.go line 357 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: I don't really like this variable name (nanos/second is always 1 billion). Maybe something like metaNanosProcessingPerSecond or metaNanosRunningPerSecond or metaCPUNanosPerSecond.

Updated to be consistently ...CPUNanosPerSecond across the patch.


pkg/kv/kvserver/replica_load.go line 27 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: Don't necessarily change in this PR, but the fact that the same operations need to be repeated in 4 different places (construction, split counts, merge counts, and reset) for 6 different stats feels a little like a code smell. Maybe structure these into an array so they can just be looped over once?

I agree that this structure isn't ideal. Overall this area needs a bit of a refactor. A similar vector structure to #91593 is the likely outcome eventually.

This is tracked in #87187.


pkg/kv/kvserver/replica_metrics.go line 339 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

What is this method for - it doesn't seem to be called or used anywhere. Should this be in a different PR?

Removed.


pkg/kv/kvserver/store.go line 3215 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Nit: Similar naming issue as above.

Updated.

@kvoli kvoli requested a review from andrewbaptist December 5, 2022 08:41
@kvoli kvoli force-pushed the 221201.cputime-acct branch 2 times, most recently from abfdded to 2063297 Compare December 5, 2022 09:14
@kvoli kvoli (Collaborator, Author) left a comment

Previously, andrewbaptist (Andrew Baptist) wrote…

Is there a reason you didn't add timing to store_raft.processRequestQueue? It seems like some work happens there as well.

Initially it wasn't included simply because we didn't include it in the cputime prototype. I've added it, good suggestion.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)

@nvanbenschoten nvanbenschoten (Member) left a comment

It's cool to see this.

What is the plan for tracking and balancing on load in cases where grunning is disabled?

Also, do we intend to split the CPU attribution across work that a replica performs in the leaseholder role vs. work that it performs in the follower replica role? Now that we can attribute work with precision, it seems valuable to measure exactly the amount of work a lease transfer can shed, and then also exactly the amount of work a replica rebalance can shed.

Reviewed 4 of 10 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @kvoli)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

	// Record the CPU time processing the request for this replica. This is
	// recorded regardless of errors that are encountered.
	grunningStart := grunning.Time()

We have this same logic in a number of places. Consider pushing more of it into a library so that it's easier to use (harder to misuse) and less redundant.

For instance, we use the following pattern elsewhere. As long as it doesn't heap allocate (please check if you go with it), it feels like an improvement:

func (r *Replica) MeasureNanosRunning() func() {
	start := grunning.Time()
	return func() {
	end := grunning.Time()
		dur := grunning.Difference(start, end).Nanoseconds()
		r.loadStats.cpuNanos.RecordCount(float64(dur), 0 /* nodeID */)
	}
}

...

// to use
defer r.MeasureNanosRunning()()

pkg/kv/kvserver/store_raft.go line 643 at r2 (raw file):

	stats, err := r.handleRaftReady(ctx, noSnap)
	maybeFatalOnRaftReadyErr(ctx, err)
	grunningStart := grunning.Time()

This is starting the timer after the handleRaftReady call, so we're missing most of the CPU time inside of Raft. Fixing this should improve the accuracy of the CPU attribution.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

	startGrunning := grunning.Time()
	defer func() {
		repl, err := s.GetReplica(ba.RangeID)

Can we find a way to record this without passing back through the replicasByRangeID sync map? We have the Replica down below and it's probably safe to assume that all of the work was done by the last Replica touched in that loop.


pkg/ts/catalog/chart_catalog.go line 689 at r2 (raw file):

			},
			{

nit: stray line

@kvoli kvoli force-pushed the 221201.cputime-acct branch 2 times, most recently from 2e6a3dd to bbc219e Compare December 8, 2022 03:28
@kvoli kvoli (Collaborator, Author) left a comment

What is the plan for tracking and balancing on load in cases where grunning is disabled?

tracking: it will record 0.

balancing on load: call https://github.com/kvoli/cockroach/blob/bbc219e2b6863aa82b27751deba67b6aea0d27b8/pkg/util/grunning/grunning.go#L43-L43 and, if Supported returns false, fall back by setting the cluster setting to QPS rebalancing or "legacy".
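The fallback gate described above can be sketched as follows; the enum names and function shape here are hypothetical, standing in for the real cluster setting and the `grunning.Supported` check:

```go
package main

import "fmt"

// LoadDimension is a hypothetical enum for the rebalancing objective.
type LoadDimension int

const (
	QPS LoadDimension = iota
	CPU
)

// rebalanceObjective picks the load dimension to balance on: CPU when
// fine-grained running-time instrumentation is supported on this
// platform/build, otherwise the legacy QPS objective.
func rebalanceObjective(grunningSupported bool, setting LoadDimension) LoadDimension {
	if setting == CPU && !grunningSupported {
		return QPS
	}
	return setting
}

func main() {
	fmt.Println(rebalanceObjective(false, CPU) == QPS) // falls back to QPS
	fmt.Println(rebalanceObjective(true, CPU) == CPU)
}
```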

do we intend to split the CPU attribution across work that a replica performs in the leaseholder role vs. work that it performs in the follower replica role?

That is the intention. I was concerned about the memory usage of adding two different replica stats objects at once. They need to be refactored, but I intended to do that after merging #91593. I don't see why that should block this, so I've added both.

I have opted not to include two separate metrics yet. Keeping the same metric as before.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We have this same logic in a number of places. Consider pushing more of it into a library so that it's easier to use (harder to misuse) and less redundant.

For instance, we use the following pattern elsewhere. As long as it doesn't heap allocate (please check if you go with it), it feels like an improvement:

func (r *Replica) MeasureNanosRunning() func() {
	start := grunning.Time()
	return func() {
	end := grunning.Time()
		dur := grunning.Difference(start, end).Nanoseconds()
		r.loadStats.cpuNanos.RecordCount(float64(dur), 0 /* nodeID */)
	}
}

...

// to use
defer r.MeasureNanosRunning()()

Adopted this method instead.


pkg/kv/kvserver/store_raft.go line 643 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is starting the timer after the handleRaftReady call, so we're missing most of the CPU time inside of Raft. Fixing this should improve the accuracy of the CPU attribution.

That was an error. Updated to start the timer above the handleRaftReady call.


pkg/ts/catalog/chart_catalog.go line 689 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: stray line

Updated.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we find a way to record this without passing back through the replicasByRangeID sync map? We have the Replica down below and it's probably safe to assume that all of the work was done by the last Replica touched in that loop.

Moved to repl.SendWithWriteBytes() for now.

@kvoli kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Adopted this method instead.

It looks like it does heap allocate with that approach.

[screenshot: heap allocation profile]

This shows recent allocations running kv with max-rate=10k.

@kvoli kvoli requested a review from nvanbenschoten December 8, 2022 08:50
@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 7 of 8 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @kvoli)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It looks like it does heap alloc w/ that approach.

[screenshot: heap allocation profile]

Albeit a relatively tiny amount (kv max-rate=1k running).

I took a look at this and was also unable to find a way to avoid the heap allocation. It's unfortunate, but the heap allocation is probably not worth paying for. A slightly worse version ergonomically that avoids the closure allocation is:

// MeasureReqCPUNanos measures the cpu time spent on this replica processing
// requests.
func (r *Replica) MeasureReqCPUNanos(start time.Duration) {
	r.measureNanosRunning(start, func(dur float64) {
		r.loadStats.reqCPUNanos.RecordCount(dur, 0 /* nodeID */)
	})
}

// measureNanosRunning measures the difference in cpu time from when this
// method is called, to when the returned function is called. This difference
// is recorded against the replica's cpu time attribution.
func (r *Replica) measureNanosRunning(start time.Duration, f func(float64)) {
	end := grunning.Time()
	dur := grunning.Difference(start, end).Nanoseconds()
	f(float64(dur))
}

...

func (r *Replica) SendWithWriteBytes(
	ctx context.Context, ba *roachpb.BatchRequest,
) (*roachpb.BatchResponse, *kvadmission.StoreWriteBytes, *roachpb.Error) {
   
    ...

    defer r.MeasureReqCPUNanos(grunning.Time())

    ...

}

Consider switching to an approach along these lines.


pkg/kv/kvserver/store_send.go line 173 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Moved to repl.SendWithWriteBytes() for now.

That seems like a better place for it. Thanks.

This patch adds partial accounting for the time spent on processing
requests, for a specific replica. The time spent is recorded in
`ReplicaStats` and allows for observing the aggregate cpu time spent on
replicas via an exported metric `rebalancing.cpunanospersecond`.

resolves cockroachdb#90589

Release note: None
@kvoli kvoli force-pushed the 221201.cputime-acct branch from bbc219e to 88c86eb Compare December 9, 2022 05:36
@kvoli kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go line 353 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I took a look at this and was also unable to find a way to avoid the heap allocation. It's unfortunate, but the heap allocation is probably not worth paying for. A slightly worse version ergonomically that avoids the closure allocation is:

// MeasureReqCPUNanos measures the cpu time spent on this replica processing
// requests.
func (r *Replica) MeasureReqCPUNanos(start time.Duration) {
	r.measureNanosRunning(start, func(dur float64) {
		r.loadStats.reqCPUNanos.RecordCount(dur, 0 /* nodeID */)
	})
}

// measureNanosRunning measures the difference in cpu time from when this
// method is called, to when the returned function is called. This difference
// is recorded against the replica's cpu time attribution.
func (r *Replica) measureNanosRunning(start time.Duration, f func(float64)) {
	end := grunning.Time()
	dur := grunning.Difference(start, end).Nanoseconds()
	f(float64(dur))
}

...

func (r *Replica) SendWithWriteBytes(
	ctx context.Context, ba *roachpb.BatchRequest,
) (*roachpb.BatchResponse, *kvadmission.StoreWriteBytes, *roachpb.Error) {
   
    ...

    defer r.MeasureReqCPUNanos(grunning.Time())

    ...

}

Consider switching to an approach along these lines.

Looks good, adopted.

@kvoli kvoli requested a review from nvanbenschoten December 10, 2022 00:27
@nvanbenschoten nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 1 of 6 files at r3, 4 of 4 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @irfansharif)

@kvoli kvoli (Collaborator, Author) commented Dec 13, 2022

bors r=nvanbenschoten

@craig craig bot (Contributor) commented Dec 13, 2022

👎 Rejected by code reviews

@kvoli kvoli (Collaborator, Author) commented Dec 13, 2022

bors r=nvanbenschoten

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 13, 2022

Build failed (retrying...):

@craig craig bot (Contributor) commented Dec 14, 2022

Build succeeded:

4 participants