
kvserver: log a trace for slow splits #120943

Merged: 1 commit into cockroachdb:master on Mar 26, 2024

Conversation

miraradeva (Contributor)

This change enables logging a trace of a split request if it exceeds a certain duration. The duration is controlled by a new cluster setting: kv.split.slow_split_tracing_threshold, which defaults to 2s.

Fixes: #81152

Release note: None
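
As a rough illustration of the pattern (not the actual implementation), here is a minimal standalone Go sketch of threshold-gated trace logging; `slowSplitThreshold`, `runSplit`, and `processSplit` are hypothetical stand-ins for the cluster setting and the split path:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Stand-in for the kv.split.slow_split_tracing_threshold cluster setting.
var slowSplitThreshold = 2 * time.Second

// runSplit simulates a split; in the real change a verbose tracing span is
// opened around the split and its recording is collected afterwards.
func runSplit(ctx context.Context) (recording string, err error) {
	time.Sleep(3 * time.Second) // simulate a slow split
	return "<trace recording>", nil
}

func processSplit(ctx context.Context) error {
	start := time.Now()
	recording, err := runSplit(ctx)
	// Only splits that exceed the threshold pay the cost of logging the trace.
	if dur := time.Since(start); dur > slowSplitThreshold {
		fmt.Printf("split took %s (threshold %s); trace:\n%s\n",
			dur, slowSplitThreshold, recording)
	}
	return err
}

func main() {
	_ = processSplit(context.Background())
}
```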

@cockroach-teamcity (Member)

This change is Reviewable

@miraradeva (Contributor, Author)

Looking for feedback on:

  1. What should the default value for the tracing threshold duration be? I have it at 2s now. Should it be a property of the range that's being split, similar to the replicate queue's threshold?
  2. Do we want to refactor processOneChangeWithTracing as a function of the base queue, e.g. as a wrapper around process? That way, any specific queue that wants tracing for slow operations can just set its own tracing threshold. I didn't do this right away because the different queues implement process differently: some have retries built into process, and others don't. I'm assuming we don't want a trace across multiple retries, but I'm happy to discuss.

@miraradeva miraradeva marked this pull request as ready for review March 25, 2024 12:57
@miraradeva miraradeva requested a review from a team as a code owner March 25, 2024 12:57
@miraradeva miraradeva requested a review from kvoli March 25, 2024 12:57
@kvoli (Collaborator) commented Mar 25, 2024

What should the default value for the tracing threshold duration be? I have it at 2s now. Should it be a property of the range that's being split, similarly to the replicate queue's threshold.

2s sounds reasonable. Re-using something like the replicate queue's approach would be rough here: it's hard to define an expected processing rate based on range size, given differing hardware and conditions. By comparison, the replicate queue's rate is mostly bounded by the configured snapshot rate (roughly, expected duration ≈ range size / snapshot rate).

Do we want to refactor processOneChangeWithTracing as a function of the base queue? E.g. as a wrapper around process. That way any specific queue that wants tracing for slow operations can just set its own tracing threshold. I didn't do this right away because the different queues implement process differently: some have retries built into process, and others don't. I'm assuming we don't want a trace across multiple retries, but happy to discuss.

I didn't take a look at the other queues, but if this is doable without too much added effort, that sounds good. We wouldn't want to trace across multiple retries, since a trace of just the last retry would be sufficient.

@miraradeva (Contributor, Author) commented Mar 25, 2024

We wouldn't want to trace across multiple retries, since a trace on the last retry would be sufficient.

I looked at the various queues again. If we want to add optional tracing for slow queue processing in the base queue, we'd wrap the timing and tracing logic around this call to process(...); that placement makes sense since the timeout logic is there as well. Almost all queues implement the processing logic as a single attempt, except the replicate queue, which retries within the process(...) function. So I see two options:

  1. Do the above refactor, as long as we're ok with tracing the replicate queue's process with all of its retries.
  2. Do this kind of change one queue at a time, as needed. I can try to factor out a helper function to do the actual tracing, to avoid code duplication.

Edited to add: the refactor in option 2 doesn't seem particularly easy because the internal processing functions (e.g. processAttempt for the split queue and processOneChange for the replicate queue) don't have the same signatures. So we should probably either do option 1 or leave it as is, unless you have another suggestion.
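
To make the trade-off concrete, here is a hypothetical sketch of option 1, where the base queue wraps each queue's process(...) in timing/tracing logic gated by a per-queue threshold; `queueImpl`, `baseQueue`, and `slowProcessThreshold` are made-up names for illustration, not the real interfaces:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type queueImpl interface {
	process(ctx context.Context) error
	// slowProcessThreshold returns 0 to opt out of slow-processing traces.
	slowProcessThreshold() time.Duration
}

type baseQueue struct {
	impl queueImpl
}

func (bq *baseQueue) processWithTracing(ctx context.Context) error {
	threshold := bq.impl.slowProcessThreshold()
	if threshold == 0 {
		return bq.impl.process(ctx)
	}
	start := time.Now()
	// A recording span would be opened here; for the replicate queue this
	// single trace would cover all retries inside process(...).
	err := bq.impl.process(ctx)
	if dur := time.Since(start); dur > threshold {
		fmt.Printf("queue processing took %s, exceeding threshold %s\n", dur, threshold)
	}
	return err
}

// Toy queue implementation, for illustration only.
type splitQueue struct{}

func (splitQueue) process(ctx context.Context) error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

func (splitQueue) slowProcessThreshold() time.Duration { return 5 * time.Millisecond }

func main() {
	bq := &baseQueue{impl: splitQueue{}}
	_ = bq.processWithTracing(context.Background())
}
```

With this shape, each queue opts in by returning a non-zero threshold, and the replicate queue's trace would necessarily span all of its internal retries.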

@kvoli (Collaborator) commented Mar 26, 2024

I'd prefer to leave it as is for now, unless we won't be backporting this to the 24.1 release branch.

@kvoli (Collaborator) left a comment


:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvserver/client_split_test.go line 1070 at r1 (raw file):

// TestStoreRangeSplitWithTracing tests that the split queue logs traces for
// slow splits.
func TestStoreRangeSplitWithTracing(t *testing.T) {

Nice test!


pkg/kv/kvserver/client_split_test.go line 1074 at r1 (raw file):

	defer log.Scope(t).Close(t)
	l := log.ScopeWithoutShowLogs(t)
	_ = log.SetVModule("split_queue=2")

nit: change this to split_queue=1 to match the expensive log check in processAttemptWithTracing exactly.
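
For context on the nit, a toy illustration of vmodule gating: a log site guarded at level 1 fires as long as the file's verbosity is at least 1, so the test's vmodule setting only needs to match the level checked at the log site. `logVerbosity` and `vEnabled` below are stand-ins, not the actual cockroachdb log API:

```go
package main

import "fmt"

// Per-file verbosity, e.g. as set via SetVModule("split_queue=1").
var logVerbosity = map[string]int{"split_queue": 1}

func vEnabled(file string, level int) bool {
	return logVerbosity[file] >= level
}

func main() {
	// The check in processAttemptWithTracing uses level 1, so
	// "split_queue=1" is sufficient to enable it.
	if vEnabled("split_queue", 1) {
		fmt.Println("slow split: logging trace")
	}
}
```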

This change enables logging a trace of a split request if it exceeds a
certain duration. The duration is controlled by a new cluster setting:
`kv.split.slow_split_tracing_threshold`, which defaults to 2s.

Fixes: cockroachdb#81152

Release note: None
@miraradeva miraradeva force-pushed the mira-81152-split-tracing branch from b69bfdf to 6f9b7ec on March 26, 2024 at 17:52
@miraradeva
Copy link
Contributor Author

bors r=kvoli

@craig craig bot merged commit 2cf36f7 into cockroachdb:master Mar 26, 2024
22 checks passed
Linked issue that merging this pull request may close:

kvserver: add cluster settings to log traces for slow lease transfers and splits