kvserver: disable pre-AC above-raft AddSST throttling #104861
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
pkg/util/limit/limiter.go
Outdated
@@ -25,6 +25,7 @@ import (
 type ConcurrentRequestLimiter struct {
 	spanName string
 	sem      *quotapool.IntPool
+	disabled bool
Speaking of overabundance of caution, how about an atomic here?
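For context on the suggestion: the new disabled field would be read on every AddSST request while a cluster-setting callback could flip it concurrently, so a plain bool is a potential data race; an atomic avoids that without taking a lock. Below is a minimal, generic sketch of that shape, using a plain channel semaphore instead of the real quotapool purely for illustration; the names are hypothetical and are not the ones in pkg/util/limit.

package limitersketch

import (
	"context"
	"sync/atomic"
)

// throttler mimics the shape being discussed: a fixed-size semaphore plus a
// disabled flag that a settings callback may flip at any time.
type throttler struct {
	sem      chan struct{} // buffered; capacity = max concurrency
	disabled atomic.Bool   // atomic so Begin doesn't race with SetDisabled
}

func newThrottler(limit int) *throttler {
	return &throttler{sem: make(chan struct{}, limit)}
}

// SetDisabled is what a cluster-setting change callback would invoke.
func (t *throttler) SetDisabled(v bool) { t.disabled.Store(v) }

// Begin blocks for a slot unless throttling is disabled. The returned func
// releases the slot (a no-op when disabled).
func (t *throttler) Begin(ctx context.Context) (func(), error) {
	if t.disabled.Load() {
		return func() {}, nil
	}
	select {
	case t.sem <- struct{}{}:
		return func() { <-t.sem }, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}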
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @itsbilal, and @lidorcarmel)
-- commits
line 37 at r1:
This envelope math seems very optimistic to me. Typically when the LSM inverts, Lbase is inflated as well, and compactions out of L0 need to rewrite both the bytes in L0 and in Lbase. I guess in typical inversions we hit the sublevel count limit far before the 1000-file limit. We might hit the 1000-file limit in the case where we have lots of non-overlapping incoming writes, in which case maybe we expect less inflation of Lbase?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @lidorcarmel, and @tbg)
I guess in typical inversions we hit the sublevel count limit far before the 1000-file limit. We might hit the 1000-file limit in the case where we have lots of non-overlapping incoming writes, in which case maybe we expect less inflation of Lbase?
Yup. BTW, the envelope math was cribbed from elsewhere (here and in various other places when asking Sumeer to help me reason about AC's file limit).
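To make the two scenarios concrete, here is the same back-of-envelope math with and without an inflated Lbase. The 2 MiB file size, 1000-file limit, and ~40 MB/s compaction throughput are the numbers from the commit message; the 2x factor for Lbase overlap is only an assumed illustration, not a measured value.

package main

import "fmt"

func main() {
	const (
		files     = 1000     // AC's L0 file-count limit
		fileBytes = 2 << 20  // ~2 MiB per ingested file
		rate      = 40 << 20 // ~40 MB/s of compaction throughput out of L0
	)
	l0Bytes := float64(files * fileBytes)

	// Commit-message estimate: only the L0 bytes get rewritten.
	fmt.Printf("L0 only:            ~%.0fs\n", l0Bytes/rate)

	// If an inflated Lbase overlaps roughly as many bytes as L0, compactions
	// rewrite about twice as much, doubling the recovery time.
	fmt.Printf("L0 + similar Lbase: ~%.0fs\n", 2*l0Bytes/rate)
}

Either way, recovery stays on the order of a minute or two, which is consistent with the footnote's claim that operators shouldn't need to take manual action.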
pkg/util/limit/limiter.go
line 28 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Speaking of overabundance of caution, how about an atomic here?
There's something wrong with the locking structure here. I'll get back to this PR later; marking as draft until then.
Fixes cockroachdb#102683. Part of cockroachdb#104154.

These were added way back in cockroachdb#36403 and cockroachdb#73904, pre-dating much of IO admission control for leaseholder writes. With cockroachdb#95563, we now have IO admission control for follower writes. Put together, we have ample LSM read-amp protection through AC alone. These concurrency limiters are now redundant and oblivious to more sophisticated AC measures. We recently removed the below-raft equivalents of these limiters (cockroachdb#98762), and as mentioned there, these limiters can exacerbate memory pressure. Separately, we're looking to work on speedier restores, and these limiters are starting to get in the way.

While here, we also disable the pre-ingest delay mechanism in pebble, which also pre-dates AC, introduced way back in cockroachdb#34258 for RocksDB and in cockroachdb#41839 for Pebble. IO AC is able to limit the number of L0 files, and this pre-ingest delay with its maximum per-request delay time of 5s can be less than effective. It's worth noting that the L0 file count threshold at which this pre-ingest delay mechanism kicked in was 20, while AC aims for 1000[^1].

This commit doesn't go as far as removing these limiters outright, merely disabling them. This is just out of an overabundance of caution. We can probably remove them once kvflowcontrol.enabled has had >1 release worth of baking time. Until then, it's nice to know we have these old safety hatches. We have ample time in the release to assess fallout from this commit, and also use this increased AddSST concurrency to stress the kvflowcontrol machinery.

[^1]: The 1000 file limit exists to bound how long it takes to clear L0 completely. Envelope math cribbed from elsewhere: with 2MiB files, 1000 files is ~2GB, which at 40MB/s of compaction throughput (with a compaction slot consistently dedicated to L0) takes < 60s to clear the backlog. So the 'recovery' time is modest in that operators should not need to take manual action.

Release note: None
Force-pushed from 1352d02 to 32bcef3 (Compare)
Tacked on a commit removing uses of the addsst concurrency limiter cluster settings in roachtests. Things might break (@lidorcarmel saw a restore attempt run into proposal quota pool limits and fail the entire restore job), but we can tackle those independently.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @lidorcarmel, and @tbg)
pkg/util/limit/limiter.go
line 28 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
There's something wrong with the locking structure here. I'll get back to this PR later, marking as draft until.
Removed this field altogether by not trying to use "0" as the disabled mode (math.MaxInt works too, and avoids any additional synchronization or complexity around the limiter being set to enforce 0 concurrency).
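In other words, "disabled" is expressed purely through the capacity the limiter is configured with. A rough, hedged sketch of that idea follows; the helper name is hypothetical and none of the actual settings plumbing from this PR is shown.

package limitersketch

import "math"

// limiterCapacity returns the capacity to configure the concurrency limiter
// with. When throttling is meant to be off, an effectively unbounded capacity
// (math.MaxInt) does the job: no disabled flag on the limiter, no extra
// synchronization, and no special handling of a 0-concurrency mode.
func limiterCapacity(throttlingEnabled bool, limit int) int {
	if !throttlingEnabled {
		return math.MaxInt
	}
	return limit
}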
Now that we've disabled pre-AC above-raft AddSST throttling by default, remove its uses in tests, specifically:

- admission-control/index-backfill
- admission-control/database-drop
- clearrange/*
- import-cancellation/*
- tpce/*

Do the same for ccl/backupccl:TestFullClusterBackup. It's possible that some of these tests will now fail, perhaps by running into the per-replica quota pool limits and tripping up imports. We'll address them independently (perhaps by improving client code to retry, or only keeping a certain amount in flight, or in the medium term, removing the quota pool altogether: cockroachdb#106063).

Release note: None
Force-pushed from 32bcef3 to 71f5557 (Compare)
Canceled.
bors r+
Build succeeded:
Reviewable status: complete! 0 of 0 LGTMs obtained
-- commits
line 22 at r3:
nit: even though the cluster setting is named rocksdb.ingest_backpressure.l0_file_count_threshold, it is compared to the number of sub-levels due to the following code in calculatePreIngestDelay. So the AC threshold and the pre-ingest delay thresholds are the same, and AC additionally uses the real L0 file count.
// Prefer the sub-level count as L0's read-amp signal when it's available,
// falling back to the raw L0 file count otherwise.
l0ReadAmp := metrics.Levels[0].NumFiles
if metrics.Levels[0].Sublevels >= 0 {
	l0ReadAmp = int64(metrics.Levels[0].Sublevels)
}
Fixes #102683. Part of #104154.
These were added way back in #36403 and #73904, pre-dating much of IO admission control for leaseholder writes. With #95563, we now have IO admission control for follower writes. Put together, we have ample LSM read-amp protection through AC alone. These concurrency limiters are now redundant and oblivious to more sophisticated AC measures. We recently removed the below-raft equivalents of these limiters (#98762), and as mentioned there, these limiters can exacerbate memory pressure. Separately, we're looking to work on speedier restores, and these limiters are starting to get in the way.
While here, we also disable the pre-ingest delay mechanism in pebble, which also pre-dates AC, introduced way back in #34258 for RocksDB and in #41839 for Pebble. IO AC is able to limit the number of L0 files, and this pre-ingest delay with its maximum per-request delay time of 5s can be less than effective. It's worth noting that the L0 file count threshold at which this pre-ingest delay mechanism kicked in was 20, while AC aims for 1000¹.
This commit doesn't go as far as removing these limiters outright, merely disabling them. This is just out of an overabundance of caution. We can probably remove them once kvflowcontrol.enabled has had >1 release worth of baking time. Until then, it's nice to know we have these old safety hatches. We have ample time in the release to assess fallout from this commit, and also use this increased AddSST concurrency to stress the kvflowcontrol machinery.
Release note: None
Footnotes
The 1000 file limit exists to bound how long it takes to clear L0 completely. Envelope math cribbed from elsewhere: With 2MiB files, 1000 files is ~2GB, which at 40MB/s of compaction throughput (with a compaction slot consistently dedicated to L0) takes < 60s to clear the backlog. So the 'recovery' time is modest in that operators should not need to take manual action. ↩