Skip to content

Commit

Permalink
Merge #104861
Browse files Browse the repository at this point in the history
104861: kvserver: disable pre-AC above-raft AddSST throttling r=irfansharif a=irfansharif

Fixes #102683. Part of #104154.

These were added way back in #36403 and #73904, pre-dating much of IO admission control for leaseholder writes. With #95563, we now have IO admission control for follower writes. Put together, have ample LSM read-amp protection through AC alone. These concurrency limiters are now redundant and oblivious to more sophisticated AC measures. We recently removed the below-raft equivalents of these limiters (#98762), and like mentioned there, these limiters can exacerbate memory pressure. Separately, we're looking to work on speedier restores, and these limiters are starting to get in the way.

While here, we also disable the pre-ingest delay mechanism in pebble, which too pre-dates AC, introduced way back in #34258 for RocksDB and in \#41839 for Pebble. IO AC is able to limit the number of L0 files, and this pre-ingest delay with its maximum per-request delay time of 5s can be less than effective. It's worth noting that the L0 file count threshold at which this pre-ingest delay mechanism kicked in was 20, while AC aims for 1000[^1].

This commit doesn't go as far as removing these limiters outright, merely disabling them. This is just out of an overabundance of caution. We can probably remove them once kvflowcontrol.enabled has had >1 release worth of baking time. Until then, it's nice to know we have these old safety hatches. We have ample time in the release to assess fallout from this commit, and also use this increased AddSST concurrency to stress the kvflowcontrol machinery.

[^1]: The 1000 file limit exists to bound how long it takes to clear L0 completely. Envelope math cribbed from elsewhere: With 2MiB files, 1000 files is ~2GB, which at 40MB/s of compaction throughput (with a compaction slot consistently dedicated to L0) takes < 60s to clear the backlog. So the 'recovery' time is modest in that operators should not need to take manual action.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
  • Loading branch information
craig[bot] and irfansharif committed Jul 4, 2023
2 parents 36d0de3 + 71f5557 commit ee1d5cc
Show file tree
Hide file tree
Showing 7 changed files with 14 additions and 34 deletions.
1 change: 0 additions & 1 deletion pkg/ccl/backupccl/full_cluster_backup_restore_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,6 @@ CREATE TABLE data2.foo (a int);
// This job will eventually fail since it will run from a new cluster.
sqlDB.Exec(t, `BACKUP data.bank TO 'nodelocal://1/throwawayjob'`)
// Populate system.settings.
sqlDB.Exec(t, `SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 5`)
sqlDB.Exec(t, `INSERT INTO system.ui (key, value, "lastUpdated") VALUES ($1, $2, now())`, "some_key", "some_val")
// Populate system.comments.
sqlDB.Exec(t, `COMMENT ON TABLE data.bank IS 'table comment string'`)
Expand Down
9 changes: 0 additions & 9 deletions pkg/cmd/roachtest/tests/admission_control_index_backfill.go
Original file line number Diff line number Diff line change
Expand Up @@ -166,15 +166,6 @@ func registerIndexBackfill(r registry.Registry) {
db := c.Conn(ctx, t.L(), 1)
defer db.Close()

// Crank up AddSST concurrency to increase the likelihood
// of getting into IO overload regime due to follower
// activity.
if _, err := db.ExecContext(ctx,
"SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 3",
); err != nil {
t.Fatal(err)
}

// Defeat https://github.com/cockroachdb/cockroach/issues/98311.
if _, err := db.ExecContext(ctx,
"SET CLUSTER SETTING kv.gc_ttl.strict_enforcement.enabled = false;",
Expand Down
9 changes: 0 additions & 9 deletions pkg/cmd/roachtest/tests/clearrange.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ import (
"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
"github.com/cockroachdb/cockroach/pkg/roachprod/install"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/cockroach/pkg/util/version"
)

func registerClearRange(r registry.Registry) {
Expand Down Expand Up @@ -111,14 +110,6 @@ func runClearRange(
t.Status(`restoring tiny table`)
defer t.WorkerStatus()

if t.BuildVersion().AtLeast(version.MustParse("v19.2.0")) {
conn := c.Conn(ctx, t.L(), 1)
if _, err := conn.ExecContext(ctx, `SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 8`); err != nil {
t.Fatal(err)
}
conn.Close()
}

// Use a 120s connect timeout to work around the fact that the server will
// declare itself ready before it's actually 100% ready. See:
// https://github.com/cockroachdb/cockroach/issues/34897#issuecomment-465089057
Expand Down
6 changes: 0 additions & 6 deletions pkg/cmd/roachtest/tests/import_cancellation.go
Original file line number Diff line number Diff line change
Expand Up @@ -90,12 +90,6 @@ func runImportCancellation(
if _, err := conn.Exec(stmt); err != nil {
t.Fatal(err)
}
// Increase AddSSTable concurrency to speed up the imports. Otherwise the
// lineitem (the largest tpch table) IMPORT will extend the test duration
// significantly.
if _, err := conn.Exec(`SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 10`); err != nil {
t.Fatal(err)
}

seed := int64(1666467482296309000)
rng := randutil.NewTestRandWithSeed(seed)
Expand Down
7 changes: 1 addition & 6 deletions pkg/cmd/roachtest/tests/tpce.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ type tpceOptions struct {
customers int // --customers
nodes int // use to determine where the workload is run from, how data is partitioned, and workload concurrency
cpus int // used to determine workload concurrency
ssds int // used during cluster init and permitted AddSST concurrency
ssds int // used during cluster init

// Promethues specific flags.
//
Expand Down Expand Up @@ -162,11 +162,6 @@ func runTPCE(ctx context.Context, t test.Test, c cluster.Cluster, opts tpceOptio
{
db := c.Conn(ctx, t.L(), 1)
defer db.Close()
if _, err := db.ExecContext(
ctx, "SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = $1", 4*opts.ssds,
); err != nil {
t.Fatal(err)
}
if _, err := db.ExecContext(
ctx, "SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false",
); err != nil {
Expand Down
6 changes: 3 additions & 3 deletions pkg/kv/kvserver/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -164,15 +164,15 @@ var bulkIOWriteLimit = settings.RegisterByteSizeSetting(
settings.SystemOnly,
"kv.bulk_io_write.max_rate",
"the rate limit (bytes/sec) to use for writes to disk on behalf of bulk io ops",
1<<40,
1<<40, // 1 TiB
).WithPublic()

// addSSTableRequestLimit limits concurrent AddSSTable requests.
var addSSTableRequestLimit = settings.RegisterIntSetting(
settings.SystemOnly,
"kv.bulk_io_write.concurrent_addsstable_requests",
"number of concurrent AddSSTable requests per store before queueing",
1,
math.MaxInt, // unlimited
settings.PositiveInt,
)

Expand All @@ -185,7 +185,7 @@ var addSSTableAsWritesRequestLimit = settings.RegisterIntSetting(
settings.SystemOnly,
"kv.bulk_io_write.concurrent_addsstable_as_writes_requests",
"number of concurrent AddSSTable requests ingested as writes per store before queueing",
10,
math.MaxInt, // unlimited
settings.PositiveInt,
)

Expand Down
10 changes: 10 additions & 0 deletions pkg/storage/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -1554,6 +1554,13 @@ var ingestDelayTime = settings.RegisterDurationSetting(
time.Second*5,
)

var preIngestDelayEnabled = settings.RegisterBoolSetting(
settings.SystemOnly,
"pebble.pre_ingest_delay.enabled",
"controls whether the pre-ingest delay mechanism is active",
false,
)

// PreIngestDelay may choose to block for some duration if L0 has an excessive
// number of files in it or if PendingCompactionBytesEstimate is elevated. This
// it is intended to be called before ingesting a new SST, since we'd rather
Expand All @@ -1567,6 +1574,9 @@ func preIngestDelay(ctx context.Context, eng Engine, settings *cluster.Settings)
if settings == nil {
return
}
if !preIngestDelayEnabled.Get(&settings.SV) {
return
}
metrics := eng.GetMetrics()
targetDelay := calculatePreIngestDelay(settings, metrics.Metrics)

Expand Down

0 comments on commit ee1d5cc

Please sign in to comment.