workload: jitter the teardown of connections to prevent thundering herd #98689
Conversation
8ddb59e to 3b07d59 (force-push)
This includes one notable fix for `workload` (see #98689):

- Fix MaxConnLifetimeJitter

Release note: None
7f38a8a to 1e2efd5 (force-push)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sravotto, and @srosenberg)
pkg/workload/pgx_helpers.go
line 191 at r3 (raw file):
// MultiConnPool. The max number of connections are warmed up if numConns is
// set to 0.
func (m *MultiConnPool) WarmupConns(ctx context.Context, numConns int) error {
I refactored connection pool warming out of `NewMultiConnPool()` into `WarmupConns()` so that a future engineer can potentially disable connection warmups when the `--ramp` argument is used. Right now `--ramp`'ed workloads will have all connections established, which is still useful, but at present connections don't ramp with the workload.
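For illustration, a minimal caller-side sketch of that split, assuming the `WarmupConns` signature quoted above; the `--ramp` gating and the `rampDuration` variable are the hypothetical future behavior described here, not part of this change:

```go
// Hypothetical caller; rampDuration and the decision to skip warmup are
// illustrative only, not code in this PR.
mcp, err := workload.NewMultiConnPool(ctx, cfg, urls...)
if err != nil {
	return err
}
if rampDuration == 0 {
	// Passing 0 asks WarmupConns to warm the pool's max number of connections.
	if err := mcp.WarmupConns(ctx, 0); err != nil {
		return err
	}
}
```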
The changes seem sound to me. We should run some roachperfs before merging to reduce the risk of a regression.
Reviewed 1 of 22 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, and @sravotto)
pkg/workload/pgx_helpers.go
line 120 at r4 (raw file):
logLevel = cfg.LogLevel
}
maxConnLifetime := 5 * time.Minute
Previously (in v4), this default was 1 hour. What's the rationale for reducing it down to 5 minutes?
pkg/workload/pgx_helpers.go
line 160 at r4 (raw file):
}
// Disable the automatic prepared statement cache. We've seen a lot of
Actual examples would be preferred. Is the prepared statement cache detrimental in the majority of cases? Does the cache churn have a non-negligible performance penalty for workload throughput?
pkg/workload/pgx_helpers.go
line 164 at r4 (raw file):
connCfg := poolCfg.ConnConfig
connCfg.DefaultQueryExecMode = pgx.QueryExecModeDescribeExec
connCfg.StatementCacheCapacity = 0
These two cache capacity defaults correspond to `QueryExecModeCacheStatement` and `QueryExecModeCacheDescribe`. Why are we overriding them when the default mode is `QueryExecModeDescribeExec`?
pkg/workload/pgx_helpers.go
line 237 at r4 (raw file):
var numWarmupConns int
if numConns <= 0 {
`< 0` is handled above. In fact, it feels like a precondition.
pkg/workload/sql_runner.go
line 283 at r4 (raw file):
args = append(args, simpleQuery)
copy(args[1:], args)
args[0] = simpleQuery
This looks like a typo. The first element should be of type `pgx.QueryExecMode`, not `[]pgx.QueryExecMode`.
pkg/workload/kv/kv.go
line 602 at r4 (raw file):
var tx pgx.Tx
pl := o.mcp.Get()
conn, err := pl.Acquire(ctx)
`BeginTx` does invoke `Acquire`, so why the more verbose version?
Additional testing would be welcome, but we don't see any major regressions. Thanks to @sravotto, the new binary is already being tested. Specifically, the following is a graph of the before and after, starting at ~8:25. To get this effect, both #98689 and #98701 are required. I expect we will see a further improvement in tail latency with this change, both because of the jitter and because of a change in pgx 5.1.0, where pgx updated its internal version of puddle to do a non-blocking, background acquisition of a resource:
https://github.com/jackc/pgx/blob/master/CHANGELOG.md#510-november-12-2022
https://github.com/jackc/puddle/blob/master/CHANGELOG.md#130-august-27-2022
jackc/pgx@c00fb5d
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sravotto, and @srosenberg)
pkg/workload/pgx_helpers.go
line 120 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Previously (in v4), this default was 1 hour. What's the rationale for reducing it down to 5 minutes?
To allow for reasonable management of crdb clusters. Specifically, if a crdb decommission is kicked off, we don't want to wait 1hr for connections to drain. Instead, we can kick off a benchmark and wait only 5min for the workload driver to recycle all of its connections, which significantly shortens the amount of time an operator has to wait when performing any kind of maintenance. The cost of establishing a TLS connection isn't so great that it can't be safely amortized over a 5min period.
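As a reference sketch, here is where those values land in pgx v5's pool configuration; the field names are `pgxpool.Config`'s, but wrapping them in a helper like this is illustrative rather than the PR's exact code:

```go
import (
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// newPoolConfig sketches the connection-lifetime settings discussed above.
func newPoolConfig(connString string) (*pgxpool.Config, error) {
	poolCfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, err
	}
	// Recycle connections after ~5 minutes so cluster maintenance (e.g. a
	// node decommission) drains in minutes rather than an hour.
	poolCfg.MaxConnLifetime = 5 * time.Minute
	// Stagger teardown by up to 30s so the pool doesn't drop and re-dial
	// every connection at the same instant (the thundering herd).
	poolCfg.MaxConnLifetimeJitter = 30 * time.Second
	return poolCfg, nil
}
```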
pkg/workload/pgx_helpers.go
line 160 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Actual examples would be preferred. Is prepared statement cache detrimental in a majority of the cases? Does the cache churn have a non-negligible performance penalty for workload throughput?
These comments were present before the refactoring required to update to pgx v5; therefore, I didn't revalidate them.
pkg/workload/pgx_helpers.go
line 164 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
These two cache capacity defaults correspond to `QueryExecModeCacheStatement` and `QueryExecModeCacheDescribe`. Why are we overriding them when the default mode is `QueryExecModeDescribeExec`?
Ditto. I wondered the same thing myself but didn't want to open a new investigation into a problem I wasn't attempting to solve.
pkg/workload/pgx_helpers.go
line 237 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
`< 0` is handled above. In fact, it feels like a precondition.
Updated this check to be `==` instead. The `< 0` check short-circuits the warmup entirely, whereas `== 0` auto-sizes the warmup to max conns. There is no change in behavior, but this is slightly more explicit (the previous `< 0` short-circuit didn't exist until late in this update).
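In other words, roughly this shape (a hypothetical excerpt with approximate names, not the exact diff):

```go
// Negative numConns explicitly skips warmup; zero auto-sizes to the pool max.
if numConns < 0 {
	return nil
}
numWarmupConns := numConns
if numConns == 0 {
	numWarmupConns = maxPoolConns
}
```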
pkg/workload/sql_runner.go
line 283 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
This looks like a typo. The first element should be of type `pgx.QueryExecMode`, not `[]pgx.QueryExecMode`.
This is the same `copy()` logic that was there previously and just a slice trick to prepend a value:
cockroach/pkg/workload/sql_runner.go
Lines 280 to 282 in 85c6e38
args = append(args, pgx.QuerySimpleProtocol(true))
copy(args[1:], args)
args[0] = pgx.QuerySimpleProtocol(true)
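For readers unfamiliar with the idiom, the append/copy/overwrite sequence is the standard Go way to prepend an element in place; a self-contained sketch with a stand-in value rather than pgx's type:

```go
package main

import "fmt"

func main() {
	args := []interface{}{1, "two", 3.0}
	mode := "simple" // stand-in for pgx.QuerySimpleProtocol(true)

	// Grow the slice by one, shift the existing elements right by one,
	// then overwrite index 0 with the value being prepended.
	args = append(args, mode)
	copy(args[1:], args)
	args[0] = mode

	fmt.Println(args) // [simple 1 two 3]
}
```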
pkg/workload/kv/kv.go
line 602 at r4 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
`BeginTx` does invoke `Acquire`, so why the more verbose version?

TIL. I was doing a strict refactor of what was there. I've updated this to use `BeginTx()`. Thank you!
This includes one notable fix for `workload` (see #98689): - Fix MaxConnLifetimeJitter Release note: None
Reviewed 9 of 22 files at r1, 4 of 4 files at r8, 13 of 13 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, @sravotto, and @srosenberg)
pkg/workload/pgx_helpers.go
line 273 at r11 (raw file):
}
// AddPreparedStatement adds the given sql statement to the map of
The comment on this method is off.
pkg/workload/pgx_helpers.go
line 323 at r11 (raw file):
g.Go(func() error {
	if err := sem.Acquire(warmupCtx, 1); err != nil {
		warmupConns <- struct{}{}
With an error group, once any goroutine returns an error, the context is cancelled, so you should be able to select on `<-warmupConns` / `<-warmupCtx.Done()` down below instead of doing this dance with the warmup conn accounting and sem release on the error path.
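A rough sketch of that suggestion, assuming names like `warmupCtx`, `sem`, and a buffered `warmupConns` channel from the surrounding code; this is illustrative, not the PR's final implementation:

```go
import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
	"golang.org/x/sync/errgroup"
	"golang.org/x/sync/semaphore"
)

// warmup is an illustrative helper. Because errgroup cancels warmupCtx as
// soon as any goroutine returns an error, the error paths don't need to push
// sentinels into warmupConns or release the semaphore by hand; the waiter
// just selects on the cancelled context.
func warmup(ctx context.Context, pool *pgxpool.Pool, sem *semaphore.Weighted, n int) error {
	g, warmupCtx := errgroup.WithContext(ctx)
	warmupConns := make(chan struct{}, n) // buffered, so sends never block
	for i := 0; i < n; i++ {
		g.Go(func() error {
			if err := sem.Acquire(warmupCtx, 1); err != nil {
				return err // cancels warmupCtx for the other goroutines
			}
			defer sem.Release(1)
			conn, err := pool.Acquire(warmupCtx)
			if err != nil {
				return err
			}
			conn.Release()
			warmupConns <- struct{}{}
			return nil
		})
	}
	for i := 0; i < n; i++ {
		select {
		case <-warmupConns:
		case <-warmupCtx.Done():
			return g.Wait() // surfaces the first goroutine error
		}
	}
	return g.Wait()
}
```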
pkg/workload/sql_runner.go
line 276 at r11 (raw file):
}
// prependQuerySimpleProtocol inserts pgx.QuerySimpleProtocol(true) at the
What allows us to remove this for the `QueryExecModeSimpleProtocol` case? Are we telling pgx to use `QueryExecModeSimpleProtocol` somewhere else?
pkg/workload/indexes/indexes.go
line 162 at r11 (raw file):
}
cfg := workload.MultiConnPoolCfg{
	ConnHealthCheckPeriod: w.connFlags.ConnHealthCheckPeriod,
This looks tedious and error-prone. Should we pull out a constructor?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, @sravotto, and @srosenberg)
pkg/workload/sql_runner.go
line 276 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What allows us to remove this for the `QueryExecModeSimpleProtocol` case? Are we telling pgx to use `QueryExecModeSimpleProtocol` somewhere else?
`Exec()` via `pgxpool` calls `Exec()` in `pgconn`, which always uses the simple query protocol.
pkg/workload/indexes/indexes.go
line 162 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This looks tedious and error-prone. Should we pull out a constructor?
I was cargo-culting the existing style here, but agree this should be revisited.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, @sravotto, and @srosenberg)
pkg/workload/indexes/indexes.go
line 162 at r11 (raw file):
Previously, sean- (Sean Chittenden) wrote…
I was cargo-culting the existing style here, but agree this should be revisited.
Revisited and created a helper function that creates a `workload.MultiConnPoolCfg` from a `workload.ConnFlags`.
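Roughly the shape of such a helper (the function name and field names here are assumed to mirror the connection flags above; see the actual change for the exact mapping):

```go
// multiConnPoolCfgFromFlags is an illustrative sketch of the helper
// described above; field names are assumed to mirror the connection flags.
func multiConnPoolCfgFromFlags(cf *ConnFlags) MultiConnPoolCfg {
	return MultiConnPoolCfg{
		ConnHealthCheckPeriod: cf.ConnHealthCheckPeriod,
		MinConns:              cf.MinConns,
		MaxConnIdleTime:       cf.MaxConnIdleTime,
		MaxConnLifetime:       cf.MaxConnLifetime,
		MaxConnLifetimeJitter: cf.MaxConnLifetimeJitter,
	}
}
```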
The change here looks good. Thanks for making it and iterating on this.
Reviewed 1 of 2 files at r13, 1 of 1 files at r14, 6 of 6 files at r15, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, @sravotto, and @srosenberg)
pkg/workload/histogram/histogram.go
line 76 at r15 (raw file):
func (w *NamedHistogram) Record(elapsed time.Duration) {
	w.prometheusHistogram.Observe(float64(elapsed.Nanoseconds()) / float64(time.Second))
	w.mu.Lock()
Nice find. What brought you here?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bobvawter, @renatolabs, @sravotto, and @srosenberg)
pkg/workload/histogram/histogram.go
line 76 at r15 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice find. What brought you here?
I started getting desperate to stamp out a race condition and ran the race detector as a quick first pass, and stumbled across this (h/t to @bobvawter for the suggestion). The bug: when running `workload` with <100% of the connections pre-warmed, a connection `Acquire()` succeeds but returns an invalid connection, which forces the operator to run the test with `--tolerate-errors`. This shouldn't be necessary, but it only happens at higher levels of concurrency and with low numbers of pre-warmed connections. The following is for posterity and not something I want this PR to block on.
I haven't been able to hunt down the bug yet. I thought I had resolved it, but by adding debug code I masked the race condition. Infuriatingly, if I inject debug code in the error branch, the error never presents itself. (Mission accomplished? 😆) Specifically, if you warm up less than 100% of the connections (this is a new feature, so it isn't a regression), then the call to `pgxpool.Acquire()` can (and frequently does, while the workload is just starting) return a nil `conn`. The `err` return value varies; the most common error is `read: connection reset by peer`. With low levels of concurrency this doesn't happen, but when using a concurrency of 300 and only warming up one connection, it happens nearly every run.
What's wild is that putting `q.Q()` (a debugging library I use) in the error path[2] seems to be all that's required to prevent the race from happening. Downgrading from `q.Q()` to `log.Warningf()` reintroduces the problem. I've been looking into `pgxpool`'s `Acquire()` and `Release()` code paths, as well as `jackc/puddle`, but haven't been able to figure out where the bug is. My current suspicion is that a `pgxpool.Conn` is being `Release()`'ed back into the pool in an invalid state. I've added `AfterRelease()` and `BeforeAcquire()` handlers to try and catch things inside of `pgxpool`, but struck out there.
I'm starting to suspect there's something subtle in the memory model that isn't being handled correctly in `jackc/puddle`'s `Acquire()` or `Release()`. I'm going to continue to pursue this in the background because my gaze is starting to move external to `workload`.
[1] Sample `workload` output w/ bug:
$ workload run kv --tolerate-errors --db movr '--concurrency=300' --min-conns 1 '--duration=3s' '--warmup-conns=1' 'postgresql://[email protected]:26257/movr?sslmode=disable&name=url1'
INFO: Build completed successfully, 32 total actions
I230329 15:37:52.285739 1 workload/cli/run.go:622 [T1] 1 random seed: -2816927724130576403
I230329 15:37:52.285813 1 workload/cli/run.go:429 [T1] 2 creating load generator...
I230329 15:37:52.292837 1 workload/cli/run.go:460 [T1] 3 creating load generator... done (took 7.022875ms)
E230329 15:37:52.321974 1 workload/cli/run.go:548 [T1] 4 read tcp 127.0.0.1:52708 -> 127.0.0.1:26257: read: connection reset by peer
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
1.0s 96 19013.8 19155.6 14.2 23.1 75.5 96.5 write
2.0s 96 19489.6 19322.9 16.3 26.2 37.7 52.4 write
3.0s 96 19349.8 19331.8 16.8 28.3 37.7 58.7 write
Highest sequence written: 58300. Can be passed as --write-seq=R58300 to the next run.
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
3.0s 96 58000 19330.0 15.4 15.7 26.2 39.8 96.5 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
3.0s 96 58000 19330.0 15.4 15.7 26.2 39.8 96.5
[2] Sample patch that masks the problem:
➜ git diff pkg/workload/sql_runner.go
diff --git a/pkg/workload/sql_runner.go b/pkg/workload/sql_runner.go
index 8f579a69258..6c39e657e76 100644
--- a/pkg/workload/sql_runner.go
+++ b/pkg/workload/sql_runner.go
@@ -14,6 +14,7 @@ import (
"context"
"fmt"
+ // "github.com/cockroachdb/cockroach/pkg/util/log"
+ "github.com/ryboe/q"
"github.com/cockroachdb/errors"
"github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgconn"
@@ -110,8 +111,16 @@ type StmtHandle struct {
// See pgx.Conn.Exec.
func (h StmtHandle) Exec(ctx context.Context, args ...interface{}) (pgconn.CommandTag, error) {
h.check()
conn, err := h.s.sr.mcp.Get().Acquire(ctx)
if err != nil {
+ // log.Warningf(ctx, "race conditions gunna race: %v", h.s.sr.mcp)
+ q.Q("mcp", h.s.sr.mcp)
return pgconn.CommandTag{}, err
}
defer conn.Release()
bors try
tryBuild failed:
bors retry
tryBuild failed:
bdb8e3e to 072c876 (force-push)
bors retry
tryBuild succeeded:
This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`

workload: add support for remaining pgx query modes

Add support for pgx.QueryExecModeCacheDescribe and pgx.QueryExecModeDescribeExec. Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details.

Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.
072c876 to 98e324a (force-push)
bors r+
This PR was included in a batch that successfully built, but then failed to merge into master (it was a non-fast-forward update). It will be automatically retried.
Build succeeded:
workload: jitter the teardown of connections to prevent thundering herd
This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.
workload: add flags to manage the age and lifecycle of connection pool
Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`
workload: add support for remaining pgx query modes
Add support for `pgx.QueryExecModeCacheDescribe` and `pgx.QueryExecModeDescribeExec`. Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data
Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.
Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details.

Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.
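For reference, a sketch of how pgx v5's five query exec modes could be selected from a method-style flag; the string spellings and function name are illustrative, while the mode constants are pgx's:

```go
import (
	"fmt"

	"github.com/jackc/pgx/v5"
)

// queryModeFromFlag maps a hypothetical --method value to one of pgx v5's
// five query exec modes.
func queryModeFromFlag(method string) (pgx.QueryExecMode, error) {
	switch method {
	case "cache_statement":
		return pgx.QueryExecModeCacheStatement, nil
	case "cache_describe":
		return pgx.QueryExecModeCacheDescribe, nil
	case "describe_exec":
		return pgx.QueryExecModeDescribeExec, nil
	case "exec":
		return pgx.QueryExecModeExec, nil
	case "simple_protocol":
		return pgx.QueryExecModeSimpleProtocol, nil
	default:
		return 0, fmt.Errorf("unknown query mode %q", method)
	}
}
```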