
workload: jitter the teardown of connections to prevent thundering herd #98689

Merged

merged 1 commit into master from upgrade-pkg-cmd-workload-pgxv5 on Mar 30, 2023

Conversation

@sean- sean- (Collaborator) commented Mar 15, 2023

workload: jitter the teardown of connections to prevent thundering herd

This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

  • the max connection age: --max-conn-lifetime duration
  • the max connection age jitter: --max-conn-lifetime-jitter duration
  • the max connection idle time: --max-conn-idle-time duration
  • the connection health check interval: --conn-healthcheck-period duration
  • the min number of connections in the pool: --min-conns int
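These lifecycle knobs map directly onto pgxpool v5's pool configuration. A minimal sketch of wiring them up, assuming a hypothetical helper (the idle, health-check, and min-conns values are illustrative; the 5min lifetime and 30s jitter match the defaults described above):

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// newJitteredPool is a hypothetical helper showing where each flag lands.
func newJitteredPool(ctx context.Context, connString string) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, err
	}
	cfg.MaxConnLifetime = 5 * time.Minute        // --max-conn-lifetime
	cfg.MaxConnLifetimeJitter = 30 * time.Second // --max-conn-lifetime-jitter
	cfg.MaxConnIdleTime = time.Minute            // --max-conn-idle-time (illustrative)
	cfg.HealthCheckPeriod = 30 * time.Second     // --conn-healthcheck-period (illustrative)
	cfg.MinConns = 1                             // --min-conns (illustrative)
	return pgxpool.NewWithConfig(ctx, cfg)
}

Each connection then lives for MaxConnLifetime plus a random duration of up to MaxConnLifetimeJitter, so teardown is staggered rather than synchronized.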

workload: add support for remaining pgx query modes

Add support for pgx.QueryExecModeCacheDescribe and pgx.QueryExecModeDescribeExec. Previously, only three of the five query modes were available.
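pgx v5 defines five execution modes; a hedged sketch of exposing all of them behind a flag (the flag strings are hypothetical, the constants are pgx's):

import "github.com/jackc/pgx/v5"

// Hypothetical flag-value-to-mode mapping; the constants are pgx v5's.
var queryModes = map[string]pgx.QueryExecMode{
	"cache_statement": pgx.QueryExecModeCacheStatement,
	"cache_describe":  pgx.QueryExecModeCacheDescribe, // newly wired up
	"describe_exec":   pgx.QueryExecModeDescribeExec,  // newly wired up
	"exec":            pgx.QueryExecModeExec,
	"simple_protocol": pgx.QueryExecModeSimpleProtocol,
}

// e.g. connCfg.DefaultQueryExecMode = queryModes[methodFlag]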

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See --conn-healthcheck-period, --min-conns, and the --max-conn-* flags for details.

Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.

@sean- sean- requested a review from sravotto March 15, 2023 16:38
@sean- sean- self-assigned this Mar 15, 2023
@sean- sean- requested a review from a team as a code owner March 15, 2023 16:38
@sean- sean- requested review from srosenberg and renatolabs and removed request for a team March 15, 2023 16:38
@cockroach-teamcity (Member) commented: This change is Reviewable

@sean- sean- force-pushed the upgrade-pkg-cmd-workload-pgxv5 branch from 8ddb59e to 3b07d59 Compare March 15, 2023 17:10
sean- added a commit that referenced this pull request Mar 15, 2023
This includes one notable fix for `workload` (see #98689):

- Fix MaxConnLifetimeJitter

Release note: None
@sean- sean- force-pushed the upgrade-pkg-cmd-workload-pgxv5 branch 3 times, most recently from 7f38a8a to 1e2efd5 Compare March 16, 2023 19:43
@sean- sean- (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sravotto, and @srosenberg)


pkg/workload/pgx_helpers.go line 191 at r3 (raw file):

// MultiConnPool.  The max number of connections are warmed up if numConns is
// set to 0.
func (m *MultiConnPool) WarmupConns(ctx context.Context, numConns int) error {

I refactored connection pool warming out of NewMultiConnPool() into WarmupConns() so that a future engineer can potentially disable connection warmup when the --ramp argument is used. Right now, --ramp'ed workloads have all connections established up front, which is still useful, but connections don't yet ramp with the workload.
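Based on the signature quoted above, a hedged usage sketch (the NewMultiConnPool call shape is an assumption, not the PR's exact API):

mcp, err := workload.NewMultiConnPool(ctx, cfg, urls...)
if err != nil {
	return err
}
// numConns == 0 warms up the pool's maximum number of connections;
// a negative value short-circuits warmup entirely (useful if --ramp
// ever opts out of pre-warming).
if err := mcp.WarmupConns(ctx, 0); err != nil {
	return err
}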

@srosenberg srosenberg (Member) left a comment

The changes seem sound to me. We should run some roachperf benchmarks before merging to reduce the risk of a regression.

Reviewed 1 of 22 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, and @sravotto)


pkg/workload/pgx_helpers.go line 120 at r4 (raw file):

		logLevel = cfg.LogLevel
	}
	maxConnLifetime := 5 * time.Minute

Previously (in v4), this default was 1 hour. What's the rationale for reducing it to 5 minutes?


pkg/workload/pgx_helpers.go line 160 at r4 (raw file):

			}

			// Disable the automatic prepared statement cache. We've seen a lot of

Actual examples would be preferred. Is the prepared statement cache detrimental in the majority of cases? Does the cache churn have a non-negligible performance penalty for workload throughput?


pkg/workload/pgx_helpers.go line 164 at r4 (raw file):

			connCfg := poolCfg.ConnConfig
			connCfg.DefaultQueryExecMode = pgx.QueryExecModeDescribeExec
			connCfg.StatementCacheCapacity = 0

These two cache capacity defaults correspond to QueryExecModeCacheStatement and QueryExecModeCacheDescribe. Why are we overriding them when the default mode is QueryExecModeDescribeExec?
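For reference, pgx v5's ConnConfig exposes both capacities; a hedged illustration of the mapping the question refers to (field and constant names are pgx's, values are from the diff):

connCfg.StatementCacheCapacity = 0   // backs pgx.QueryExecModeCacheStatement
connCfg.DescriptionCacheCapacity = 0 // backs pgx.QueryExecModeCacheDescribe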


pkg/workload/pgx_helpers.go line 237 at r4 (raw file):

	var numWarmupConns int
	if numConns <= 0 {

< 0 is handled above. In fact, it feels like a precondition.


pkg/workload/sql_runner.go line 283 at r4 (raw file):

	args = append(args, simpleQuery)
	copy(args[1:], args)
	args[0] = simpleQuery

This looks like a typo. The first element should be of type pgx.QueryExecMode, not []pgx.QueryExecMode.


pkg/workload/kv/kv.go line 602 at r4 (raw file):

		var tx pgx.Tx
		pl := o.mcp.Get()
		conn, err := pl.Acquire(ctx)

BeginTx already invokes Acquire; why the more verbose version?

@sean- sean- (Collaborator, Author) left a comment

Additional testing would be welcome, but we don't see any major regressions. Thanks to @sravotto, the new binary is already being tested: the following is a graph of the before and after, starting at ~8:25 (both #98689 and #98701 are required to get this effect). I expect we will see an improvement in tail latency with this change, both because of the jitter and because pgx 5.1.0 updated its internal version of puddle to perform non-blocking, background acquisition of a resource:

https://github.com/jackc/pgx/blob/master/CHANGELOG.md#510-november-12-2022
https://github.com/jackc/puddle/blob/master/CHANGELOG.md#130-august-27-2022
jackc/pgx@c00fb5d

[image: before/after P99 latency graph]

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sravotto, and @srosenberg)


pkg/workload/pgx_helpers.go line 120 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Previously (in v4), this default was 1 hour. What's the rationale for reducing it down to 5 minutes?

To allow for reasonable management of crdb clusters. Specifically, if a crdb decommission is kicked off, we don't want to wait 1hr for connections to drain. Instead, we can kick off a benchmark and wait only 5min for the workload driver to recycle all of its connections, which significantly shortens the amount of time an operator has to wait when performing any kind of maintenance. The cost of establishing a TLS connection isn't so great that it can't be safely amortized over a 5min connection lifetime.


pkg/workload/pgx_helpers.go line 160 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Actual examples would be preferred. Is prepared statement cache detrimental in a majority of the cases? Does the cache churn have a non-negligible performance penalty for workload throughput?

These comments were present before the refactoring required to update to pgx v5; therefore, I didn't revalidate them.


pkg/workload/pgx_helpers.go line 164 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

These two cache capacity defaults corresponds to QueryExecModeCacheStatement and QueryExecModeCacheDescribe. Why are we overriding them when the default mode is QueryExecModeDescribeExec?

Ditto. I wondered the same thing myself but didn't want to open a new investigation into a problem I wasn't attempting to solve.


pkg/workload/pgx_helpers.go line 237 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

< 0 is handled above. In fact, it feels like a precondition.

Updated this check to be == instead. The < 0 check short-circuits the warmup entirely, whereas == 0 auto-sizes the warmup to max conns. There is no change in behavior, but this is slightly more explicit (the previous < 0 short-circuit didn't exist until late in this update).


pkg/workload/sql_runner.go line 283 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

This looks like a typo. The first element should be of type pgx.QueryExecMode, not []pgx.QueryExecMode.

This is the same copy() logic that was there previously and just a slice trick to prepend a value:

args = append(args, pgx.QuerySimpleProtocol(true))
copy(args[1:], args)
args[0] = pgx.QuerySimpleProtocol(true)
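For readers unfamiliar with the idiom, append-then-copy is the standard Go way to prepend in place; a self-contained sketch:

package main

import "fmt"

func main() {
	args := []any{"b", "c"}
	// Grow the slice by one, shift every element right by one slot,
	// then overwrite the first slot with the value being prepended.
	args = append(args, "a") // the appended value is just a placeholder
	copy(args[1:], args)     // copies args[0:len-1] into args[1:]
	args[0] = "a"
	fmt.Println(args) // [a b c]
}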


pkg/workload/kv/kv.go line 602 at r4 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

BeginTx does invoke Acquire, why the more verbose version?

TIL. I was doing a strict refactor of what was there. I've updated this to use BeginTx(). Thank you!

@sean- sean- requested a review from srosenberg March 20, 2023 17:11
sean- added a commit that referenced this pull request Mar 20, 2023
This includes one notable fix for `workload` (see #98689):

- Fix MaxConnLifetimeJitter

Release note: None
@sean- sean- requested a review from a team as a code owner March 20, 2023 17:36
@nvanbenschoten nvanbenschoten (Member) left a comment

Reviewed 9 of 22 files at r1, 4 of 4 files at r8, 13 of 13 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, @sravotto, and @srosenberg)


pkg/workload/pgx_helpers.go line 273 at r11 (raw file):

}

// AddPreparedStatement adds the given sql statement to the map of

The comment on this method is off.


pkg/workload/pgx_helpers.go line 323 at r11 (raw file):

		g.Go(func() error {
			if err := sem.Acquire(warmupCtx, 1); err != nil {
				warmupConns <- struct{}{}

With an error group, once any goroutine returns an error, the context is cancelled, so you should be able to select { <-warmupConns, <-warmupCtx } down below instead of doing this dance with the warmup conn accounting and sem release on the error path.
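A hedged sketch of the suggested shape, using golang.org/x/sync/errgroup (warmupConns and numWarmupConns come from the diff; the rest is illustrative):

g, warmupCtx := errgroup.WithContext(ctx)
warmupConns := make(chan struct{}, numWarmupConns) // buffered so senders never block
for i := 0; i < numWarmupConns; i++ {
	g.Go(func() error {
		conn, err := pool.Acquire(warmupCtx)
		if err != nil {
			return err // the first error cancels warmupCtx for everyone
		}
		defer conn.Release()
		warmupConns <- struct{}{}
		return nil
	})
}
for i := 0; i < numWarmupConns; i++ {
	select {
	case <-warmupConns:
	case <-warmupCtx.Done():
		return g.Wait() // surfaces the first goroutine error
	}
}
return g.Wait()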


pkg/workload/sql_runner.go line 276 at r11 (raw file):

}

// prependQuerySimpleProtocol inserts pgx.QuerySimpleProtocol(true) at the

What allows us to remove this for the QueryExecModeSimpleProtocol case? Are we telling pgx to use QueryExecModeSimpleProtocol somewhere else?


pkg/workload/indexes/indexes.go line 162 at r11 (raw file):

	}
	cfg := workload.MultiConnPoolCfg{
		ConnHealthCheckPeriod: w.connFlags.ConnHealthCheckPeriod,

This looks tedious and error-prone. Should we pull out a constructor?

@sean- sean- (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, @sravotto, and @srosenberg)


pkg/workload/sql_runner.go line 276 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What allows us to remove this for the QueryExecModeSimpleProtocol case? Are we telling pgx to use QueryExecModeSimpleProtocol somewhere else?

Exec() via pgxpool calls Exec() in pgconn, which always uses the simple query protocol.


pkg/workload/indexes/indexes.go line 162 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This looks tedious and error-prone. Should we pull out a constructor?

I was cargo-culting the existing style here, but agree this should be revisited.

@sean- sean- (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @renatolabs, @sravotto, and @srosenberg)


pkg/workload/indexes/indexes.go line 162 at r11 (raw file):

Previously, sean- (Sean Chittenden) wrote…

I was cargo-culting the existing style here, but agree this should be revisited.

Revisited and created a helper function that builds a workload.MultiConnPoolCfg from a workload.ConnFlags.
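A hedged sketch of what such a helper could look like (the name and field set are assumptions inferred from the flags listed earlier, not the PR's exact code):

// NewMultiConnPoolCfgFromFlags builds a MultiConnPoolCfg from the shared
// connection flags; hypothetical shape.
func NewMultiConnPoolCfgFromFlags(f *ConnFlags) MultiConnPoolCfg {
	return MultiConnPoolCfg{
		MaxConnLifetime:       f.MaxConnLifetime,
		MaxConnLifetimeJitter: f.MaxConnLifetimeJitter,
		MaxConnIdleTime:       f.MaxConnIdleTime,
		ConnHealthCheckPeriod: f.ConnHealthCheckPeriod,
		MinConns:              f.MinConns,
	}
}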

@sean- sean- requested a review from nvanbenschoten March 28, 2023 20:06
@nvanbenschoten nvanbenschoten (Member) left a comment

:lgtm: the change here looks good. Thanks for making it and iterating on this.

Reviewed 1 of 2 files at r13, 1 of 1 files at r14, 6 of 6 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs, @sean-, @sravotto, and @srosenberg)


pkg/workload/histogram/histogram.go line 76 at r15 (raw file):

func (w *NamedHistogram) Record(elapsed time.Duration) {
	w.prometheusHistogram.Observe(float64(elapsed.Nanoseconds()) / float64(time.Second))
	w.mu.Lock()

Nice find. What brought you here?

@sean- sean- (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bobvawter, @renatolabs, @sravotto, and @srosenberg)


pkg/workload/histogram/histogram.go line 76 at r15 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nice find. What brought you here?

I started getting desperate to stamp out a race condition, ran the race detector as a quick first pass, and stumbled across this (h/t to @bobvawter for the suggestion). The bug: when running workload with <100% of the connections pre-warmed, a connection Acquire() succeeds but returns an invalid connection, forcing the operator to run the test with --tolerate-errors. This shouldn't be necessary, but it only happens at higher levels of concurrency and with low numbers of pre-warmed connections. The following is for posterity and not something I want this PR to block on.

I haven't been able to hunt down the bug yet. I thought I had resolved it, but by adding debug code, I masked the race condition. Infuriatingly, if I inject debug code in the error branch, the error never presents itself. (Mission accomplished? 😆) Specifically, if you warm up less than 100% of the connections (this is a new feature, so this isn't a regression), then the call to pgxpool.Acquire() can (and frequently does while the workload is just starting) return a nil conn. The err return value varies; the most common error is read: connection reset by peer. With low levels of concurrency this doesn't happen, but with a concurrency of 300 and only one warmed-up connection, it happens nearly every run.

What's wild is that putting q.Q() (a debugging library I use) in the error path[2] seems to be all that's required to prevent the race from happening. Downgrading from q.Q() to log.Warningf() reintroduces the problem. I've been looking into pgxpool's Acquire() and Release() code paths, as well as jackc/puddle, but haven't been able to figure out where the bug is. My current suspicion is that a pgxpool.Conn is being Release()'ed back into the pool in an invalid state. I've added AfterRelease() and BeforeAcquire() handlers to try to catch things inside of pgxpool, but struck out there.
I'm starting to suspect there's something subtle in the memory model that isn't being handled correctly in jackc/puddle's Acquire() or Release(). I'm going to continue to pursue this in the background, since my attention is starting to move outside of workload itself.

[1] Sample workload output w/ bug:

$ workload run kv --tolerate-errors --db movr '--concurrency=300' --min-conns 1 '--duration=3s' '--warmup-conns=1' 'postgresql://root@127.0.0.1:26257/movr?sslmode=disable&name=url1'
INFO: Build completed successfully, 32 total actions
I230329 15:37:52.285739 1 workload/cli/run.go:622  [T1] 1  random seed: -2816927724130576403
I230329 15:37:52.285813 1 workload/cli/run.go:429  [T1] 2  creating load generator...
I230329 15:37:52.292837 1 workload/cli/run.go:460  [T1] 3  creating load generator... done (took 7.022875ms)
E230329 15:37:52.321974 1 workload/cli/run.go:548  [T1] 4  read tcp 127.0.0.1:52708 -> 127.0.0.1:26257: read: connection reset by peer
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
    1.0s       96        19013.8        19155.6     14.2     23.1     75.5     96.5 write
    2.0s       96        19489.6        19322.9     16.3     26.2     37.7     52.4 write
    3.0s       96        19349.8        19331.8     16.8     28.3     37.7     58.7 write
Highest sequence written: 58300. Can be passed as --write-seq=R58300 to the next run.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
    3.0s       96          58000        19330.0     15.4     15.7     26.2     39.8     96.5  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
    3.0s       96          58000        19330.0     15.4     15.7     26.2     39.8     96.5

[2] Sample patch that masks the problem:

➜ git diff pkg/workload/sql_runner.go
diff --git a/pkg/workload/sql_runner.go b/pkg/workload/sql_runner.go
index 8f579a69258..6c39e657e76 100644
--- a/pkg/workload/sql_runner.go
+++ b/pkg/workload/sql_runner.go
@@ -14,6 +14,7 @@ import (
        "context"
        "fmt"

+       // "github.com/cockroachdb/cockroach/pkg/util/log"
+       "github.com/ryboe/q"
        "github.com/cockroachdb/errors"
        "github.com/jackc/pgx/v5"
        "github.com/jackc/pgx/v5/pgconn"
@@ -110,8 +111,16 @@ type StmtHandle struct {
 // See pgx.Conn.Exec.
 func (h StmtHandle) Exec(ctx context.Context, args ...interface{}) (pgconn.CommandTag, error) {
        h.check()
        conn, err := h.s.sr.mcp.Get().Acquire(ctx)
        if err != nil {
+               // log.Warningf(ctx, "race conditions gunna race: %v", h.s.sr.mcp)
+               q.Q("mcp", h.s.sr.mcp)
                return pgconn.CommandTag{}, err
        }
        defer conn.Release()

@sean- sean- (Collaborator, Author) commented Mar 29, 2023

bors try

@craig craig bot (Contributor) commented Mar 29, 2023

try

Build failed:

@sean- sean- (Collaborator, Author) commented Mar 29, 2023

bors retry

@craig craig bot (Contributor) commented Mar 29, 2023

try

Build failed:

@sean- sean- force-pushed the upgrade-pkg-cmd-workload-pgxv5 branch 2 times, most recently from bdb8e3e to 072c876 Compare March 29, 2023 16:52
@sean- sean- (Collaborator, Author) commented Mar 29, 2023

bors retry

craig bot pushed a commit that referenced this pull request Mar 29, 2023
@craig craig bot (Contributor) commented Mar 29, 2023

try

Build succeeded:

@sean- sean- force-pushed the upgrade-pkg-cmd-workload-pgxv5 branch from 072c876 to 98e324a Compare March 29, 2023 17:35
@sean- sean- (Collaborator, Author) commented Mar 29, 2023

bors r+

@craig craig bot (Contributor) commented Mar 29, 2023

This PR was included in a batch that successfully built, but then failed to merge into master (it was a non-fast-forward update). It will be automatically retried.

@craig craig bot (Contributor) commented Mar 30, 2023

Build succeeded:
