Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
101786: workload: introduce timeout for pre-warming connection pool r=sean- a=sean-

Interrupting target instances during prewarming shouldn't cause workload to proceed: introduce a timeout to prewarming connections.  Connections will have 15s to 5min to warmup before the context will expire.

Epic: none

101987: cli/sql: new option autocerts for TLS client cert auto-discovery r=rafiss a=knz

Fixes #101986.

See the release note below.
An additional benefit not mentioned in the release note is that
it simplifies switching from one tenant to another when using
shared-process multitenancy. For example, this becomes possible:

```
> CREATE TENANT foo;
> ALTER TENANT foo START SERVICE SHARED;
> \c cluster:foo root - - autocerts
```

Alternatively, this can also be used to quickly switch from a non-root
user in an app tenant to the root user in the system tenant:
```
> \c cluster:system root - - autocerts
```

This works because (currently) all tenant servers running side-by-side
use the same TLS CA to validate SQL client certs.

----

Release note (cli change): The `\connect` client-side command for the
SQL shell (included in `cockroach sql`, `cockroach demo`,
`cockroach-sql`) now recognizes an option `autocerts` as last
argument.

When provided, `\c` will now try to discover a TLS client
certificate and key in the same directory(ies) as used by the previous
connection URL.

This feature makes it easier to switch usernames when
TLS client/key files are available for both the previous and the new
username.

102382: c2c: deflake c2c/shutdown roachtests r=stevendanna a=msbutler

   c2c: deflake c2c/shutdown roachtests

    This patch addresses to roachtest failure modes:
    - Prevents roachtest failure if a query fails during a node shutdown.

    - Prevents the src cluster from returning a single node topology, which could
      cause the stream ingestion job to hang if the participating src node gets
    shut down. Longer term, automatic replanning will prevent this. Specifically,
    this patch changes the kv workload to split and scatter the kv table across the
    cluster before the c2c job begins.

    Fixes #101898
    Fixes #102111

    This patch also makes it easier to reproduce c2c roachtest failures by plumbing
    a random seed to several components of the roachtest driver.

    Release note: None


    c2c: rename completeStreamIngestion to applyCutoverTime

    Release note: none


    workload: add --scatter flag to kv workload

    The user can now run `./workload init kv --scatter ....` which scatters the kv
    table across the cluster after the initial data load. This flag is best used
    with `--splits`, `--max-block-bytes`, and `--insert-count`.

    Epic: none

    Release note: none

102819: admission: move CreateTime-sequencing below-raft r=irfansharif a=irfansharif

These are already reviewed commits from #98308. Part of #95563.

---

**admission: move CreateTime-sequencing below-raft**

We move kvflowsequencer.Sequencer and its use in kvflowhandle.Handle (above-raft) to admission.sequencer, now used by admission.StoreWorkQueue (below-raft). This variant appeared in an earlier revision of #97599 where we first introduced monotonically increasing CreateTimes for a given raft group.

In a subsequent commit, when integrating kvflowcontrol into the critical path for replication traffic, we'll observe that it's quite difficult to create sequencing CreateTimes[^1] above raft. This is because these sequence numbers are encoded as part of the raft proposal[^2], and at encode-time, we don't actually know what log position the proposal is going to end up in. It's hard to explicitly guarantee that a proposal with log-position P1 will get encoded before another with log position P2, where P1 < P2.

Naively sequencing CreateTimes at proposal-encode-time could result in over-admission. This is because of how we return flow tokens -- up to some log index[^3], and how use these sequence numbers in below-raft WorkQueues. If P2 ends up with a lower sequence number/CreateTime, it would get admitted first, and when returning flow tokens by log position, in specifying up-to-P2, we'll early return P1's flow tokens despite it not being admitted. So we'd over-admit at the sender. This is all within a <tenant,priority> pair.

[^1]: We use CreateTimes as "sequence numbers" in replication admission control. We want to assign each AC-queued work below-raft a "sequence number" for FIFO ordering within a <tenant,priority>. We ensure these timestamps are roughly monotonic with respect to log positions of replicated work by sequencing work in log position order.
[^2]: In kvflowcontrolpb.RaftAdmissionMeta.
[^3]: See kvflowcontrolpb.AdmittedRaftLogEntries.

---

**admission: add intercept points for when replicated work gets admitted**

In a subsequent commit, when integrating kvflowcontrol into the critical path for replication traffic, we'll set up the return of flow tokens from the receiver node back to the sender once log entries get (asynchronously) admitted[^4]. So we need to intercept the exact points at which the virtually enqueued work items get admitted, since it all happens asynchronously[^5]. To that end we introduce the following interface:
```go
    // OnLogEntryAdmitted is used to observe the specific entries
    // (identified by rangeID + log position) that were admitted. Since
    // admission control for log entries is asynchronous/non-blocking,
    // this allows callers to do requisite post-admission
    // bookkeeping.
    type OnLogEntryAdmitted interface {
     AdmittedLogEntry(
       origin roachpb.NodeID, /* node where the entry originated */
       pri admissionpb.WorkPriority, /* admission priority of the entry */
       storeID roachpb.StoreID, /* store on which the entry was admitted */
       rangeID roachpb.RangeID, /* identifying range for the log entry */
       pos LogPosition, /* log position of the entry that was admitted*/
     )
    }
```
For now we pass in a no-op implementation in production code, but this will change shortly.

Seeing as how the asynchronous admit interface is going to be the primary once once we enable replication admission control by default, for IO control, we no longer need the storeWriteDone interfaces and corresponding types. It's being used by our current (and soon-to-be legacy) above-raft IO admission control to inform granters of when the write was actually done, post-admission. For above-raft IO control, at admit-time we do not have sizing info for the writes, so by intercepting these writes at write-done time we're able to make any outstanding token adjustments in the granter.

To reflect this new world, we:
- Rename setAdmittedDoneModels to setLinearModels.
- Introduce a storeReplicatedWorkAdmittedInfo[^6]. It provides information about the size of replicated work once it's admitted (which happens asynchronously from the work itself). This lets us use the underlying linear models for L0 {writes,ingests} to deduct an appropriate number of tokens from the granter, for the admitted work size[^7].
- Rename the granterWithStoreWriteDone interface to granterWithStoreReplicatedWorkAdmitted. We'll still intercept the actual point of admission for some token adjustments, through the the storeReplicatedWorkAdmittedLocked API shown below. There are two callstacks through which this API gets invoked, one where the coord.mu is already held, and one where it isn't. We plumb this information through so the lock is acquired if not already held. The locking structure is unfortunate, but this was a minimally invasive diff.
```go
   storeReplicatedWorkAdmittedLocked(
    originalTokens int64,
    admittedInfo storeReplicatedWorkAdmittedInfo,
   ) (additionalTokens int64)
```
While here, we also export an admission.TestingReverseWorkPriorityDict. There are at least three tests that have re-invented the wheel.

[^4]: This will happen through the kvflowcontrol.Dispatch interface introduced back in #97766, after integrating it with the RaftTransport layer.
[^5]: Introduced in #97599, for replicated write work.
[^6]: Identical to the previous StoreWorkDoneInfo.
[^7]: There's a peculiarity here in that at enqueuing-time we actually know the size of the write, so we could have deducted the right number of tokens upfront and avoid this post-admit granter token adjustment. We inherit this structure from earlier, and just leave a TODO for now.


103116: generate-logic-test: fix incorrect timeout in logictests template r=rickystewart a=healthy-pod

In #102719, we changed the way we set `-test.timeout` but didn't update the logictests template. This code change updates the template.

Release note: None
Epic: none

Co-authored-by: Sean Chittenden <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
  • Loading branch information
6 people committed May 11, 2023
6 parents 07b2aa6 + 9f71433 + dedeacf + 093e2dd + 05c6ae3 + dcab435 commit aa2c52b
Show file tree
Hide file tree
Showing 49 changed files with 717 additions and 410 deletions.
4 changes: 0 additions & 4 deletions pkg/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -228,7 +228,6 @@ ALL_TESTS = [
"//pkg/kv/kvserver/kvflowcontrol/kvflowcontroller:kvflowcontroller_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowdispatch:kvflowdispatch_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowhandle:kvflowhandle_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsequencer:kvflowsequencer_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsimulator:kvflowsimulator_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowtokentracker:kvflowtokentracker_test",
"//pkg/kv/kvserver/kvstorage:kvstorage_test",
Expand Down Expand Up @@ -1309,8 +1308,6 @@ GO_TARGETS = [
"//pkg/kv/kvserver/kvflowcontrol/kvflowdispatch:kvflowdispatch_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowhandle:kvflowhandle",
"//pkg/kv/kvserver/kvflowcontrol/kvflowhandle:kvflowhandle_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsequencer:kvflowsequencer",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsequencer:kvflowsequencer_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsimulator:kvflowsimulator_test",
"//pkg/kv/kvserver/kvflowcontrol/kvflowtokentracker:kvflowtokentracker",
"//pkg/kv/kvserver/kvflowcontrol/kvflowtokentracker:kvflowtokentracker_test",
Expand Down Expand Up @@ -2777,7 +2774,6 @@ GET_X_DATA_TARGETS = [
"//pkg/kv/kvserver/kvflowcontrol/kvflowcontrolpb:get_x_data",
"//pkg/kv/kvserver/kvflowcontrol/kvflowdispatch:get_x_data",
"//pkg/kv/kvserver/kvflowcontrol/kvflowhandle:get_x_data",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsequencer:get_x_data",
"//pkg/kv/kvserver/kvflowcontrol/kvflowsimulator:get_x_data",
"//pkg/kv/kvserver/kvflowcontrol/kvflowtokentracker:get_x_data",
"//pkg/kv/kvserver/kvserverbase:get_x_data",
Expand Down
2 changes: 1 addition & 1 deletion pkg/ccl/streamingccl/streamingest/alter_replication_job.go
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,7 @@ func alterTenantJobCutover(
cutoverTime, record.Timestamp)
}
}
if err := completeStreamIngestion(ctx, jobRegistry, txn, tenInfo.TenantReplicationJobID, cutoverTime); err != nil {
if err := applyCutoverTime(ctx, jobRegistry, txn, tenInfo.TenantReplicationJobID, cutoverTime); err != nil {
return hlc.Timestamp{}, err
}

Expand Down
2 changes: 1 addition & 1 deletion pkg/ccl/streamingccl/streamingest/stream_ingest_manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ type streamIngestManagerImpl struct {
func (r *streamIngestManagerImpl) CompleteStreamIngestion(
ctx context.Context, ingestionJobID jobspb.JobID, cutoverTimestamp hlc.Timestamp,
) error {
return completeStreamIngestion(ctx, r.jobRegistry, r.txn, ingestionJobID, cutoverTimestamp)
return applyCutoverTime(ctx, r.jobRegistry, r.txn, ingestionJobID, cutoverTimestamp)
}

// GetStreamIngestionStats implements streaming.StreamIngestManager interface.
Expand Down
5 changes: 3 additions & 2 deletions pkg/ccl/streamingccl/streamingest/stream_ingestion_job.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,9 @@ import (
"github.com/cockroachdb/errors"
)

// completeStreamIngestion terminates the stream as of specified time.
func completeStreamIngestion(
// applyCutoverTime modifies the consumer job record with a cutover time and
// unpauses the job if necessary.
func applyCutoverTime(
ctx context.Context,
jobRegistry *jobs.Registry,
txn isql.Txn,
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/clisqlshell/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ go_library(
"//pkg/util/sysutil",
"@com_github_charmbracelet_bubbles//cursor",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_errors//oserror",
"@com_github_knz_bubbline//:bubbline",
"@com_github_knz_bubbline//computil",
"@com_github_knz_bubbline//editline",
Expand Down
4 changes: 4 additions & 0 deletions pkg/cli/clisqlshell/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ type Context struct {
// CockroachDB's own CLI package has a more advanced URL
// parser that is used instead.
ParseURL URLParser

// CertsDir is an extra directory to look for client certs in,
// when the \c command is used.
CertsDir string
}

// internalContext represents the internal configuration state of the
Expand Down
124 changes: 120 additions & 4 deletions pkg/cli/clisqlshell/sql.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/envutil"
"github.com/cockroachdb/cockroach/pkg/util/sysutil"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/errors/oserror"
"github.com/knz/bubbline/editline"
"github.com/knz/bubbline/history"
)
Expand All @@ -66,9 +67,10 @@ Query Buffer
Connection
\info display server details including connection strings.
\c, \connect {[DB] [USER] [HOST] [PORT] | [URL]}
\c, \connect {[DB] [USER] [HOST] [PORT] [options] | [URL]}
connect to a server or print the current connection URL.
(Omitted values reuse previous parameters. Use '-' to skip a field.)
Omitted values reuse previous parameters. Use '-' to skip a field.
The option "autocerts" attempts to auto-discover TLS client certs.
\password [USERNAME]
securely change the password for a user
Expand Down Expand Up @@ -1629,7 +1631,7 @@ func (c *cliState) handleConnect(
cmd []string, loopState, errState cliStateEnum,
) (resState cliStateEnum) {
if err := c.handleConnectInternal(cmd, false /*omitConnString*/); err != nil {
fmt.Fprintln(c.iCtx.stderr, err)
clierror.OutputError(c.iCtx.stderr, err, true /*showSeverity*/, false /*verbose*/)
c.exitErr = err
return errState
}
Expand Down Expand Up @@ -1691,10 +1693,21 @@ func (c *cliState) handleConnectInternal(cmd []string, omitConnString bool) erro
return err
}

var autoCerts bool

// Parse the arguments to \connect:
// it accepts newdb, user, host, port in that order.
// it accepts newdb, user, host, port and options in that order.
// Each field can be marked as "-" to reuse the current defaults.
switch len(cmd) {
case 5:
if cmd[4] != "-" {
if cmd[4] == "autocerts" {
autoCerts = true
} else {
return errors.Newf(`unknown syntax: \c %s`, strings.Join(cmd, " "))
}
}
fallthrough
case 4:
if cmd[3] != "-" {
if err := newURL.SetOption("port", cmd[3]); err != nil {
Expand Down Expand Up @@ -1757,6 +1770,12 @@ func (c *cliState) handleConnectInternal(cmd []string, omitConnString bool) erro
newURL.WithAuthn(prevAuthn)
}

if autoCerts {
if err := autoFillClientCerts(newURL, currURL, c.sqlCtx.CertsDir); err != nil {
return err
}
}

if err := newURL.Validate(); err != nil {
return errors.Wrap(err, "validating the new URL")
}
Expand Down Expand Up @@ -2712,3 +2731,100 @@ func (c *cliState) closeOutputFile() error {
c.iCtx.queryOutputBuf = nil
return err
}

// autoFillClientCerts tries to discover a TLS client certificate and key
// for use in newURL. This is used from the \connect command with option
// "autocerts".
func autoFillClientCerts(newURL, currURL *pgurl.URL, extraCertsDir string) error {
username := newURL.GetUsername()
// We could use methods from package "certnames" here but we're
// avoiding extra package dependencies for the sake of keeping
// the standalone shell binary (cockroach-sql) small.
desiredKeyFile := "client." + username + ".key"
desiredCertFile := "client." + username + ".crt"
// Try to discover a TLS client certificate and key.
// First we'll try to find them in the directory specified in the shell config.
// This is coming from --certs-dir on the command line (of COCKROACH_CERTS_DIR).
//
// If not found there, we'll try to find the client cert in the
// same directory as the cert in the original URL; and the key in
// the same directory as the key in the original URL (cert and key
// may be in different places).
//
// If the original URL doesn't have a cert, we'll try to find a
// cert in the directory where the CA cert is stored.

// If all fails, we'll tell the user where we tried to search.
candidateDirs := make(map[string]struct{})
var newCert, newKey string
if extraCertsDir != "" {
candidateDirs[extraCertsDir] = struct{}{}
if candidateCert := filepath.Join(extraCertsDir, desiredCertFile); fileExists(candidateCert) {
newCert = candidateCert
}
if candidateKey := filepath.Join(extraCertsDir, desiredKeyFile); fileExists(candidateKey) {
newKey = candidateKey
}
}
if newCert == "" || newKey == "" {
var caCertDir string
if tlsUsed, _, caCertPath := currURL.GetTLSOptions(); tlsUsed {
caCertDir = filepath.Dir(caCertPath)
candidateDirs[caCertDir] = struct{}{}
}
var prevCertDir, prevKeyDir string
if authnCertEnabled, certPath, keyPath := currURL.GetAuthnCert(); authnCertEnabled {
prevCertDir = filepath.Dir(certPath)
prevKeyDir = filepath.Dir(keyPath)
candidateDirs[prevCertDir] = struct{}{}
candidateDirs[prevKeyDir] = struct{}{}
}
if newKey == "" {
if candidateKey := filepath.Join(prevKeyDir, desiredKeyFile); fileExists(candidateKey) {
newKey = candidateKey
} else if candidateKey := filepath.Join(caCertDir, desiredKeyFile); fileExists(candidateKey) {
newKey = candidateKey
}
}
if newCert == "" {
if candidateCert := filepath.Join(prevCertDir, desiredCertFile); fileExists(candidateCert) {
newCert = candidateCert
} else if candidateCert := filepath.Join(caCertDir, desiredCertFile); fileExists(candidateCert) {
newCert = candidateCert
}
}
}
if newCert == "" || newKey == "" {
err := errors.Newf("unable to find TLS client cert and key for user %q", username)
if len(candidateDirs) == 0 {
err = errors.WithHint(err, "No candidate directories; try specifying --certs-dir on the command line.")
} else {
sortedDirs := make([]string, 0, len(candidateDirs))
for dir := range candidateDirs {
sortedDirs = append(sortedDirs, dir)
}
sort.Strings(sortedDirs)
err = errors.WithDetailf(err, "Candidate directories:\n- %s", strings.Join(sortedDirs, "\n- "))
}
return err
}

newURL.WithAuthn(pgurl.AuthnClientCert(newCert, newKey))

return nil
}

func fileExists(path string) bool {
_, err := os.Stat(path)
if err == nil {
return true
}
if oserror.IsNotExist(err) {
return false
}
// Stat() returned an error that is not "does not exist".
// This is unexpected, but we'll treat it as if the file does exist.
// The connection will try to use the file, and then fail with a
// more specific error.
return true
}
2 changes: 2 additions & 0 deletions pkg/cli/demo.go
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,8 @@ func runDemoInternal(
}
defer func() { resErr = errors.CombineErrors(resErr, conn.Close()) }()

_, _, certsDir := c.GetSQLCredentials()
sqlCtx.ShellCtx.CertsDir = certsDir
sqlCtx.ShellCtx.ParseURL = clienturl.MakeURLParserFn(cmd, cliCtx.clientOpts)

if err := extraInit(ctx, conn); err != nil {
Expand Down
18 changes: 16 additions & 2 deletions pkg/cli/interactive_tests/test_connect_cmd.tcl
Original file line number Diff line number Diff line change
Expand Up @@ -110,17 +110,31 @@ eexpect foo@
eexpect "/system>"
end_test

start_test "Check that the client-side connect cmd can change users with automatic client cert detection"
send "\\c - root - - autocerts\r"
eexpect "using new connection URL"
eexpect root@
eexpect "/system>"
end_test

start_test "Check that the auto-cert feature properly fails if certs were not found"
send "\\c - unknownuser - - autocerts\r"
eexpect "unable to find TLS client cert and key"
eexpect root@
eexpect "/system>"
end_test

start_test "Check that the client-side connect cmd can detect syntax errors"
send "\\c - - - - abc\r"
eexpect "unknown syntax"
eexpect foo@
eexpect root@
eexpect "/system>"
end_test

start_test "Check that the client-side connect cmd recognizes invalid URLs"
send "\\c postgres://root@localhost:26257/defaultdb?sslmode=invalid&sslcert=$certs_dir%2Fclient.root.crt&sslkey=$certs_dir%2Fclient.root.key&sslrootcert=$certs_dir%2Fca.crt\r"
eexpect "unrecognized sslmode parameter"
eexpect foo@
eexpect root@
eexpect "/system>"
end_test

Expand Down
1 change: 1 addition & 0 deletions pkg/cli/sql_shell_cmd.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ func runTerm(cmd *cobra.Command, args []string) (resErr error) {
}
defer func() { resErr = errors.CombineErrors(resErr, conn.Close()) }()

sqlCtx.ShellCtx.CertsDir = baseCfg.SSLCertsDir
sqlCtx.ShellCtx.ParseURL = clienturl.MakeURLParserFn(cmd, cliCtx.clientOpts)
return sqlCtx.Run(context.Background(), conn)
}
1 change: 1 addition & 0 deletions pkg/cmd/cockroach-sql/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,7 @@ func runSQL(cmd *cobra.Command, args []string) (resErr error) {
}
defer func() { resErr = errors.CombineErrors(resErr, conn.Close()) }()

cfg.ShellCtx.CertsDir = copts.CertsDir
cfg.ShellCtx.ParseURL = clienturl.MakeURLParserFn(cmd, copts)
return cfg.Run(context.Background(), conn)
}
5 changes: 4 additions & 1 deletion pkg/cmd/generate-logictest/templates.go
Original file line number Diff line number Diff line change
Expand Up @@ -257,7 +257,10 @@ go_test(
size = "enormous",
srcs = ["generated_test.go"],{{ if .SqliteLogicTest }}
args = ["-test.timeout=7195s"],{{ else }}
args = ["-test.timeout=3595s"],{{ end }}
args = select({
"//build/toolchains:use_ci_timeouts": ["-test.timeout=895s"],
"//conditions:default": ["-test.timeout=3595s"],
}),{{ end }}
data = [
"//c-deps:libgeos", # keep{{ if .SqliteLogicTest }}
"@com_github_cockroachdb_sqllogictest//:testfiles", # keep{{ end }}{{ if .CockroachGoTestserverTest }}
Expand Down
5 changes: 5 additions & 0 deletions pkg/cmd/roachtest/option/node_list_option.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,11 @@ func (n NodeListOption) RandNode() NodeListOption {
return NodeListOption{n[rand.Intn(len(n))]}
}

// SeededRandNode returns a random node from the NodeListOption using a seeded rand object.
func (n NodeListOption) SeededRandNode(rand *rand.Rand) NodeListOption {
return NodeListOption{n[rand.Intn(len(n))]}
}

// NodeIDsString returns the nodes in the NodeListOption, separated by spaces.
func (n NodeListOption) NodeIDsString() string {
result := ""
Expand Down
Loading

0 comments on commit aa2c52b

Please sign in to comment.