Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: enable span configs by default #73876

Merged
merged 7 commits into from
Jan 18, 2022

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Dec 15, 2021

We were previously using an opt-in flag to enable span configs; this
commit flips it into an opt-out one. This has the following effects:

  • fresh crdb nodes/clusters run with span configs enabled.
  • our entire test suite runs using the span configs infrastructure, save
    for a select few written:
    • with the gossipped system config span in mind;
    • using legacy testing harnesses that prevent injection of
      spanconfig dependencies due to circular imports.

We also introduce a more minified form of the opt-out testing flag:
UseSystemConfigSpanForQueues. Using the minified form still enables the
SQL-side of span configs to run (Reconciler, SQLWatcher, SQLTranslator)
but defaults to using the system config span down in KV.

Release note: None

@irfansharif irfansharif requested review from a team as code owners December 15, 2021 22:45
@irfansharif irfansharif requested review from samiskin, tbg, a team and erikgrinaker and removed request for a team, samiskin and tbg December 15, 2021 22:45
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif removed the request for review from erikgrinaker December 15, 2021 22:45
@tbg
Copy link
Member

tbg commented Dec 16, 2021

Heads up, you can use draft PRs to avoid sending out emails.

@tbg tbg marked this pull request as draft December 16, 2021 08:37
@irfansharif irfansharif force-pushed the 211214.enable-spanconfigs branch 3 times, most recently from a24c292 to 61125e5 Compare December 21, 2021 01:11
@irfansharif irfansharif force-pushed the 211214.enable-spanconfigs branch from 4214d8a to 8b4b2b5 Compare December 22, 2021 03:35
@irfansharif irfansharif changed the title [wip] *: enable span configs infrastructure by default *: enable span configs infrastructure by default Dec 22, 2021
@irfansharif irfansharif marked this pull request as ready for review December 22, 2021 03:43
@irfansharif irfansharif requested a review from a team December 22, 2021 03:43
@irfansharif irfansharif requested review from a team as code owners December 22, 2021 03:43
@irfansharif irfansharif requested a review from a team December 22, 2021 03:43
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jan 24, 2022
Fixed cockroachdb#75020. This test makes use of an "upsert-until-backpressure"
primitive, where it continually writes blobs to a scratch table
configured with a lower max range size (1<<18 from the default of
512<<20) until it experiences replica backpressure. Before cockroachdb#73876 (when
using the system config span), the splitting off of the scratch table's
ranges happened near instantly after the creation of the table itself.
This changed slightly with the span configs infrastructure, where
there's more of asynchrony built in and ranges may appear after a while
longer.

The test previously started upserting as soon as it created the table,
being able to implicitly rely on the tight synchrony of already having
the table's ranges carved out. This comes when deciding to record a
replica's largest previous max range size -- something we only do if the
replica's current size is larger than the new limit (see
(*Replica).SetSpanCnfig). When we weren't upserting until the range
applied the latest config, this was not possible. With span configs and
its additional asynchrony, this assumption no longer held. It was
possible then for the range to accept writes larger than the newest max
range size, which unintentionally (for this test at least) triggers an
escape hatch where we avoid backpressure when lowering a range's max
size below its current size (cockroachdb#46863).

The failure is easy to repro. This test runs in ~8s, and with the
following we were reliably seeing timeouts where the test was waiting
for backpressure to kick in (but it never did because of the escape
hatch).

    dev test pkg/kv/kvserver \
      -f TestProtectedTimestamps -v --show-logs \
      --timeout 300s --ignore-cache --count 10

De-flaking this test is simple -- we just wait for the table's replicas
to apply the latest config before issuing writes to it.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 25, 2022
74765: roachpb: introduce the concept of `SystemSpanConfig` and related protos r=arulajmani a=arulajmani

This patch is motivated by the desire to let the host tenant lay
protected timestamps on one or all secondary tenants' keyspace. It
also provides a mechanism to allow secondary tenants to lay protected
timestamps on their entire keyspace without updating every span
configuration.

We introduce the concept of `SystemSpanConfig` and
`SystemSpanConfigTarget` to enable this. We tie these together using a
`SystemSpanConfigEntry`.

A `SystemSpanConfig` is a system installed configuration that can apply
to multiple spans. It only contains protected timestamp information.

A `SystemSpanConfigTarget` is used to specify the spans a
`SystemSpanConfig` applies over. It can be used to target the entire
(logical) cluster or a particular secondary tenant. We will ensure that
only the host tenant can target secondary tenants in a future PR that
actually persists `SystemSpanConfigs`.

We will persist `SystemSpanConfigs` in `system.span_configurations` in
a future patch. The `SystemSpanConfigTarget` will be encoded into
special reserved keys when we do so.

This change introduces the notion of a hierarchy to span configurations.
The configuration that applies to a span will now bee the `SpanConfig`
stored in `system.span_configurations` combined with all the
`SystemSpanConfigs` that apply to the span. This can be at most 4
levels deep -- for a secondary tenant's range, the secondary tenant can
install a `SystemSpanConfig` that applies to all its ranges, the host
tenant can install a `SystemSpanConfig` that applies to all ranges of
the secondary tenant, and the host tenant can install a
`SystemSpanConfig` that applies to all ranges.

These protos form the data model which will later be used to enable
protected timestamp support for secondary tenants using the span config
infrastructure. It will be used by the various components such as the
`SQLTranslator`, `KVAccessor`, `Reconciler` etc.

Release note: None

75233: kvserver: avoid clobbering replica conf r=irfansharif a=irfansharif

Fixes #75109. There are two ways to read the configuration applied over
a given replica:
  (a) the copy embedded in each replica struct
  (b) spanconfig.StoreReader, hanging off the store struct

The interface in (b) is implemented by both the span configs
infrastructure and the legacy system config span it's designed to
replace; it's typically used by KV queues (see #69172). When switching
between subsystems in KV (controlled by spanconfig.store.enabled), for
we transparently switch the source for (b). We also use then use the
right source to refresh (a) for every replica. Incremental updates
thereafter mutate (a) directly. As you'd expect, we need to take special
care that only one subsystem is writing to (a) at a given point in time,
a guarantee that was not upheld before this commit. This bug manifested
after we enabled span configs by default (see #73876), and likely
affected many tests.

To avoid the system config span from clobbering (a) when span configs
are enabled, we just need to check what spanconfig.store.enabled
holds at the right spot. To repro:

    # Reliably fails with 1-2m on my GCE worker before this patch,
    # doesn't after.
    dev test pkg/kv/kvserver \
      -f TestBackpressureNotAppliedWhenReducingRangeSize \
      -v --show-logs --timeout 15m --stress

Release note: None

75280: sql: deprecate TableDescriptor.GCMutations r=postamar,ajwerner a=stevendanna

This appears unused. While the schema changer adds entries that the gc
job subsequently removes, the only other code that made use of this
field (outside of tests) was FindIndexByID. FindIndexByID appears to
use it to return a special error that no one looks for.

Release note: None

Co-authored-by: arulajmani <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
craig bot pushed a commit that referenced this pull request Jan 25, 2022
75304: execbuilder: deflake TestExecBuild r=irfansharif a=irfansharif

Fixes #74933. This test asserts on physical plans of statements after
moving ranges around, i.e. whether distsql execution is "local" or
"distributed". This is determined by:
- where the distsql planner places processor cores,
- which in turn is a function of the span resolver,
- which delegates to the embedded distsender,
- which operates off of a range cache.

The range cache, like the name suggests, can hold stale data. This test
moved replicas of indexes around to induce distributed execution, but
was not accounting for the caches holding onto stale data. In my test
runs every so often I was seeing stale range descriptors from before the
relocate trip up the planner to generate a local execution, flaking the
test. Fortunately for us, all we need to do is slap on a retry. To
repro:

    # Took under a minute on my gceworker. Helped to trim down the test
    # file slightly.
    dev test pkg/sql/opt/exec/execbuilder \
      -f 'TestExecBuild/5node' -v --show-logs \
      --stress --stress-args '-p 4'

Release note: None

75436: kvserver: de-flake TestProtectedTimestamps r=irfansharif a=irfansharif

Fixed #75020. This test makes use of an "upsert-until-backpressure"
primitive, where it continually writes blobs to a scratch table
configured with a lower max range size (1<<18 from the default of
512<<20) until it experiences replica backpressure. Before #73876 (when
using the system config span), the splitting off of the scratch table's
ranges happened near instantly after the creation of the table itself.
This changed slightly with the span configs infrastructure, where
there's more of asynchrony built in and ranges may appear after a while
longer.

The test previously started upserting as soon as it created the table,
being able to implicitly rely on the tight synchrony of already having
the table's ranges carved out. This comes when deciding to record a
replica's largest previous max range size -- something we only do if the
replica's current size is larger than the new limit (see
(*Replica).SetSpanCnfig). When we weren't upserting until the range
applied the latest config, this was not possible. With span configs and
its additional asynchrony, this assumption no longer held. It was
possible then for the range to accept writes larger than the newest max
range size, which unintentionally (for this test at least) triggers an
escape hatch where we avoid backpressure when lowering a range's max
size below its current size (#46863).

The failure is easy to repro. This test runs in ~8s, and with the
following we were reliably seeing timeouts where the test was waiting
for backpressure to kick in (but it never did because of the escape
hatch).

    dev test pkg/kv/kvserver \
      -f TestProtectedTimestamps -v --show-logs \
      --timeout 300s --ignore-cache --count 10

De-flaking this test is simple -- we just wait for the table's replicas
to apply the latest config before issuing writes to it.

Release note: None

75443: ui: removed formatting to statements on the details pages r=THardy98 a=THardy98

**ui: removed formatting to statements on the details pages**

Previously, statements displayed on the statement/transaction/index
details pages were formatted (formatting was added to allow for better
readability of statements on these detail pages). However, statements
queries from the frontend were noticeably slower due to this
implementation. This change reverts the changes to statement formatting
(updates the fixtures to show the non-formatted statements), but keeps
the change that uses statement ID as an identifier in the URL instead of
the raw statement.

**Release note (ui change)**: removed formatting to statements on the
statement, transaction and index details pages, change to replace raw
statement with statement ID in the URL remained.

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Jan 25, 2022
With cockroachdb#73876 there is a bit more asynchrony than before and
thus the test must wait until all the ranges have completed splitting
before it attempts the last backup, so that the ExportRequest targets
the range with the correct SpanConfig applied to it.

Fixes: cockroachdb#75202

Release note: None
craig bot pushed a commit that referenced this pull request Jan 26, 2022
75491: backupccl: deflake TestGCDropIndexSpanExpansion r=irfansharif a=adityamaru

With #73876 there is a bit more asynchrony than before and
thus the test must wait until all the ranges have completed splitting
before it attempts the last backup, so that the ExportRequest targets
the range with the correct SpanConfig applied to it.

Fixes: #75202

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
knz added a commit to knz/cockroach that referenced this pull request Oct 20, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 31, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 31, 2022
90176: cli/start: unify code between `cockroach start` and `cockroach mt start-sql` r=andreimatei a=knz

Fixes #89974.
Fixes #90831.
Fixes #90524.

This PR merges the server initialization code between `cockroach start` and `cockroach mt start-sql`.

In doing so, it brings `cockroach mt start-sql` closer to what we expect from proper CockroachDB server processes:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from #4754)

- it adds a tracing span for the startup code. (from #8712!!)

- it properly supports `--listening-url-file`. (from #15468)

- it properly does sanitization of `--external-io-dir`. (from #19725)

- it sets the proper log severity level for gRPC. (from #20308)

- it reports the command-line and env config to logs. (from #21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from #42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from #44043)

- it recovers support for `DISABLE_STARTING_BACKGROUND_JOBS`. (from #44786)

- it sets `GOMAXPROCS` from current cgroup limits. (from #57390)

- it stops the server early if the storage disk is full. (from #66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from #73876)

See the individual commit for details.



90660: sql: add contention_events to cluster_execution_insights r=j82w a=j82w

The original contention column will remain
to make query operations faster. The events
are being put into a json column because it's
possible there could be multiple blocking events
for a single statement. The json column avoids the complexity of adding another table and keeping it
in sync with the insights table.

The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id.

closes: #88561

Release note (sql change): Adds contention_events
to cluster_execution_insights. This is used
to see which transaction is blocking the specific
statement.

90719: opgen: added a bool field in struct opgen.transition r=Xiang-Gu a=Xiang-Gu

This PR adds a bool field in struct opgen.transition that indicates whether it results from a `equiv(xx)` transition spec in the opgen file. It will be useful for a test where we need to find the inital status on a adding/dropping path. Without such a change, it can be problematic if we have a `equiv(xx)` spec as the first transition. E.g.
```
  ToAbsent(
    PUBLIC,
    equiv(VALIDATED),
    to(WRITE_ONLY),
    to(ABSENT),
  )
```
Without this change, the inital status will confusingly be `VALIDATED`, and the next status will be `PUBLIC`.
With this change, the initial status will be `PUBLIC`, and the next status will be `WRITE_ONLY`.

We also added some comments when we make transitions from the specs.

Epic: None
Release note: None

90865: sql: use bare name string of new pk  to compare with pk name when altering primary key r=chengxiong-ruan a=chengxiong-ruan

Fixes #90836

Release note (sql change): previously, the `DROP CONSTRAINT, ADD CONSTRAINT` in one trick to have a new primary key without moving old primary key to be a secondary index didn't work if the primary key name is a reserved SQL keyword. A `constraint already exists` error was returned. This patch fixed the bug, the trick now also works with primary key named as reserved keywords.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants