-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: enable span configs by default #73876
Merged
craig
merged 7 commits into
cockroachdb:master
from
irfansharif:211214.enable-spanconfigs
Jan 18, 2022
Merged
*: enable span configs by default #73876
craig
merged 7 commits into
cockroachdb:master
from
irfansharif:211214.enable-spanconfigs
Jan 18, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
irfansharif
requested review from
samiskin,
tbg,
a team and
erikgrinaker
and removed request for
a team,
samiskin and
tbg
December 15, 2021 22:45
24 tasks
Heads up, you can use draft PRs to avoid sending out emails. |
irfansharif
force-pushed
the
211214.enable-spanconfigs
branch
3 times, most recently
from
December 21, 2021 01:11
a24c292
to
61125e5
Compare
irfansharif
force-pushed
the
211214.enable-spanconfigs
branch
from
December 22, 2021 03:35
4214d8a
to
8b4b2b5
Compare
irfansharif
changed the title
[wip] *: enable span configs infrastructure by default
*: enable span configs infrastructure by default
Dec 22, 2021
irfansharif
added a commit
to irfansharif/cockroach
that referenced
this pull request
Jan 24, 2022
Fixed cockroachdb#75020. This test makes use of an "upsert-until-backpressure" primitive, where it continually writes blobs to a scratch table configured with a lower max range size (1<<18 from the default of 512<<20) until it experiences replica backpressure. Before cockroachdb#73876 (when using the system config span), the splitting off of the scratch table's ranges happened near instantly after the creation of the table itself. This changed slightly with the span configs infrastructure, where there's more of asynchrony built in and ranges may appear after a while longer. The test previously started upserting as soon as it created the table, being able to implicitly rely on the tight synchrony of already having the table's ranges carved out. This comes when deciding to record a replica's largest previous max range size -- something we only do if the replica's current size is larger than the new limit (see (*Replica).SetSpanCnfig). When we weren't upserting until the range applied the latest config, this was not possible. With span configs and its additional asynchrony, this assumption no longer held. It was possible then for the range to accept writes larger than the newest max range size, which unintentionally (for this test at least) triggers an escape hatch where we avoid backpressure when lowering a range's max size below its current size (cockroachdb#46863). The failure is easy to repro. This test runs in ~8s, and with the following we were reliably seeing timeouts where the test was waiting for backpressure to kick in (but it never did because of the escape hatch). dev test pkg/kv/kvserver \ -f TestProtectedTimestamps -v --show-logs \ --timeout 300s --ignore-cache --count 10 De-flaking this test is simple -- we just wait for the table's replicas to apply the latest config before issuing writes to it. Release note: None
craig bot
pushed a commit
that referenced
this pull request
Jan 25, 2022
74765: roachpb: introduce the concept of `SystemSpanConfig` and related protos r=arulajmani a=arulajmani This patch is motivated by the desire to let the host tenant lay protected timestamps on one or all secondary tenants' keyspace. It also provides a mechanism to allow secondary tenants to lay protected timestamps on their entire keyspace without updating every span configuration. We introduce the concept of `SystemSpanConfig` and `SystemSpanConfigTarget` to enable this. We tie these together using a `SystemSpanConfigEntry`. A `SystemSpanConfig` is a system installed configuration that can apply to multiple spans. It only contains protected timestamp information. A `SystemSpanConfigTarget` is used to specify the spans a `SystemSpanConfig` applies over. It can be used to target the entire (logical) cluster or a particular secondary tenant. We will ensure that only the host tenant can target secondary tenants in a future PR that actually persists `SystemSpanConfigs`. We will persist `SystemSpanConfigs` in `system.span_configurations` in a future patch. The `SystemSpanConfigTarget` will be encoded into special reserved keys when we do so. This change introduces the notion of a hierarchy to span configurations. The configuration that applies to a span will now bee the `SpanConfig` stored in `system.span_configurations` combined with all the `SystemSpanConfigs` that apply to the span. This can be at most 4 levels deep -- for a secondary tenant's range, the secondary tenant can install a `SystemSpanConfig` that applies to all its ranges, the host tenant can install a `SystemSpanConfig` that applies to all ranges of the secondary tenant, and the host tenant can install a `SystemSpanConfig` that applies to all ranges. These protos form the data model which will later be used to enable protected timestamp support for secondary tenants using the span config infrastructure. It will be used by the various components such as the `SQLTranslator`, `KVAccessor`, `Reconciler` etc. Release note: None 75233: kvserver: avoid clobbering replica conf r=irfansharif a=irfansharif Fixes #75109. There are two ways to read the configuration applied over a given replica: (a) the copy embedded in each replica struct (b) spanconfig.StoreReader, hanging off the store struct The interface in (b) is implemented by both the span configs infrastructure and the legacy system config span it's designed to replace; it's typically used by KV queues (see #69172). When switching between subsystems in KV (controlled by spanconfig.store.enabled), for we transparently switch the source for (b). We also use then use the right source to refresh (a) for every replica. Incremental updates thereafter mutate (a) directly. As you'd expect, we need to take special care that only one subsystem is writing to (a) at a given point in time, a guarantee that was not upheld before this commit. This bug manifested after we enabled span configs by default (see #73876), and likely affected many tests. To avoid the system config span from clobbering (a) when span configs are enabled, we just need to check what spanconfig.store.enabled holds at the right spot. To repro: # Reliably fails with 1-2m on my GCE worker before this patch, # doesn't after. dev test pkg/kv/kvserver \ -f TestBackpressureNotAppliedWhenReducingRangeSize \ -v --show-logs --timeout 15m --stress Release note: None 75280: sql: deprecate TableDescriptor.GCMutations r=postamar,ajwerner a=stevendanna This appears unused. While the schema changer adds entries that the gc job subsequently removes, the only other code that made use of this field (outside of tests) was FindIndexByID. FindIndexByID appears to use it to return a special error that no one looks for. Release note: None Co-authored-by: arulajmani <[email protected]> Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Steven Danna <[email protected]>
craig bot
pushed a commit
that referenced
this pull request
Jan 25, 2022
75304: execbuilder: deflake TestExecBuild r=irfansharif a=irfansharif Fixes #74933. This test asserts on physical plans of statements after moving ranges around, i.e. whether distsql execution is "local" or "distributed". This is determined by: - where the distsql planner places processor cores, - which in turn is a function of the span resolver, - which delegates to the embedded distsender, - which operates off of a range cache. The range cache, like the name suggests, can hold stale data. This test moved replicas of indexes around to induce distributed execution, but was not accounting for the caches holding onto stale data. In my test runs every so often I was seeing stale range descriptors from before the relocate trip up the planner to generate a local execution, flaking the test. Fortunately for us, all we need to do is slap on a retry. To repro: # Took under a minute on my gceworker. Helped to trim down the test # file slightly. dev test pkg/sql/opt/exec/execbuilder \ -f 'TestExecBuild/5node' -v --show-logs \ --stress --stress-args '-p 4' Release note: None 75436: kvserver: de-flake TestProtectedTimestamps r=irfansharif a=irfansharif Fixed #75020. This test makes use of an "upsert-until-backpressure" primitive, where it continually writes blobs to a scratch table configured with a lower max range size (1<<18 from the default of 512<<20) until it experiences replica backpressure. Before #73876 (when using the system config span), the splitting off of the scratch table's ranges happened near instantly after the creation of the table itself. This changed slightly with the span configs infrastructure, where there's more of asynchrony built in and ranges may appear after a while longer. The test previously started upserting as soon as it created the table, being able to implicitly rely on the tight synchrony of already having the table's ranges carved out. This comes when deciding to record a replica's largest previous max range size -- something we only do if the replica's current size is larger than the new limit (see (*Replica).SetSpanCnfig). When we weren't upserting until the range applied the latest config, this was not possible. With span configs and its additional asynchrony, this assumption no longer held. It was possible then for the range to accept writes larger than the newest max range size, which unintentionally (for this test at least) triggers an escape hatch where we avoid backpressure when lowering a range's max size below its current size (#46863). The failure is easy to repro. This test runs in ~8s, and with the following we were reliably seeing timeouts where the test was waiting for backpressure to kick in (but it never did because of the escape hatch). dev test pkg/kv/kvserver \ -f TestProtectedTimestamps -v --show-logs \ --timeout 300s --ignore-cache --count 10 De-flaking this test is simple -- we just wait for the table's replicas to apply the latest config before issuing writes to it. Release note: None 75443: ui: removed formatting to statements on the details pages r=THardy98 a=THardy98 **ui: removed formatting to statements on the details pages** Previously, statements displayed on the statement/transaction/index details pages were formatted (formatting was added to allow for better readability of statements on these detail pages). However, statements queries from the frontend were noticeably slower due to this implementation. This change reverts the changes to statement formatting (updates the fixtures to show the non-formatted statements), but keeps the change that uses statement ID as an identifier in the URL instead of the raw statement. **Release note (ui change)**: removed formatting to statements on the statement, transaction and index details pages, change to replace raw statement with statement ID in the URL remained. Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Thomas Hardy <[email protected]>
adityamaru
added a commit
to adityamaru/cockroach
that referenced
this pull request
Jan 25, 2022
With cockroachdb#73876 there is a bit more asynchrony than before and thus the test must wait until all the ranges have completed splitting before it attempts the last backup, so that the ExportRequest targets the range with the correct SpanConfig applied to it. Fixes: cockroachdb#75202 Release note: None
craig bot
pushed a commit
that referenced
this pull request
Jan 26, 2022
75491: backupccl: deflake TestGCDropIndexSpanExpansion r=irfansharif a=adityamaru With #73876 there is a bit more asynchrony than before and thus the test must wait until all the ranges have completed splitting before it attempts the last backup, so that the ExportRequest targets the range with the correct SpanConfig applied to it. Fixes: #75202 Release note: None Co-authored-by: Aditya Maru <[email protected]>
This was referenced Jan 26, 2022
knz
added a commit
to knz/cockroach
that referenced
this pull request
Oct 20, 2022
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
knz
added a commit
to knz/cockroach
that referenced
this pull request
Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
knz
added a commit
to knz/cockroach
that referenced
this pull request
Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
knz
added a commit
to knz/cockroach
that referenced
this pull request
Oct 31, 2022
Prior to this patch, the code in `runStartSQL` was using a different sequence of steps than in `runStart`, even for those steps that are beneficial to run in both cases. This commit fixes that. In particular it adds the following missing bits to `cockroach mt start-sql`: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754) - it adds a tracing span for the startup code. (from cockroachdb#8712) - it properly supports `--listening-url-file`. (from cockroachdb#15468) - it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725) - it sets the proper log severity level for gRPC. (from cockroachdb#20308) - it reports the command-line and env config to logs. (from cockroachdb#21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from cockroachdb#42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from cockroachdb#44043) - it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390) - it stops the server early if the storage disk is full. (from cockroachdb#66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from cockroachdb#73876) To review this commit, it is useful to open the files `start.go` and `mt_start_sql.go` side-by-side, which clarifies how much closer they have become to each other. Looking at the `mt_start_sql.go` changes without that context likely makes the review more difficult. A later commit will actually merge the two code paths together so there is just one thing to maintain. Release note: None
craig bot
pushed a commit
that referenced
this pull request
Oct 31, 2022
90176: cli/start: unify code between `cockroach start` and `cockroach mt start-sql` r=andreimatei a=knz Fixes #89974. Fixes #90831. Fixes #90524. This PR merges the server initialization code between `cockroach start` and `cockroach mt start-sql`. In doing so, it brings `cockroach mt start-sql` closer to what we expect from proper CockroachDB server processes: - it fixes support for the (test-only) `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from #4754) - it adds a tracing span for the startup code. (from #8712!!) - it properly supports `--listening-url-file`. (from #15468) - it properly does sanitization of `--external-io-dir`. (from #19725) - it sets the proper log severity level for gRPC. (from #20308) - it reports the command-line and env config to logs. (from #21344) - it prevents startup if there is a `_CRITICAL_ALERT.txt` file in the store directory. (from #42401) - sets the umask for newly created file to remove "other" permission bits. This was a security team request originally. (from #44043) - it recovers support for `DISABLE_STARTING_BACKGROUND_JOBS`. (from #44786) - it sets `GOMAXPROCS` from current cgroup limits. (from #57390) - it stops the server early if the storage disk is full. (from #66893) - it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env var. (from #73876) See the individual commit for details. 90660: sql: add contention_events to cluster_execution_insights r=j82w a=j82w The original contention column will remain to make query operations faster. The events are being put into a json column because it's possible there could be multiple blocking events for a single statement. The json column avoids the complexity of adding another table and keeping it in sync with the insights table. The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id. closes: #88561 Release note (sql change): Adds contention_events to cluster_execution_insights. This is used to see which transaction is blocking the specific statement. 90719: opgen: added a bool field in struct opgen.transition r=Xiang-Gu a=Xiang-Gu This PR adds a bool field in struct opgen.transition that indicates whether it results from a `equiv(xx)` transition spec in the opgen file. It will be useful for a test where we need to find the inital status on a adding/dropping path. Without such a change, it can be problematic if we have a `equiv(xx)` spec as the first transition. E.g. ``` ToAbsent( PUBLIC, equiv(VALIDATED), to(WRITE_ONLY), to(ABSENT), ) ``` Without this change, the inital status will confusingly be `VALIDATED`, and the next status will be `PUBLIC`. With this change, the initial status will be `PUBLIC`, and the next status will be `WRITE_ONLY`. We also added some comments when we make transitions from the specs. Epic: None Release note: None 90865: sql: use bare name string of new pk to compare with pk name when altering primary key r=chengxiong-ruan a=chengxiong-ruan Fixes #90836 Release note (sql change): previously, the `DROP CONSTRAINT, ADD CONSTRAINT` in one trick to have a new primary key without moving old primary key to be a secondary index didn't work if the primary key name is a reserved SQL keyword. A `constraint already exists` error was returned. This patch fixed the bug, the trick now also works with primary key named as reserved keywords. Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: j82w <[email protected]> Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We were previously using an opt-in flag to enable span configs; this
commit flips it into an opt-out one. This has the following effects:
for a select few written:
spanconfig dependencies due to circular imports.
We also introduce a more minified form of the opt-out testing flag:
UseSystemConfigSpanForQueues. Using the minified form still enables the
SQL-side of span configs to run (Reconciler, SQLWatcher, SQLTranslator)
but defaults to using the system config span down in KV.
Release note: None