
sql: rerunning an intermediate system table user ID migration will fail #91449

Closed
andyyang890 opened this issue Nov 7, 2022 · 0 comments · Fixed by #91756
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@andyyang890
Collaborator

andyyang890 commented Nov 7, 2022

I came across this issue when trying to write a series of migrations adding user ID columns to the system.role_members table.

Every schema change migration needs to provide a schemaExistsFn (defined here and used here), which checks whether the schema change has already been applied. One such function is hasColumn (defined here), which verifies that the column either does not exist yet or exactly matches the column in the table descriptor in pkg/sql/catalog/systemschema/system.go.
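To make the check concrete, here is a minimal sketch of the kind of comparison hasColumn performs. The columnDesc type and the map-based lookup are simplified stand-ins I'm using for illustration, not the actual catalog API:

```go
package upgradesketch

import "fmt"

// columnDesc is a simplified stand-in for a column descriptor; the real
// schemaExistsFn implementations compare catalog descriptors.
type columnDesc struct {
	Name     string
	Nullable bool
}

// hasColumnStrict mirrors the strict behavior described above: it reports
// that the schema change already exists only if the stored column matches
// the expected (final) descriptor exactly. A missing column means the
// migration still needs to run; a mismatched column is an error.
func hasColumnStrict(stored, expected map[string]columnDesc, name string) (bool, error) {
	exp, ok := expected[name]
	if !ok {
		return false, fmt.Errorf("column %q not in expected descriptor", name)
	}
	got, ok := stored[name]
	if !ok {
		return false, nil // column not added yet; run the migration
	}
	if got != exp {
		return false, fmt.Errorf("column %q exists but does not match the expected descriptor", name)
	}
	return true, nil // schema change already applied
}
```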

The previously merged internal user ID migrations, which can be found in pkg/upgrade/upgrades/system_users_role_id_migration.go and pkg/upgrade/upgrades/role_options_table_migration.go, roughly perform a system table schema change through the following series of migrations (a sketch of this shape follows the list):

  1. Add the new column(s) as nullable (NULL) columns to the table and create indexes for them
  2. Backfill the column(s)
  3. Change the column(s) to be not nullable (NOT NULL)
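The statements below illustrate the shape of those three steps for system.role_members; the column name, index name, and backfill query are hypothetical, and the real statements live in the upgrade files mentioned above:

```go
// Hypothetical statements sketching the three-step migration shape.
const (
	// Step 1: add the column as nullable and index it.
	addRoleIDColumn = `
ALTER TABLE system.role_members
  ADD COLUMN IF NOT EXISTS role_id OID NULL`
	addRoleIDIndex = `
CREATE INDEX IF NOT EXISTS role_members_role_id_idx
  ON system.role_members (role_id)`

	// Step 2: backfill the column (here, from a hypothetical user_id
	// column on system.users).
	backfillRoleID = `
UPDATE system.role_members AS rm
   SET role_id = u.user_id
  FROM system.users AS u
 WHERE rm.role = u.username AND rm.role_id IS NULL`

	// Step 3: only after the backfill completes, enforce NOT NULL.
	setRoleIDNotNull = `
ALTER TABLE system.role_members
  ALTER COLUMN role_id SET NOT NULL`
)
```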

The step 1 migrations for both of the existing migration series use hasColumn, so if one of them were rerun after the columns had been added (but before the step 3 migration successfully completed), it would fail: hasColumn expects the column either to not exist or to be NOT NULL (matching the desired final state specified in the descriptor). This would prevent the completion of that migration as well as any further migrations. I believe this issue can be fixed by writing a custom schemaExistsFn for the step 1 migration that mirrors hasColumn but allows the expected column descriptor to have Nullable == false while the actual column descriptor has Nullable == true.
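Continuing the sketch above, the relaxed check might look like the following; again this is illustrative, not the fix that ultimately landed in #91755/#91756:

```go
// hasColumnWeak mirrors hasColumnStrict but tolerates the intermediate state
// left behind by step 1: the stored column may still be nullable even though
// the final expected descriptor marks it NOT NULL.
func hasColumnWeak(stored, expected map[string]columnDesc, name string) (bool, error) {
	exp, ok := expected[name]
	if !ok {
		return false, fmt.Errorf("column %q not in expected descriptor", name)
	}
	got, ok := stored[name]
	if !ok {
		return false, nil // column not added yet; run the migration
	}
	// Allow the stored column to still be nullable when the expected (final)
	// descriptor is NOT NULL, since step 3 may not have run yet.
	if got.Nullable && !exp.Nullable {
		got.Nullable = false
	}
	if got != exp {
		return false, fmt.Errorf("column %q exists but does not match the expected descriptor", name)
	}
	return true, nil
}
```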

However, I'm not sure whether this would ever actually be a problem, since it would mean that the migration succeeded in adding the column but was somehow forced to run again. It only occurred during my local development because I had an incorrect query later on in my step 1 migration.

Jira issue: CRDB-21269

Epic CRDB-19134

@andyyang890 andyyang890 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 7, 2022
@andyyang890 andyyang890 self-assigned this Nov 14, 2022
craig bot pushed a commit that referenced this issue Nov 15, 2022
90828: roachtest: add --debug-always flag r=stevendanna a=stevendanna

Occasionally, it is very useful to keep a cluster around even if the workload happened to complete without error. The --debug-always flag is like --debug but saves the cluster even if the test was successful.

Epic: None

Release note: None

91359: keys: accurately size pre-allocated buffers in MakeRangeKey r=arulajmani a=nvanbenschoten

This commit includes a trio of changes that accurately size the byte buffers in `MakeRangeKey` and `MakeRangeKeyPrefix` to avoid unnecessary slice resizing (allocation + memcpy) when constructing range-local keys (for example, `RangeDescriptorKey` and `TransactionKey`).

The first change is to include the size of the suffix and detail slices when pre-allocating the byte buffer. We know that `MakeRangeKey` will be appending these to the key, so they will force a resize if not accounted for upfront.

The second change is to correctly account for the overhead of `EncodeBytesAscending`. The code was getting this wrong in two ways. First, it was failing to account for the 3 bytes of unconditional overhead added by the encoding scheme for the encoding type marker and terminator. Second, it was failing to account for the conditional overhead when bytes need to be escaped. We now accurately and efficiently compute the overhead ahead of time to avoid resizing.
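For illustration, a self-contained sketch of the sizing logic described above; the escape rule (one extra byte per 0x00 in the input) is an assumption consistent with this description, and the helpers below are not the actual keys/encoding API:

```go
// encodedBytesOverhead estimates the overhead the ascending bytes encoding
// adds on top of len(key): 3 bytes of unconditional overhead (1-byte type
// marker plus 2-byte terminator) and 1 extra byte per input byte that must
// be escaped (assumed here to be 0x00 bytes).
func encodedBytesOverhead(key []byte) int {
	overhead := 3
	for _, b := range key {
		if b == 0x00 {
			overhead++
		}
	}
	return overhead
}

// sizeRangeKeyBuf sizes the buffer for a range-local key once, up front, so
// that appending the prefix, encoded key, suffix, and detail never forces a
// resize.
func sizeRangeKeyBuf(prefix, key, suffix, detail []byte) []byte {
	n := len(prefix) + len(key) + encodedBytesOverhead(key) + len(suffix) + len(detail)
	return make([]byte, 0, n)
}
```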

The third change is to redefine the transaction tombstone and push marker keys used with the timestamp cache. Previously, the tombstone and push specifiers were additional suffixes that we added after the `LocalTransactionSuffix`. Now, these specifiers are the only suffix added to the key, which avoids an additional key resize.

```
name                         old time/op    new time/op    delta
KV/Update/Native/rows=1-10     70.6µs ± 2%    69.6µs ± 2%  -1.38%  (p=0.000 n=100+93)
KV/Insert/Native/rows=1-10     46.0µs ± 2%    45.7µs ± 2%  -0.62%  (p=0.000 n=95+97)
KV/Delete/Native/rows=1-10     47.3µs ± 2%    47.1µs ± 2%  -0.55%  (p=0.000 n=97+94)
KV/Insert/Native/rows=10-10    74.4µs ± 2%    74.1µs ± 3%  -0.45%  (p=0.000 n=96+98)
KV/Update/Native/rows=10-10     171µs ± 3%     170µs ± 3%  -0.36%  (p=0.045 n=96+98)
KV/Delete/Native/rows=10-10    85.1µs ± 2%    84.8µs ± 2%  -0.29%  (p=0.020 n=98+93)
KV/Update/SQL/rows=1-10         174µs ± 3%     174µs ± 3%  -0.28%  (p=0.041 n=97+89)
KV/Insert/SQL/rows=1-10         131µs ± 2%     131µs ± 4%    ~     (p=0.961 n=89+91)
KV/Insert/SQL/rows=10-10        186µs ± 3%     186µs ± 3%    ~     (p=0.970 n=94+92)
KV/Update/SQL/rows=10-10        336µs ± 2%     336µs ± 3%    ~     (p=0.947 n=92+92)
KV/Delete/SQL/rows=1-10         149µs ± 2%     149µs ± 3%    ~     (p=0.917 n=96+96)
KV/Delete/SQL/rows=10-10        226µs ± 8%     225µs ±10%    ~     (p=0.057 n=97+98)

name                         old alloc/op   new alloc/op   delta
KV/Insert/Native/rows=1-10     17.9kB ± 1%    17.6kB ± 1%  -1.95%  (p=0.000 n=100+100)
KV/Delete/Native/rows=1-10     18.2kB ± 1%    17.9kB ± 1%  -1.92%  (p=0.000 n=94+94)
KV/Update/Native/rows=1-10     25.1kB ± 0%    24.7kB ± 1%  -1.44%  (p=0.000 n=97+97)
KV/Insert/Native/rows=10-10    44.9kB ± 1%    44.5kB ± 1%  -0.88%  (p=0.000 n=100+99)
KV/Delete/Native/rows=10-10    42.4kB ± 1%    42.0kB ± 0%  -0.87%  (p=0.000 n=96+94)
KV/Delete/SQL/rows=1-10        52.9kB ± 1%    52.6kB ± 1%  -0.48%  (p=0.000 n=96+100)
KV/Update/Native/rows=10-10    75.2kB ± 1%    74.8kB ± 1%  -0.43%  (p=0.000 n=98+99)
KV/Update/SQL/rows=1-10        52.6kB ± 1%    52.4kB ± 0%  -0.35%  (p=0.000 n=95+95)
KV/Insert/SQL/rows=1-10        45.4kB ± 1%    45.3kB ± 0%  -0.24%  (p=0.000 n=97+93)
KV/Update/SQL/rows=10-10        119kB ± 1%     119kB ± 1%  -0.16%  (p=0.002 n=97+99)
KV/Delete/SQL/rows=10-10       88.0kB ± 1%    87.8kB ± 1%  -0.14%  (p=0.017 n=91+95)
KV/Insert/SQL/rows=10-10       93.9kB ± 1%    93.7kB ± 1%  -0.14%  (p=0.000 n=98+100)

name                         old allocs/op  new allocs/op  delta
KV/Insert/Native/rows=1-10        142 ± 0%       130 ± 0%  -8.45%  (p=0.000 n=99+100)
KV/Delete/Native/rows=1-10        143 ± 0%       131 ± 0%  -8.39%  (p=0.000 n=96+94)
KV/Update/Native/rows=1-10        198 ± 0%       186 ± 0%  -6.06%  (p=0.000 n=99+98)
KV/Delete/Native/rows=10-10       275 ± 0%       263 ± 0%  -4.36%  (p=0.000 n=90+90)
KV/Insert/Native/rows=10-10       295 ± 0%       283 ± 0%  -4.07%  (p=0.000 n=93+91)
KV/Update/Native/rows=10-10       472 ± 1%       460 ± 1%  -2.55%  (p=0.000 n=98+98)
KV/Insert/SQL/rows=1-10           365 ± 0%       356 ± 0%  -2.53%  (p=0.000 n=97+75)
KV/Delete/SQL/rows=1-10           402 ± 0%       393 ± 0%  -2.24%  (p=0.000 n=97+98)
KV/Update/SQL/rows=1-10           509 ± 0%       500 ± 0%  -1.81%  (p=0.000 n=96+95)
KV/Insert/SQL/rows=10-10          589 ± 0%       580 ± 0%  -1.53%  (p=0.000 n=94+95)
KV/Delete/SQL/rows=10-10          623 ± 1%       614 ± 1%  -1.47%  (p=0.000 n=98+97)
KV/Update/SQL/rows=10-10          858 ± 1%       849 ± 0%  -1.03%  (p=0.000 n=95+93)
```

I confirmed in heap profiles that this change eliminates all resizes of these buffers.

Release note: None

Epic: None

91719: storage: add MVCCExportFingerprintOptions r=stevendanna a=stevendanna

This adds MVCCExportFingerprintOptions with two new options:

 - StripTenantPrefix
 - StripValueChecksum

The goal of these options is to produce a fingerprint that can be used for comparing data across two tenants.

Note that if arbitrary keys and values are encountered, both options have the possibility of erroneously removing data from the fingerprint that isn't actually a tenant prefix or checksum.
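The option names below come from this description; the struct layout and comments are a sketch rather than the actual storage API:

```go
// MVCCExportFingerprintOptions controls how keys and values are normalized
// before being folded into the fingerprint.
type MVCCExportFingerprintOptions struct {
	// StripTenantPrefix drops the tenant prefix from keys so that the same
	// logical data fingerprints identically across tenants.
	StripTenantPrefix bool
	// StripValueChecksum drops the per-value checksum, which would otherwise
	// perturb the comparison across tenants.
	StripValueChecksum bool
}
```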

Fixes #91150

Release note: None

91750: kvserver: factor out ReplicatedCmd r=pavelkalinnikov a=tbg

**Background**

We would like to apply log entries at startup time, and so we are
working towards making all code related to entry application
stand-alone (and, as a nice side product, more unit testable).

We already have a decent set of abstractions in place (a simplified sketch follows the list):

- `apply.Decoder` can consume log entries and turn them into
  (iterators over) `apply.Command`.
- `apply.StateMachine` can make an `apply.Batch` which is
  a pebble batch along with the management of any below-raft
  side effects (replicated or in-memory)
- `apply.Batch` has a `Stage` method that handles an `apply.Command`,
  in the simplest case doing little more than adding it to its pebble
  batch (but in the most complex cases, locking adjacent replicas for
  complicated split-merge dances).
- `Stage` returns a `CheckedCmd`, which can be acked to the client.
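A rough, simplified sketch of how these pieces relate; the real interfaces in pkg/kv/kvserver/apply carry contexts, iterators, and additional methods omitted here, so treat this as an illustration rather than the actual API:

```go
// Command is a decoded log entry awaiting application.
type Command interface {
	IsTrivial() bool
}

// CheckedCommand is a staged command whose below-raft checks have run and
// which can be acked to the client.
type CheckedCommand interface {
	Command
	Rejected() bool
}

// Decoder consumes raw log entries and turns them into Commands.
type Decoder interface {
	Decode(entries [][]byte) ([]Command, error)
}

// Batch wraps a storage (pebble) batch plus the management of any
// below-raft side effects.
type Batch interface {
	// Stage adds the command to the batch, in the simplest case doing
	// little more than appending its writes.
	Stage(cmd Command) (CheckedCommand, error)
	Commit() error
}

// StateMachine hands out Batches to apply entries against.
type StateMachine interface {
	NewBatch() Batch
}
```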

**This PR**

This PR (i.e. this and the lead-up of preceding commits) provides an
implementation of `apply.Command` (and friends) outside of `kvserver` in
a way that does not lead to code duplication.

Before this PR, `apply.Command` was implemented only by
`*kvserver.replicatedCmd`[^rc], which is a struct wrapping the decoded
raft entry and adding some state (such as whether the entry applies with
a forced error, i.e. as an empty command, or not). Some of this state is
necessary in standalone application as well (like the forced error),
other state isn't (like a tracing span, or client's context). In
particular, `replicatedCmd` holds on to a `ProposalData` which itself
references types related to latching and the quota pool. These are hard
to extract, and extracting them isn't necessary anyway, since standalone
application doesn't need them.

Instead of trying to transitively extract `replicatedCmd` from
`kvserver`, we split `replicatedCmd` into the standalone part, and let
`replicatedCmd` extend the standalone part with the fields only
applicable during application in the context of a `*Replica`. In effect,
`raftlog.ReplicatedCmd` deterministically determines the effect of a raft
command on the state machine, whereas `replicatedCmd` additionally deals
with concerns outside of that scope, such as managing concurrency between
different state machines, acknowledging clients, updating in-memory
counters, etc.

This decoupling results in the type introduced in this PR,
`ReplicatedCmd`, which is embedded into `replicatedCmd`.

Both `*replicatedCmd` and `*ReplicatedCmd` implement `apply.Command`
(as well as `apply.CheckedCommand` and `apply.AppliedCommand`).

To avoid duplication, the implementations on `*replicatedCmd` fall
through to those of `*ReplicatedCmd` wherever possible (via the
embedding of the latter into the former).

For example, the `IsTrivial` method is purely a property of the
`ReplicatedEvalResult`, i.e. of the raft entry, and can thus be answered
by `ReplicatedCmd.IsTrivial()`; `*replicatedCmd` does not override
this method.[^concern]

For another, a bit less trivial example, the `Rejected()` method checks
whether there is a forced error for this command. This is determined
below raft (i.e. not a property of the entry itself), but both
stand-alone and regular application need to come to the same results
here. This is why the forced error is a field on `ReplicatedCmd` and
the implementation sits on the base as well, simply returning whether
the field is set.

An example where the implementations differ is `IsLocal()`: in
standalone application, commands are never considered "local" (we never
have a client waiting to be acked), so the base implementation returns
`false`, but the implementation on `*replicatedCmd` overrides this to
`return c.proposal != nil`.
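This split relies on Go struct embedding: the base type supplies the shared implementations and the wrapper overrides only where behavior differs. A minimal, self-contained illustration with heavily simplified stand-ins for the real types:

```go
package main

import "fmt"

// ReplicatedCmd is the stand-alone base: everything here is a property of
// the entry itself or of deterministic application.
type ReplicatedCmd struct {
	forcedErr error
}

// Rejected is answered by the base for both stand-alone and regular
// application, since both must agree on forced errors.
func (c *ReplicatedCmd) Rejected() bool { return c.forcedErr != nil }

// IsLocal has a base implementation suitable for stand-alone application,
// where there is never a waiting client.
func (c *ReplicatedCmd) IsLocal() bool { return false }

// replicatedCmd extends the base with state only needed when applying
// entries on a live *Replica (proposal, tracing span, client context, ...).
type replicatedCmd struct {
	ReplicatedCmd
	proposal *struct{} // stand-in for *ProposalData
}

// IsLocal overrides the base: a command is local iff a proposal (and hence
// a waiting client) is attached.
func (c *replicatedCmd) IsLocal() bool { return c.proposal != nil }

func main() {
	base := &ReplicatedCmd{}
	wrapped := &replicatedCmd{proposal: &struct{}{}}
	fmt.Println(base.IsLocal(), wrapped.IsLocal(), wrapped.Rejected()) // false true false
}
```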

With this in place, we have an implementation of
`apply.{,Checked,Applied}Command` present in `raftlog`.

[^rc]: https://github.com/cockroachdb/cockroach/blob/fb4014a31b9b8d8235dc48de52196e64b185f490/pkg/kv/kvserver/replica_application_cmd.go#L27-L79
[^concern]: it might be too opaque to implement an interface wholly or in part via an embedded
implementation. Reviewers should consider whether they'd prefer explicit
"passthrough" methods to be added.

**Next Steps**

It's straightforward to implement the `apply.Decoder` seen in the
walkthrough at the top. The tricky part is `apply.StateMachine`.

Today's implementation of that is essentially the `*Replica`[^sm];
the meat of it is in `*replicaAppBatch`[^rb], which is the
`apply.Batch`, and which holds a `*Replica` that it calls out to on
many occasions, in particular during `Stage()`.

The next challenge will be a similar separation of concerns as carried
out above for `replicatedCmd`. Large parts of `Stage` are related to the
deterministic handling of log entries during application, but others
deal with updating in-memory state, keeping counters, notifying
rangefeeds of events, and so on, all of which are concerns not relevant to
standalone application and which are side effects of state machine
application rather than a part of it.

The goal is to introduce a handler interface which can separate these,
and for which the standalone application provides an implementation of
the essentials with no-ops for everything which is not a property of
state machine application in itself. With this interface contained in it,
`replicaAppBatch` can be moved to a leaf package, and can be used in
both standalone and regular log entry application.

[^sm]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L106-L121
[^rb]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L202-L234

Touches #75729.

Epic: CRDB-220
Release note: None


91755: upgrades: add weaker column schema exists funcs for use with migrations r=ajwerner,rafiss a=andyyang890

This patch adds two schema exists functions for use with migrations
that involve multiple schema changes on the same column(s) in order
to preserve the idempotence of the migration(s). They are weaker
in the sense that they do not check that the stored and final
expected descriptors match.

Informs #91449

Release note: None

91806: streampb: move out of CCL r=adityamaru a=stevendanna

This moves streampb into pkg/streaming so that non-CCL code doesn't need to import CCL code.

It also renames pkg/streaming to pkg/repstream.

Fixes #91005

Release note: None

91920: ptsstorage: allow synthetic timestamps in pts storage r=HonoreDB a=aliher1911

Previously, synthetic timestamps caused failures in changefeeds if a checkpoint contained a synthetic timestamp: the timestamp representation was parsed as a decimal for storage, which does not hold for synthetic timestamps.
This commit changes pts storage to strip the synthetic flag to mitigate the issue.
Stripping the synthetic flag should be safe because the protected timestamp is not used to update key or transaction timestamps, only to compare against GC thresholds.
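A sketch of the mitigation described here, with a simplified timestamp type standing in for the real hlc.Timestamp; the field and function names are illustrative:

```go
// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct {
	WallTime  int64
	Logical   int32
	Synthetic bool
}

// stripSyntheticForPTS clears the synthetic flag before the timestamp is
// persisted in protected timestamp storage. This should be safe because the
// protected timestamp is only compared against GC thresholds, never used to
// update key or transaction timestamps.
func stripSyntheticForPTS(ts Timestamp) Timestamp {
	ts.Synthetic = false
	return ts
}
```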

Release note: None

Fixes #91922

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
craig bot pushed a commit that referenced this issue Nov 16, 2022
91756: upgrades: fix idempotence issue in system table role id migrations r=ajwerner,rafiss a=andyyang890

This patch fixes an idempotence issue in the role id migrations for
the system.users and system.role_options tables.

Fixes #91449

Release note: None

Co-authored-by: Andy Yang <[email protected]>
@craig craig bot closed this as completed in 6594b56 Nov 16, 2022