Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
90828: roachtest: add --debug-always flag r=stevendanna a=stevendanna Occasionally, it is very useful to keep a cluster around even if the workload happened to complete without error. The --debug-always is like --debug but saves the cluster even if the test was successful. Epic: None Release note: None 91359: keys: accurately size pre-allocated buffers in MakeRangeKey r=arulajmani a=nvanbenschoten This commit includes a trio of changes that accurately size the byte buffers in `MakeRangeKey` and `MakeRangeKeyPrefix` to avoid unnecessary slice resizing (allocation + memcpy) when constructing range-local keys (for example, `RangeDescriptorKey` and `TransactionKey`). The first change is to include the size of the suffix and detail slices when pre-allocating the byte buffer. We know that `MakeRangeKey` will be appending these to the key, so they will force a resize if not accounted for upfront. The second change is to correctly account for the overhead of `EncodeBytesAscending`. The code was getting this wrong in two ways. First, it was failing to account for the 3 bytes of unconditional overhead added by the encoding scheme for the encoding type marker and terminator. Second, it was failing to account for the conditional overhead when bytes need to be escaped. We now accurately and efficiently compute the overhead ahead of time to avoid resizing. The third change is to redefine the transaction tombstone and push marker keys used with the timestamp cache. Previously, the tombstone and push specifiers were additional suffixes that we added after the `LocalTransactionSuffix`. Now, these specifiers are the only suffix added to the key, which avoids an additional key resize. ``` name old time/op new time/op delta KV/Update/Native/rows=1-10 70.6µs ± 2% 69.6µs ± 2% -1.38% (p=0.000 n=100+93) KV/Insert/Native/rows=1-10 46.0µs ± 2% 45.7µs ± 2% -0.62% (p=0.000 n=95+97) KV/Delete/Native/rows=1-10 47.3µs ± 2% 47.1µs ± 2% -0.55% (p=0.000 n=97+94) KV/Insert/Native/rows=10-10 74.4µs ± 2% 74.1µs ± 3% -0.45% (p=0.000 n=96+98) KV/Update/Native/rows=10-10 171µs ± 3% 170µs ± 3% -0.36% (p=0.045 n=96+98) KV/Delete/Native/rows=10-10 85.1µs ± 2% 84.8µs ± 2% -0.29% (p=0.020 n=98+93) KV/Update/SQL/rows=1-10 174µs ± 3% 174µs ± 3% -0.28% (p=0.041 n=97+89) KV/Insert/SQL/rows=1-10 131µs ± 2% 131µs ± 4% ~ (p=0.961 n=89+91) KV/Insert/SQL/rows=10-10 186µs ± 3% 186µs ± 3% ~ (p=0.970 n=94+92) KV/Update/SQL/rows=10-10 336µs ± 2% 336µs ± 3% ~ (p=0.947 n=92+92) KV/Delete/SQL/rows=1-10 149µs ± 2% 149µs ± 3% ~ (p=0.917 n=96+96) KV/Delete/SQL/rows=10-10 226µs ± 8% 225µs ±10% ~ (p=0.057 n=97+98) name old alloc/op new alloc/op delta KV/Insert/Native/rows=1-10 17.9kB ± 1% 17.6kB ± 1% -1.95% (p=0.000 n=100+100) KV/Delete/Native/rows=1-10 18.2kB ± 1% 17.9kB ± 1% -1.92% (p=0.000 n=94+94) KV/Update/Native/rows=1-10 25.1kB ± 0% 24.7kB ± 1% -1.44% (p=0.000 n=97+97) KV/Insert/Native/rows=10-10 44.9kB ± 1% 44.5kB ± 1% -0.88% (p=0.000 n=100+99) KV/Delete/Native/rows=10-10 42.4kB ± 1% 42.0kB ± 0% -0.87% (p=0.000 n=96+94) KV/Delete/SQL/rows=1-10 52.9kB ± 1% 52.6kB ± 1% -0.48% (p=0.000 n=96+100) KV/Update/Native/rows=10-10 75.2kB ± 1% 74.8kB ± 1% -0.43% (p=0.000 n=98+99) KV/Update/SQL/rows=1-10 52.6kB ± 1% 52.4kB ± 0% -0.35% (p=0.000 n=95+95) KV/Insert/SQL/rows=1-10 45.4kB ± 1% 45.3kB ± 0% -0.24% (p=0.000 n=97+93) KV/Update/SQL/rows=10-10 119kB ± 1% 119kB ± 1% -0.16% (p=0.002 n=97+99) KV/Delete/SQL/rows=10-10 88.0kB ± 1% 87.8kB ± 1% -0.14% (p=0.017 n=91+95) KV/Insert/SQL/rows=10-10 93.9kB ± 1% 93.7kB ± 1% -0.14% (p=0.000 n=98+100) name old allocs/op new allocs/op delta KV/Insert/Native/rows=1-10 142 ± 0% 130 ± 0% -8.45% (p=0.000 n=99+100) KV/Delete/Native/rows=1-10 143 ± 0% 131 ± 0% -8.39% (p=0.000 n=96+94) KV/Update/Native/rows=1-10 198 ± 0% 186 ± 0% -6.06% (p=0.000 n=99+98) KV/Delete/Native/rows=10-10 275 ± 0% 263 ± 0% -4.36% (p=0.000 n=90+90) KV/Insert/Native/rows=10-10 295 ± 0% 283 ± 0% -4.07% (p=0.000 n=93+91) KV/Update/Native/rows=10-10 472 ± 1% 460 ± 1% -2.55% (p=0.000 n=98+98) KV/Insert/SQL/rows=1-10 365 ± 0% 356 ± 0% -2.53% (p=0.000 n=97+75) KV/Delete/SQL/rows=1-10 402 ± 0% 393 ± 0% -2.24% (p=0.000 n=97+98) KV/Update/SQL/rows=1-10 509 ± 0% 500 ± 0% -1.81% (p=0.000 n=96+95) KV/Insert/SQL/rows=10-10 589 ± 0% 580 ± 0% -1.53% (p=0.000 n=94+95) KV/Delete/SQL/rows=10-10 623 ± 1% 614 ± 1% -1.47% (p=0.000 n=98+97) KV/Update/SQL/rows=10-10 858 ± 1% 849 ± 0% -1.03% (p=0.000 n=95+93) ``` I confirmed in heap profiles that this change eliminates all resizes of these buffers. Release note: None Epic: None 91719: storage: add MVCCExportFingerprintOptions r=stevendanna a=stevendanna This adds MVCCExportFingerprintOptions with two new options: - StripTenantPrefix - StripValueChecksum The goal of these options is to produce a fingerprint that can be used for comparing data across two tenants. Note that if arbitrary keys and values are encountered, both options have the possibility of erroneously removing data from the fingerprint that isn't actually a tenant prefix or checksum. Fixes #91150 Release note: None 91750: kvserver: factor out ReplicatedCmd r=pavelkalinnikov a=tbg **Background** We would like to apply log entries at startup time, and so we are working towards making all code related to entry application stand-alone (and, as a nice side product, more unit testable). We already have a decent set of abstractions in place: - `apply.Decoder` can consume log entries and turn them into (iterators over) `apply.Command`. - `apply.StateMachine` can make an `apply.Batch` which is a pebble batch along with the management of any below-raft side effects (replicated or in-memory) - `apply.Batch` has a `Stage` method that handles an `apply.Command`, in the simplest case doing little more than adding it to its pebble batch (but in the most complex cases, locking adjacent replicas for complicated split-merge dances). - `Stage` returns a `CheckedCmd`, which can be acked to the client. **This PR** This PR (i.e. this and the lead-up of preceding commits) provides an implementation of `apply.Command` (and friends) outside of `kvserver` in a way that does not lead to code duplication. Before this PR, `apply.Command` was implemented only by `*kvserver.replicatedCmd`[^rc], which is a struct wrapping the decoded raft entry and adding some state (such as whether the entry applies with a forced error, i.e. as an empty command, or not). Some of this state is necessary in standalone application as well (like the forced error), other state isn't (like a tracing span, or client's context). In particular, `replicatedCmd` holds on to a `ProposalData` which itself references types related to latching and the quota pool. These are hard to extract, and besides this isn't necessary as standalone application doesn't need them. Instead of trying to transitively extract `replicatedCmd` from `kvserver`, we split `replicatedCmd` into the standalone part, and let `replicatedCmd` extend the standalone part with the fields only applicable during application in the context of a `*Replica`. In effect, `raftlog.ReplicatedCmd` determines deterministically the effect of a raft command onto the state machine, whereas `replicatedCmd` additionally deals with additional concerns outside of that scope, such as managing concurrency between different state machines, acknowledging clients, updating in-memory counters, etc. This decoupling results in the type introduced in this PR, `ReplicatedCmd`, which is embedded into `replicatedCmd`. Both `*replicatedCmd` and `*ReplicatedCmd` implement `apply.Command` (as well as `apply.CheckedCommand` and `apply.AppliedCommand`). To avoid duplication, the implementations on `*replicatedCmd` fall through to those of `*ReplicatedCmd` whereever possible (via the embedding of the latter into the former). For example, the `IsTrivial` method is purely a property of the `ReplicatedEvalResult`, i.e. of the raft entry, and can thus be answered by `ReplicatedCmd.IsTrivial()`; `*replicatedCmd` does not override this method.[^concern] For another, a bit less trivial example, the `Rejected()` method checks whether there is a forced error for this command. This is determined below raft (i.e. not a property of the entry itself), but both stand-alone and regular application need to come to the same results here. This is why the forced error is a field on `ReplicatedCmd` and the implementation sits on the base as well, simply returning whether the field is set. An example where the implementations differ is `IsLocal()`: in standalone applications commands are never considered "local" (we never have a client waiting to be acked), so the base implementation returns `false`, but the implementation on `*replicatedCmd` overrides this to `return c.proposal != nil`. With this in place, we have an implementation of `apply.{,Checked,Applied}Command` present in `raftlog`. [^rc]: https://github.com/cockroachdb/cockroach/blob/fb4014a31b9b8d8235dc48de52196e64b185f490/pkg/kv/kvserver/replica_application_cmd.go#L27-L79 [^concern]: it might be too opaque to implement an interface wholly or in part via an embedded implementation. Reviewers should consider whether they'd prefer explicit "passthrough" methods to be added. **Next Steps** It's straightforward to implement the `apply.Decoder` seen in the walkthrough at the top. The tricky part is `apply.StateMachine`. Today's implementation of that is essentially the `*Replica`[^sm]; the meat of it is in `*replicaAppBatch`[^rb], which is the `apply.Batch`, and which holds a `*Replica` that it calls out to on many occasions, in particular during `Stage()`. The next challenge will be a similar separation of concerns as carried out above for `replicatedCmd`. Large parts of `Stage` are related to the deterministic handling of log entries during application, but others deal with updating in-memory state, keeping counters, notifying rangefeeds of events, etc, all concerns not relevant to standalone application and rather side-effects of state machine application vs a part of it. The goal is to introduce a handler interface which can separate these, and for which the standalone application provides an implementation of the essentials with no-ops for everything which is not a property of state machine application in itself. With this interface contained in it, `replicaAppBatch` can be moved to a leaf package, and can be used in both standalone and regular log entry application. [^sm]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L106-L121 [^rb]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L202-L234 Touches #75729. Epic: CRDB-220 Release note: None 91755: upgrades: add weaker column schema exists funcs for use with migrations r=ajwerner,rafiss a=andyyang890 This patch adds two schema exists functions for use with migrations that involve multiple schema changes on the same column(s) in order to preserve the idempotence of the migration(s). They are weaker in the sense that they do not check that the stored and final expected descriptor match. Informs #91449 Release note: None 91806: streampb: move out of CCL r=adityamaru a=stevendanna This moves streampb into pkg/streaming so that non-CCL code doesn't need to import CCL code. It also renames pkg/streaming to pkg/repstream. Fixes #91005 Release note: None 91920: ptsstorage: allow synthetic timestamps in pts storage r=HonoreDB a=aliher1911 Previously synthetic timestamps were causing failures in changefeeds if checkpoint contained a synthetic timestamps. Timestamp representation was parsed as decimal for storage which is not the case for synthetic timestamps. This commit changes pts storage to strip synthetic flag to mitigate the issue. Stripping synthetic flag should be safe as protected timestamp is not used to update key or transaction timestamps but to compare against GC thresholds. Release note: None Fixes #91922 Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Oleg Afanasyev <[email protected]>
- Loading branch information