-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvnemesis: support non-transactional DelRange operations #69642
Labels
A-testing
Testing tools and infrastructure
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
AlexTalks
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Aug 31, 2021
Relevant snippet cockroach/pkg/kv/kvnemesis/validator.go Lines 383 to 416 in 6ee6501
|
cc @cockroachdb/replication |
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Sep 21, 2022
The original premise of `kvnemesis` is that by using unique values, we can compute for each operation a time slice at which it is valid (i.e. would have the outcome observed by it). Unfortunately, deletions aren't unique values. `kvnemesis` handles them by bending over backwards and resorting to using the actual time at which the operation was carried out, namely the transaction's commit (write) timestamp. If we see a deletion from the rangefeed, we match it up using its key and timestamp to the (a) transaction at the same timestamp. If we don't have a write timestamp (i.e. deletion not written by a txn) we ... just don't verify it as much. This is sort of hidden and is not obvious until you really dig into kvnemesis, as I had to do for cockroachdb#69642. This commit refactors the `txnID` away and replaces it with an optional timestamp. The medium-term plan here is to realize that all MVCC operations, in fact, have a timestamp at which they take effect, and that there is no reason to reverse-engineer a valid timestamp range (other than that this might be helpful for explaining the failure), which happens to have been something also envisioned by the original author[^1]. [^1]: https://github.com/cockroachdb/cockroach/blob/7cde315da539fe3d790f546a1ddde6cc882fca6b/pkg/kv/kvnemesis/validator.go#L43-L46 Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Sep 22, 2022
It took me a while to fully understand `processOp` and `checkAtomic`. This commit simplifies them. It does so in a few ways: - remove a map that only ever had one entry. - avoid the need to use fake transaction IDs, and in particular clarify that nothing is really about txn IDs, it's about collecting atomic units (which might originate in a batch or a txn) - thread the optional execution timestamp directly, as opposed to indirecting through an optional `*Transaction` proto. The last point deserves a few more words. At its core, `kvnemesis` wants to figure out valid execution timestamps by relying on unique values coming in over the rangestream. But deletion tombstones carry no value and thus aren't unique. There then needs to be some way to match up a deletion tombstone with an operation that might have written it. This requires knowledge of the timestamp at which the operation executed, and kvnemesis was, at least for `ClosureTxnOperation`s, using its knowledge of the commit timestamp for that purpose. We can actually get that timestamp for all operations, though, and we should switch `kvnemesis` to sort operations by their execution timestamp, and then verify that the observed MVCC history is congruent with that execution order[^1]. This commit doesn't quite do that but it sets the stage by abstracting away from the txn commit timestamp. This is related to cockroachdb#69642 in that this is the issue that prompted this refactor. [^1]: which happens to have been something also envisioned by the original author: https://github.com/cockroachdb/cockroach/blob/7cde315da539fe3d790f546a1ddde6cc882fca6b/pkg/kv/kvnemesis/validator.go#L43-L46 Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Sep 22, 2022
It took me a while to fully understand `processOp` and `checkAtomic`. This commit simplifies them. It does so in a few ways: - remove a map that only ever had one entry. - avoid the need to use fake transaction IDs, and in particular clarify that nothing is really about txn IDs, it's about collecting atomic units (which might originate in a batch or a txn) - thread the optional execution timestamp directly, as opposed to indirecting through an optional `*Transaction` proto. The last point deserves a few more words. At its core, `kvnemesis` wants to figure out valid execution timestamps by relying on unique values coming in over the rangestream. But deletion tombstones carry no value and thus aren't unique. There then needs to be some way to match up a deletion tombstone with an operation that might have written it. This requires knowledge of the timestamp at which the operation executed, and kvnemesis was, at least for `ClosureTxnOperation`s, using its knowledge of the commit timestamp for that purpose. We can actually get that timestamp for all operations, though, and we should switch `kvnemesis` to sort operations by their execution timestamp, and then verify that the observed MVCC history is congruent with that execution order[^1]. This commit doesn't quite do that but it sets the stage by abstracting away from the txn commit timestamp. This is related to cockroachdb#69642 in that this is the issue that prompted this refactor. [^1]: which happens to have been something also envisioned by the original author: https://github.com/cockroachdb/cockroach/blob/7cde315da539fe3d790f546a1ddde6cc882fca6b/pkg/kv/kvnemesis/validator.go#L43-L46 Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 22, 2022
87877: kvnemesis: simplify and document validation logic r=erikgrinaker a=tbg It took me a while to fully understand `processOp` and `checkAtomic`. This commit simplifies them. It does so in a few ways: - remove a map that only ever had one entry. - avoid the need to use fake transaction IDs, and in particular clarify that nothing is really about txn IDs, it's about collecting atomic units (which might originate in a batch or a txn) - thread the optional execution timestamp directly, as opposed to indirecting through an optional `*Transaction` proto. The last point deserves a few more words. At its core, `kvnemesis` wants to figure out valid execution timestamps by relying on unique values coming in over the rangestream. But deletion tombstones carry no value and thus aren't unique. There then needs to be some way to match up a deletion tombstone with an operation that might have written it. This requires knowledge of the timestamp at which the operation executed, and kvnemesis was, at least for `ClosureTxnOperation`s, using its knowledge of the commit timestamp for that purpose. We can actually get that timestamp for all operations, though, and we should switch `kvnemesis` to sort operations by their execution timestamp, and then verify that the observed MVCC history is congruent with that execution order[^1]. This commit doesn't quite do that but it sets the stage by abstracting away from the txn commit timestamp. This is related to #69642 in that this is the issue that prompted this refactor. [^1]: which happens to have been something also envisioned by the original author: https://github.com/cockroachdb/cockroach/blob/7cde315da539fe3d790f546a1ddde6cc882fca6b/pkg/kv/kvnemesis/validator.go#L43-L46 Release note: None 88308: kv/rangefeed: reduce size of event struct from 200 bytes to 72 bytes r=nvanbenschoten a=nvanbenschoten This commit restructures the event struct and reduces its size from 200 bytes to 72 bytes. This is accomplished primarily by pushing large, infrequently used struct fields into pointers. This is mostly just a drive-by cleanup found while working on #77724. Release justification: None. Don't merge yet. Release note: None. Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Sep 26, 2022
This is possible now, thanks to cockroachdb#88722. Closes cockroachdb#69642. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 6, 2022
This is essentially a v2 of kvnemesis. Fixes cockroachdb#69642. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Oct 24, 2022
This is essentially a v2 of kvnemesis. Fixes cockroachdb#69642. Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Nov 8, 2022
https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666704951072499 https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666729571728869?thread_ts=1666705742.466619&cid=C02FTAH8JCT This is essentially a v2 of kvnemesis. Fixes cockroachdb#69642. Epic: none Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Nov 29, 2022
https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666704951072499 https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666729571728869?thread_ts=1666705742.466619&cid=C02FTAH8JCT This is essentially a v2 of kvnemesis. Fixes cockroachdb#69642. Epic: none Release note: None
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Dec 2, 2022
https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666704951072499 https://cockroachlabs.slack.com/archives/C02FTAH8JCT/p1666729571728869?thread_ts=1666705742.466619&cid=C02FTAH8JCT This is essentially a v2 of kvnemesis. Fixes cockroachdb#69642. Epic: none Release note: None
craig bot
pushed a commit
that referenced
this issue
Dec 7, 2022
89477: kvnemesis: uniquely identify all versions r=erikgrinaker a=tbg This is essentially a v2 of kvnemesis. While a lot of code has changed, it's not a rewrite, rather we are actually bringing kvnemesis closer to the idea which ultimately led to it being built. That idea is that if values can uniquely identify the operation which wrote them, serializability checking becomes easier as any observation of a value totally orders the reader and the writer with respect to each other. "Easier" meant both simpler code, as well as actually being able to computationally do the verification on complex histories. Prior to this PR, kvnemesis was writing unique values where it could, but it couldn't do it for deletions - after all, a deletion is like writing a "nothing" to MVCC, and there wasn't any way to make two "nothings" distinguishable. Having broken with the basic premise of unique values, there was a lot of bending over backwards going on to avoid, for the most part, devolving into a "normal" serializability checker. However, for contrived edge cases this would not be avoidable, and so there would be histories that kvnemesis just couldn't handle. "v2" (this PR) gets this right. The main contribution is that we now thread kvnemesis' sequence numbers all the way down into MVCC and back up with the RangeFeed. Values are now truly unique: a deletion tombstone is identifiable via its `MVCCValueHeader`, which carries the `kvnemesisutil.Seq` of the `Operation` that originally wrote it. On top of this, everything "just works" as you expect. Plumbing testing-only fields through the KV API protobufs isn't something that was taken lightly and that required a good amount of deliberation. However, we figured out a clever (maybe too clever?) way to have these fields be zero-sized in production, meaning they cannot possibly affect production logic and they also do not influence struct sizes or the wire encoding. As a drawback, kvnemesis requires the `crdb_test` build tag (it will `t.Skip()` otherwise). The remainder of this PR is mostly improvements in code quality. `echodriven` testing was introduced for many of the tests that could benefit from it. The core components of kvnemesis were reworked for clarity (the author spent the first week very confused and wishes for that experience to remain unrepeated by anyone). kvnemesis also tracks the execution timestamps as reported by the KV layer, which a future change could [use](#92898) for additional validation. Of note is the updated doc comment, which is reproduced below in entirety. Fixes #90955. Fixes #88988. Fixes #76435. Fixes #69642. ---- Package kvnemesis exercises the KV API with random concurrent traffic (as well as splits, merges, etc) and then validates that the observed behaviors are serializable. It does so in polynomial time based on the techniques used by [Elle] (see in particular section 4.2.3), using the after-the-fact MVCC history as a record of truth. It ensures that all write operations embed a unique identifier that is stored in MVCC history, and can thus identify which of its operations' mutations are reflected in the database ("recoverability" in Elle parlance). A run of kvnemesis proceeds as follows. First, a Generator is instantiated. It can create, upon request, a sequence in which each element is a random Operation. Operations that are mutations (i.e. not pure reads) are assigned a unique kvnemesisutil.Seq which will be stored alongside the MVCC data written by the Operation. A pool of worker threads concurrently generates Operations and executes them against a CockroachDB cluster. Some of these Operations may succeed, some may fail, and for some of them an ambiguous result may be encountered. Alongside this random traffic, kvnemesis maintains a RangeFeed that ingests the MVCC history. This creates a "carbon copy" of the MVCC history. All of these workers wind down once they have in aggregate completed a configured number of steps. Next, kvnemesis validates that the Operations that were executed and the results they saw match the MVCC history, i.e. checks for Serializability. In general, this is an NP-hard problem[^1], but the use of unique sequence numbers (kvnemesisutil.Seq) makes it tractable, as each mutation in the MVCC keyspace uniquely identifies the Operation that must have written the value. We make use of this property as follows. First, the Validate method iterates through all Operations performed and, for each, inspects - the type of the Operation (i.e. Put, Delete, Get, ...), - the (point or range) key of the operation, and - its results (i.e. whether there was an error or which key-value pairs were returned). Each atomic unit (i.e. individual non-transactional operation, or batch of non-transactional operations, or transaction) results in a slice (in order in which the Operations within executed[^2]) of observations, where each element is either an observed read or an observed write. For example, a Batch that first writes `v1` to `k1`, then scans `[k1-k3)` (reading its own write as well as some other key k2 with value v2) and then deletes `k3` would generate the following slice: [ observedWrite(k1->v1), observedScan(k1->v1, k2->v2), observedWrite(k3->v3), ] Each such slice (i.e. atomic unit) will then be compared to the MVCC history. For all units that succeeded, their writes must match up with versions in the MVCC history, and this matching is trivial thanks to unique values (including for deletions, since we embed the kvnemesisutil.Seq in the value), and in particular a single write will entirely fix the MVCC timestamp at which the atomic unit must have executed. For each read (i.e. get or scan), we compute at which time intervals each read would have been valid. For example, if the MVCC history for a key `k1` is as follows: k1 ----------------- t5 v2 t4 t3 t2 <del> t1 v1 then - observedGet(k1->v1) is valid for [t1,t2), - observedGet(k1->nil) is valid for [0,t1) and [t2,t5), and - observedGet(k1->v2) is valid for [t5,inf). By intersecting the time intervals for each Operation in an atomic unit, we then get the MVCC timestamps at which this Operation would have observed what it ended up observing. If this intersection is empty, serializability must have been violated. In the error case, kvnemesis verifies that no part of the Operation became visible. For ambiguous results, kvnemesis requires that either no Operation become visible, or otherwise treats the atomic unit as committed. The KV API also has the capability to return the actual execution timestamp directly with responses. At the time of writing, kvnemesis does verify that it does do this reliably, but it does not verify that the execution timestamp is contained in the intersection of time intervals obtained from inspecting MVCC history[^3]. [Elle]: https://arxiv.org/pdf/2003.10554.pdf [^1]: https://dl.acm.org/doi/10.1145/322154.322158 [^2]: there is currently concurrency within the atomic unit in kvnemesis. It could in theory carry out multiple reads concurrently while not also writing, such as DistSQL does, but this is not implemented today. See: #64825 [^3]: tracked in #92898. Epic: none Co-authored-by: Tobias Grieger <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-testing
Testing tools and infrastructure
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Currently
DelRange
operations are only generated inside of transactions in kvnemesis. We should also add support for these operations outside of transactions, if we can determine a way to validate the delete on each key atomically without relying on the timestamp of the transaction to identify the stored delete tombstones that were written by the operation.Jira issue: CRDB-9701
The text was updated successfully, but these errors were encountered: