Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/engine: introduce a datadriven framework to run MVCC tests #42250

Merged
merged 1 commit into from
Dec 5, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 6, 2019

Previously MVCC tests were hand-coded in Go.
This was making it hard(er) to introduce new tests or modify existing
tests.

This commit improves upon this situation by introducing a new
datadriven test, TestMVCCHistories, which runs MVCC
tests written using a DSL:

begin_txn      t=<name> [ts=<int>[,<int>]]
remove_txn     t=<name>
resolve_intent t=<name> k=<key> [status=<txnstatus>]
restart_txn    t=<name>
update_txn     t=<name> t2=<name>
step_txn       t=<name> [n=<int>]
advance_txn    t=<name> ts=<int>[,<int>]
txn_status     t=<name> status=<txnstatus>

check_intent   k=<key> [none]

put       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw]
cput      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw] [cond=<string>]
increment [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inc=<val>]
del       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key>
get       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inconsistent] [tombstones]
scan      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse]

merge     [ts=<int>[,<int>]] k=<key> v=<string> [raw]

clear_range    k=<key> end=<key>

Where <key> can be a simple string, or a string
prefixed by the following characters:

  • =foo means exactly key foo
  • +foo means Key(foo).Next()
  • -foo means Key(foo).PrefixEnd()

Additionally, the pseudo-command with enables sharing
a group of arguments between multiple commands, for example:

with t=A
   begin_txn
   with k=a
     put v=b
     resolve_intent

Release note: None

@knz knz requested review from tbg and nvanbenschoten November 6, 2019 21:23
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from b837257 to a69d2ef Compare November 6, 2019 21:28
@petermattis petermattis requested a review from itsbilal November 7, 2019 14:54
@knz knz force-pushed the 20191106-mvcc-histories branch from a69d2ef to 0264f46 Compare November 7, 2019 15:34
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much more fleshed out that I was expecting. Thanks for getting the ball rolling!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @tbg)


pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

  mvcc_put  k=b/123 v=abc
  mvcc_put  k=c v=abc
  commit_txn

The functionality here is nifty. I'm going to propose an alternate syntax: we can make this look Go-like. The advantage of doing so is that syntax will be more familiar to the casual reader. Something like:

txnA = NewTxn()
txnA.MVCCPut(k="a", v="abc")
txnA.MVCCPut(...)
txnA.Commit()

I did this in the Pebble metamorphic test. See https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/testdata/parser#L36-L39 and https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/parser.go. The parser uses Go's scanner, and then adds a simple bit of parsing on top.

We've been thinking about adding an MVCC metamorphic test which I was imagining would have a similar structure to the Pebble one. Seems feasible to share the syntax between such an MVCC metamorphic test and MVCC datadriven test framework. Thoughts?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @tbg)


pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The functionality here is nifty. I'm going to propose an alternate syntax: we can make this look Go-like. The advantage of doing so is that syntax will be more familiar to the casual reader. Something like:

txnA = NewTxn()
txnA.MVCCPut(k="a", v="abc")
txnA.MVCCPut(...)
txnA.Commit()

I did this in the Pebble metamorphic test. See https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/testdata/parser#L36-L39 and https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/parser.go. The parser uses Go's scanner, and then adds a simple bit of parsing on top.

We've been thinking about adding an MVCC metamorphic test which I was imagining would have a similar structure to the Pebble one. Seems feasible to share the syntax between such an MVCC metamorphic test and MVCC datadriven test framework. Thoughts?

Thank you to pointing me to that code. It's interesting. Funnily I even briefly considered something looking like Go myself initially.
But I think you over-engineered that. I rejected the Go idea after thinking hard about it (and not knowing about your code), and I believe I have found the clearly superior design.

Consider:

Aspect Simple DSL defined here Pebble's metamorphic DSL
Familiarity to the CRL eng reader ✓ (unix shell) ✓ (Go)
Reuses existing parsing code ✓ (datadriven) ✓ (Go)
Supports arguments in arbitrary order, makes it easier to edit
Supports argument reuse across commands, makes it easier to type and read
Single case, makes it easier to type and read
No punctuation, makes it easier to type and read
No unnecessary mandatory quoting, makes it easier to type and read

Moreover, your solution has created a false sense of simplicity. Because the basic operators in your language have a 1-to-1 mapping to Go function/method calls, you've placed the burden on the test implementer (and also the reader) to ensure the call protocol is correct. The test spec has to allocate (New) and Close explicitly, even if the test runner could be able to do that automatically from context. All the necessary arguments must be present, even if the test runner could be able to derive them automatically.
So I'd argue I have succeeded in eliminating more boilerplate in my test language than you did.

That being said, I'd gladly volunteer looking at your Pebble test code and apply the best practices I've applied here to there as well (i.e. instead of converting my code to use your ideas, which I think would be a mistake, I'd be OK converting your code to use my ideas, which I think would be an improvement).

@knz knz force-pushed the 20191106-mvcc-histories branch from 0264f46 to ba07d22 Compare November 8, 2019 09:55
@knz knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 7407590 to 6e909df Compare November 8, 2019 12:01
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, knz (kena) wrote…

Thank you to pointing me to that code. It's interesting. Funnily I even briefly considered something looking like Go myself initially.
But I think you over-engineered that. I rejected the Go idea after thinking hard about it (and not knowing about your code), and I believe I have found the clearly superior design.

Consider:

Aspect Simple DSL defined here Pebble's metamorphic DSL
Familiarity to the CRL eng reader ✓ (unix shell) ✓ (Go)
Reuses existing parsing code ✓ (datadriven) ✓ (Go)
Supports arguments in arbitrary order, makes it easier to edit
Supports argument reuse across commands, makes it easier to type and read
Single case, makes it easier to type and read
No punctuation, makes it easier to type and read
No unnecessary mandatory quoting, makes it easier to type and read

Moreover, your solution has created a false sense of simplicity. Because the basic operators in your language have a 1-to-1 mapping to Go function/method calls, you've placed the burden on the test implementer (and also the reader) to ensure the call protocol is correct. The test spec has to allocate (New) and Close explicitly, even if the test runner could be able to do that automatically from context. All the necessary arguments must be present, even if the test runner could be able to derive them automatically.
So I'd argue I have succeeded in eliminating more boilerplate in my test language than you did.

That being said, I'd gladly volunteer looking at your Pebble test code and apply the best practices I've applied here to there as well (i.e. instead of converting my code to use your ideas, which I think would be a mistake, I'd be OK converting your code to use my ideas, which I think would be an improvement).

Heh, the unix shell-like syntax is exactly what initially caught my eye here. I would respectfully disagree about your familiarity argument for that syntax.

I'd like to take a step away from arguing that either solution is perfect. Are there useful ideas we can take from both? I very much like the use of Go lexical constructs as it provides built-in support for comments, quoted strings, and other literals. I think you have a good point about keyword arguments which is even more powerful in the context of MVCC operations where the API is filled with methods containing numerous parameters. The with construct is also interesting in reducing boilerplate, though that runs against the explicitness philosophy that frequently runs through Go. Of course, we don't need to adopt that philosophy for the syntax here.

If I were to adopt this test case to Go-like syntax but with an extension for keyword arguments it would look like:

a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()

I'm not sure how I'd make the with construct fit into this scheme. I'm not fond of some of the magic there (e,g. I'd always want it to be clear what txn is being operated on). Perhaps something like:

a = BeginTxn(ts=44)
args = {v:"abc"}
a.MVCCPut(k:"a", args...)
...

That doesn't help much if there is only one argument, but it could if there are many common ones.

Note that I'm not wedded to the syntax used by the Pebble metamorphic test, but I think you're overstating your claims of superiority for the syntax here. Also, the two of us are likely too invested in our proposals. Some of the other #storage folks should weigh in on this topic as they'll be supporting and extending these tests in the future. Cc @ajkr, @itsbilal, @sumeerbhola.

Lastly, I want to reiterate my appreciation for you getting the ball rolling here. I'm quibbling about surface details, but find the overall direction of this PR to be super exciting.

@knz
Copy link
Contributor Author

knz commented Nov 8, 2019

I'd still like to point out from the specific example snippet:

a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()

This snippet contains 169 characters.

  • there are 67 non-whitespace punctuation characters, that's about 40% of visual noise
  • a. is repeated 6 times, none of which is strictly necessary
  • v:"abc" is repeated 5 times, just one of which is strictly necessary
  • with would thus save you 57 more characters, so in reality you actually have 34% more visual noise.

74% of visual noise that's a bit much.

@knz knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 652e133 to 96942da Compare November 8, 2019 16:48
@petermattis
Copy link
Collaborator

This snippet contains 169 characters.

Is optimizing for the number of characters a goal? It wasn't for me. Your arguments here are also knocks on the Go syntax. On the one hand, those arguments are fine and reasonable, but the Go syntax is also the one CRL engineers spend the most time interacting with. Surely that day-to-day familiarity counts for something.

Is there a way we can get this discussion out of bikeshed territory? I'm going to reiterate my desire to hear what others think of the two alternatives or if there are ideas for something even better. I don't want to over-engineer this syntax, but I also don't want to quickly adopt something that we live with for years.

@knz knz force-pushed the 20191106-mvcc-histories branch from 96942da to 5f997d5 Compare November 8, 2019 18:37
@knz
Copy link
Contributor Author

knz commented Nov 8, 2019

I agree let's let the discussion of specifics sleep for a while.

In the meantime I extended the PR with the tests for intent histories, as needed by savepoints. I'll rebase the MVCC seqnum ignore list PR on top of this now.

@nvanbenschoten
Copy link
Member

@knz one of my bigger confusions with this is why it contains transaction state management. In other words, why is this testing begin_txn, commit_txn, etc? None of that lives at the MVCC level, and that shows because the handlers for these operations are completely mocked out.

Is the idea there just to register transactions with the data-driven framework? If so, I would stay as far away from the names used by the corresponding Requests as possible.

This all does point towards an opportunity to introduce datadriven tests into storage/batcheval as well.

@knz knz force-pushed the 20191106-mvcc-histories branch from 67fcad2 to 1690cc6 Compare November 11, 2019 19:12
@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

why is this testing begin_txn, commit_txn, etc?

begin_txn is needed because the MVCC operations take a roachpb.Transaction as argument.

After checking I agree that we can dump abort_txn/commit_txn and have the mvcc tests resolve their intents explicitly.

@knz knz force-pushed the 20191106-mvcc-histories branch from 1690cc6 to 8da7a1f Compare November 12, 2019 11:20
@knz
Copy link
Contributor Author

knz commented Nov 12, 2019

Simplified the code as suggested.

@knz knz requested review from a team as code owners November 25, 2019 20:26
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have responded to the review comments, see below. Will welcome your reactions and willing to change the code further based on them.

I also tentatively rebased this on top of #42736. Depending on the outcome of cockroachdb/datadriven#8 I might cancel this rebase.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)


pkg/storage/engine/mvcc_history_test.go, line 193 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

May wanna change the example now that commit_txn does not exist.

Done.


pkg/storage/engine/mvcc_history_test.go, line 48 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Does this belong in datadriven.Walk?

Done.


pkg/storage/engine/mvcc_history_test.go, line 72 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The meta: and data: prefixes feel redundant to me as the key itself also contains this info.

I agree it's redundant with the RHS, but it's also helpful to disambiguate this trace output from the individual commands' output, for example the get: ... prefix.


pkg/storage/engine/mvcc_history_test.go, line 148 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The txn is only updated in response to the *_txn commands. Seems a little odd to print out its value as that is deterministic on the *_txn command implementations which are not MVCC code. Am I missing something here? (Very possible, this is a large change).

I introduced this dump to emphasize the current state of the txn object at each step of a script. This was in reaction to a troubleshooting session where I had mistakenly started with a txn in the wrong state, but I hadn't realized. My MVCC operation wasn't behaving like I wanted, and I wasn't sure where the problem came from. By adding the txn's currebt state to the trace, the reader can verify their assumptions about the MVCC op's input at each step.


pkg/storage/engine/mvcc_history_test.go, line 233 at r3 (raw file):

Was tracing too verbose to always leave on?

I found that it was, but that was subjective. I don' t have much arguments either way on this point.

have each operation output its results, which may be an error. This unifies the ok and error cases and allows progress to be made after an error occurs.

I think for expected errors this is reasonable. However from experience I have found our team too eager to accept a change in reference output when using TESTFLAGS=-rewrite. I think it is good hygiene to strongly distinguish when a test expects an error to occur (and have that error's in the reference output), from when no error is expected (in which case the error's occurrence will prevent datadriven from proceeding even when -rewrite is given).


pkg/storage/engine/mvcc_history_test.go, line 248 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than calling reportResults here, perhaps there should be an explicit dump_data command. Btw, these suggestions are just that. This PR is so large that it is hard to know the repercussions of all of the suggestions.

I have considered this. Again the choice to dump the current state at the end of every directive was the result of a learning experience, where I found in many cases that a previous test was incorrectly written and was leaving the storage in a bad initial state for the next state. Without this automatic state dump, it's possible to miss that mistake and spend (too much) time focusing on the wrong place.


pkg/storage/engine/mvcc_history_test.go, line 368 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This seems to get status from the command invocation itself, not from the transaction passed in. This has the unintended side effect of calling MVCCResolveWriteIntent with Status: COMMITTED by default, even if the transaction itself is PENDING. Why not use status = txn.Status here?

Because the most common use case is to resolve intents (and remove them) as if the txn had committed. Mandating what you say would force the trace to use one more command in the common case.


pkg/storage/engine/mvcc_history_test.go, line 424 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It would be nice if we could get the MVCCKey timestamp at which this read is happening, but the way the MVCCGet API works, that's not too straightforward.

I think it works? the value's timestamp is available in val. Done, PTAL?

@knz knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 4310860 to 28eeee7 Compare November 26, 2019 15:24
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/storage/engine/mvcc_history_test.go, line 233 at r3 (raw file):

Previously, knz (kena) wrote…

Was tracing too verbose to always leave on?

I found that it was, but that was subjective. I don' t have much arguments either way on this point.

have each operation output its results, which may be an error. This unifies the ok and error cases and allows progress to be made after an error occurs.

I think for expected errors this is reasonable. However from experience I have found our team too eager to accept a change in reference output when using TESTFLAGS=-rewrite. I think it is good hygiene to strongly distinguish when a test expects an error to occur (and have that error's in the reference output), from when no error is expected (in which case the error's occurrence will prevent datadriven from proceeding even when -rewrite is given).

Ack. I don't feel strongly about this. I haven't noticed a problem with integrating the ok and error cases in Pebble, but that might be accidental. The more likely occurrence in my experience is that a test case generates wrong data that we encode into the expectation and nobody notices. Nothing much can be done about that with this style of testing.


pkg/storage/engine/mvcc_history_test.go, line 248 at r3 (raw file):

Previously, knz (kena) wrote…

I have considered this. Again the choice to dump the current state at the end of every directive was the result of a learning experience, where I found in many cases that a previous test was incorrectly written and was leaving the storage in a bad initial state for the next state. Without this automatic state dump, it's possible to miss that mistake and spend (too much) time focusing on the wrong place.

Ack.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me modulo a few minor comments.

Big remaining question: how did you convert the tests? How should I review the new test cases to make sure that you accurately converted them? Or perhaps @itsbilal already did this.

I'm looking forward to seeing this merged.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/storage/engine/mvcc_history_test.go, line 34 at r4 (raw file):

// TestMVCCHistories verifies that sequences of MVCC reads and writes
// perform properly.

I think you should copy the commit message explaining the format of the test cases here. That message is super helpful, but commit messages are easy to miss after code is checked in.


pkg/storage/engine/mvcc_history_test.go, line 319 at r4 (raw file):

// commands is the list of all supported script commands.
var commands = map[string]cmd{
	"begin_txn":      {true, false, cmdBegin},

Nit: the two booleans here are a bit hard to read. There seem to be 3 states: txn,!data, !txn,data and !txn,!data. I wonder if the two booleans should be collapsed to a single enum:

type cmdType int

const (
  cmdTypeNone cmdType = iota
  cmdTypeTxn
  cmdTypeData
)

On the other hand, this is the only place the booleans are used, so this isn't terribly important.


pkg/storage/engine/mvcc_history_test.go, line 326 at r4 (raw file):

	"update_txn":     {true, false, cmdUpdateTxn},
	"restart_txn":    {true, false, cmdRestartTxn},
	"step_txn":       {true, false, cmdStepTxn},

All of the *txn* commands are special in the they aren't really MVCC operations. I wonder if they should be grouped by prefixing them with txn_. So instead of step_txn we'd have txn_step.


pkg/storage/engine/mvcc_history_test.go, line 335 at r4 (raw file):

	"scan":           {false, false, cmdScan},
	"increment":      {false, true, cmdIncrement},
	"check_intent":   {false, false, cmdCheckIntent},

Nit: I'd sort these members, and the corresponding functions below, to make it clear where additions should be added. That might just be my OCD speaking, though.


pkg/storage/engine/mvcc_history_test.go, line 481 at r4 (raw file):

	vals, _, intents, err := MVCCScan(e.ctx, e.engine, key, endKey, max, ts, opts)
	if err != nil {
		return err

Here and elsewhere there is an error, I think you should fmt.Fprintf the error, and fall through and allow it to be returned below. This will allow the tests to verify that when an error is returned from MVCCScan, nothing else is.


pkg/storage/engine/mvcc_history_test.go, line 529 at r4 (raw file):

	resolve, resolveStatus := e.getResolve()

	return e.withWriter("put", func(engine ReadWriter) error {

Shouldn't this be "increment" or "inc"?


pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Heh, the unix shell-like syntax is exactly what initially caught my eye here. I would respectfully disagree about your familiarity argument for that syntax.

I'd like to take a step away from arguing that either solution is perfect. Are there useful ideas we can take from both? I very much like the use of Go lexical constructs as it provides built-in support for comments, quoted strings, and other literals. I think you have a good point about keyword arguments which is even more powerful in the context of MVCC operations where the API is filled with methods containing numerous parameters. The with construct is also interesting in reducing boilerplate, though that runs against the explicitness philosophy that frequently runs through Go. Of course, we don't need to adopt that philosophy for the syntax here.

If I were to adopt this test case to Go-like syntax but with an extension for keyword arguments it would look like:

a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()

I'm not sure how I'd make the with construct fit into this scheme. I'm not fond of some of the magic there (e,g. I'd always want it to be clear what txn is being operated on). Perhaps something like:

a = BeginTxn(ts=44)
args = {v:"abc"}
a.MVCCPut(k:"a", args...)
...

That doesn't help much if there is only one argument, but it could if there are many common ones.

Note that I'm not wedded to the syntax used by the Pebble metamorphic test, but I think you're overstating your claims of superiority for the syntax here. Also, the two of us are likely too invested in our proposals. Some of the other #storage folks should weigh in on this topic as they'll be supporting and extending these tests in the future. Cc @ajkr, @itsbilal, @sumeerbhola.

Lastly, I want to reiterate my appreciation for you getting the ball rolling here. I'm quibbling about surface details, but find the overall direction of this PR to be super exciting.

Apparently nobody else feels strongly about the syntax, so I'm going to give the thumbs up with proceeding as is.

@andy-kimball
Copy link
Contributor

I love the datadriven approach to MVCC testing. Thoroughly testing at multiple component layers is a critical part of improving the stability and correctness of our product (the "onion model" of testing). And the datadriven approach encourages writing lots of tests, since it has nice syntax, and since you can easily keep results up-to-date by using --rewrite.

@knz knz force-pushed the 20191106-mvcc-histories branch from 79071de to ec0a79a Compare December 5, 2019 14:35
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you convert the tests?

I created one test file per converted test. The name of the test file is the same name as the original Go test, snake-cased. The order of operations in the test file is the same as the original go code.

How should I review the new test cases to make sure that you accurately converted them?

Put them side-by-side I suppose. If that makes your life easier, I can split the PR to have one commit per converted test.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)


pkg/storage/engine/mvcc_history_test.go, line 34 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you should copy the commit message explaining the format of the test cases here. That message is super helpful, but commit messages are easy to miss after code is checked in.

Done.


pkg/storage/engine/mvcc_history_test.go, line 319 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: the two booleans here are a bit hard to read. There seem to be 3 states: txn,!data, !txn,data and !txn,!data. I wonder if the two booleans should be collapsed to a single enum:

type cmdType int

const (
  cmdTypeNone cmdType = iota
  cmdTypeTxn
  cmdTypeData
)

On the other hand, this is the only place the booleans are used, so this isn't terribly important.

Done.


pkg/storage/engine/mvcc_history_test.go, line 326 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

All of the *txn* commands are special in the they aren't really MVCC operations. I wonder if they should be grouped by prefixing them with txn_. So instead of step_txn we'd have txn_step.

Good idea. Done.


pkg/storage/engine/mvcc_history_test.go, line 335 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: I'd sort these members, and the corresponding functions below, to make it clear where additions should be added. That might just be my OCD speaking, though.

Good idea. Done.


pkg/storage/engine/mvcc_history_test.go, line 481 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Here and elsewhere there is an error, I think you should fmt.Fprintf the error, and fall through and allow it to be returned below. This will allow the tests to verify that when an error is returned from MVCCScan, nothing else is.

I agree with returning the error below for the reason you explain. But there is no need to print it here - the main runner already prints the error when an error is expected by the test.


pkg/storage/engine/mvcc_history_test.go, line 529 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't this be "increment" or "inc"?

d'oh. Yes. Fixed.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Put them side-by-side I suppose. If that makes your life easier, I can split the PR to have one commit per converted test.

Splitting the PR up isn't necessary. I think this PR should be merged so we can make progress on reviewing @itsbilal's PR. Concurrently, I'll check the tests which I expect to take a day or two. The expectation is that if I find anything we can fix that up in a follow-on PR. Sounds good?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)

Copy link
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did review tests a while back (not side-by-side but on the basis of what they seemed to be doing). Side-by-side with Go would get us more confidence in being able to remove the existing Go-written tests. Let me know if you're doing this exercise @petermattis, I will probably be doing it either way as a follow up.

Reviewed 24 of 26 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)

@knz
Copy link
Contributor Author

knz commented Dec 5, 2019

I'll rebase after #42987 merges otherwise CI will never succeed.

@knz knz force-pushed the 20191106-mvcc-histories branch from ec0a79a to f7918fd Compare December 5, 2019 15:54
Previously MVCC tests were hand-coded in Go.
This was making it hard(er) to introduce new tests or modify existing
tests.

This commit improves upon this situation by introducing a new
datadriven test, `TestMVCCHistories`, which runs MVCC
tests written using a DSL:

```
begin_txn      t=<name> [ts=<int>[,<int>]]
remove_txn     t=<name>
resolve_intent t=<name> k=<key> [status=<txnstatus>]
restart_txn    t=<name>
update_txn     t=<name> t2=<name>
step_txn       t=<name> [n=<int>]
advance_txn    t=<name> ts=<int>[,<int>]
txn_status     t=<name> status=<txnstatus>

check_intent   k=<key> [none]

put       [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> v=<string> [raw]
cput      [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> v=<string> [raw] [cond=<string>]
increment [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [inc=<val>]
del       [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key>
get       [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [inconsistent] [tombstones]
scan      [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse]

merge     [ts=<int>[,<int>]] k=<key> v=<string> [raw]

clear_range    k=<key> end=<key>
```

Where `<key>` can be a simple string, or a string
prefixed by the following characters:

- `=foo` means exactly key `foo`
- `+foo` means `Key(foo).Next()`
- `-foo` means `Key(foo).PrefixEnd()`

Additionally, the pseudo-command `with` enables sharing
a group of arguments between multiple commands, for example:

```
with t=A
   begin_txn
   with k=a
     put v=b
     resolve_intent
```

Release note: None
@knz knz force-pushed the 20191106-mvcc-histories branch from f7918fd to 09b8355 Compare December 5, 2019 22:05
@knz
Copy link
Contributor Author

knz commented Dec 5, 2019

bors r=itsbilal,petermattis

craig bot pushed a commit that referenced this pull request Dec 5, 2019
42250: storage/engine: introduce a datadriven framework to run MVCC tests r=itsbilal,petermattis a=knz

Previously MVCC tests were hand-coded in Go.
This was making it hard(er) to introduce new tests or modify existing
tests.

This commit improves upon this situation by introducing a new
datadriven test, `TestMVCCHistories`, which runs MVCC
tests written using a DSL:

```
begin_txn      t=<name> [ts=<int>[,<int>]]
remove_txn     t=<name>
resolve_intent t=<name> k=<key> [status=<txnstatus>]
restart_txn    t=<name>
update_txn     t=<name> t2=<name>
step_txn       t=<name> [n=<int>]
advance_txn    t=<name> ts=<int>[,<int>]
txn_status     t=<name> status=<txnstatus>

check_intent   k=<key> [none]

put       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw]
cput      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw] [cond=<string>]
increment [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inc=<val>]
del       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key>
get       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inconsistent] [tombstones]
scan      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse]

merge     [ts=<int>[,<int>]] k=<key> v=<string> [raw]

clear_range    k=<key> end=<key>
```

Where `<key>` can be a simple string, or a string
prefixed by the following characters:

- `=foo` means exactly key `foo`
- `+foo` means `Key(foo).Next()`
- `-foo` means `Key(foo).PrefixEnd()`

Additionally, the pseudo-command `with` enables sharing
a group of arguments between multiple commands, for example:

```
with t=A
   begin_txn
   with k=a
     put v=b
     resolve_intent
```

Release note: None

42999: colexec: cfetcher errors on setup when indexed column contains unhandled type r=yuzefovich a=rohany

Fixes #42994.

Indexed columns are always decoded by the fetcher, even if they are unneeded.
This PR adds the check to cfetcher initialization to error out if an indexed
column is of an unneeded type.

Release note (bug fix): Fixed a bug where scanning an index of an unsupported
type with the vectorized engine would lead to an internal error.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Rohan Yadav <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 5, 2019

Build succeeded

@craig craig bot merged commit 09b8355 into cockroachdb:master Dec 5, 2019
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few random comments. Still working my way through the tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/storage/engine/testdata/mvcc_histories/get_negative_timestamp, line 10 at r6 (raw file):

get k=k ts=-1
----
error: (*withstack.withStack:) cannot write to "k" at timestamp 0.-00000001,0

What's with the oddly formatted timestamp? 0.-0...?


pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 5 at r6 (raw file):

  txn_begin ts=0,1
  put  v=v
  get

Shouldn't this have a ts=0,1? Or am I misreading TestMVCCPutWithTxn? Or is that timestamp implicit if no timestamp is specified? I think it would be clearer to be explicit here.


pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 13 at r6 (raw file):

get: "k" -> /BYTES/v @0.000000000,1
>> at end:
txn: "A" meta={id=00000000 key="k" pri=0.00000000 epo=0 ts=0.000000000,1 min=0.000000000,0 seq=0} rw=true stat=PENDING rts=0.000000000,1 wto=false max=0.000000000,0

There are a lot of 0.00000000 values in these tests. It would massively reduce visual clutter if they could be abbreviated somehow.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)


pkg/storage/engine/testdata/mvcc_histories/get_negative_timestamp, line 10 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

What's with the oddly formatted timestamp? 0.-0...?

This is the standard ts printer. I agree it's suspicious but not specific to this PR. Filed #43183 to track.


pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 5 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't this have a ts=0,1? Or am I misreading TestMVCCPutWithTxn? Or is that timestamp implicit if no timestamp is specified? I think it would be clearer to be explicit here.

I left the ts=0,1 out for the first one because it happens to also be the value of ReadTimestamp for the txn (txn1 in the original code). However I agree it's clearer to list it explicitly. I'll send a PR.


pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 13 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There are a lot of 0.00000000 values in these tests. It would massively reduce visual clutter if they could be abbreviated somehow.

These are also the standard printers. I agree these could be improved. Also mentioned in #43183.

craig bot pushed a commit that referenced this pull request Dec 16, 2019
43184: storage/engine: clarify the MVCC history tests r=knz a=knz

Suggested by @petermattis in #42250 (review).
This clarifies the `get` timestamps in `put_with_txn`.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this pull request Dec 20, 2019
43185: hlc: improve/fix the representation of timestamps r=knz a=knz

Requested by @petermattis in #42250 (review)
Informs #43183.

Previously, a negative timestamp would be printed with a negative sign
in the middle, for example `0.-000000123` or `-123.-00000456`.
This patch fixes this by only emitting the sign once.

Additionally, this patch simplifies zero timestamps to just `"0"`.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants