Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: create new TestSnapshotAfterTruncationWithUncommittedTail test #37058

Merged
merged 2 commits into from
Apr 24, 2019

Conversation

nvanbenschoten
Copy link
Member

This PR adds a regression test for #37056. In doing so, it also confirms
the theory that #37055 is the proper fix for that bug.

Before #37055, the test would get stuck with the following logs repeatedly
firing:

I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]

After #37055, the test passes.

@nvanbenschoten nvanbenschoten requested review from bdarnell, ajwerner and a team April 24, 2019 01:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):

	if len(req.Heartbeats)+len(req.HeartbeatResps) > 0 {
		reqCpy := *req
		*req = reqCpy

Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy?


pkg/storage/client_raft_test.go, line 867 at r2 (raw file):

	ctx := context.Background()
	mtc := &multiTestContext{
		// This test was written before the multiTestContext started creating many

Not quite true in this case, but OK.


pkg/storage/client_raft_test.go, line 941 at r2 (raw file):

	for i := 0; i < 32; i++ {
		cCtx, cancel := context.WithTimeout(ctx, 25*time.Millisecond)
		defer cancel()

These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?


pkg/storage/client_raft_test.go, line 961 at r2 (raw file):

	testutils.SucceedsSoon(t, func() error {
		mtc.advanceClock(ctx)
		err := newLeaderRepl.AdminTransferLease(ctx, newLeaderRepl.StoreID())

I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.


pkg/storage/client_raft_test.go, line 993 at r2 (raw file):

	}

	// The partitioned replica should catch up after a snapshot.

Can we assert that a snapshot was used? (maybe via a metric?)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I have nothing to add. Thanks for typing this up!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/client_raft_test.go, line 936 at r2 (raw file):

 cancelled,

gets me every time

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/testStuckProp branch from 00f5876 to 4c93815 Compare April 24, 2019 02:29
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)


pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy?

Done.


pkg/storage/client_raft_test.go, line 936 at r2 (raw file):

Previously, ajwerner wrote…
 cancelled,

gets me every time

Done.


pkg/storage/client_raft_test.go, line 941 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?

Done.


pkg/storage/client_raft_test.go, line 961 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.

I did it like this to be explicit, but I'm fine removing the AdminTransferLease. In that case, I can just put the increment in the SucceedsSoon loop. Done.


pkg/storage/client_raft_test.go, line 993 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can we assert that a snapshot was used? (maybe via a metric?)

Done.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/testStuckProp branch from 4c93815 to f14c931 Compare April 24, 2019 04:28
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)

This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms
the theory that cockroachdb#37055 is the proper fix for that bug.

Before cockroachdb#37055, the test would get stuck with the following logs repeatedly
firing:
```
I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
```

After cockroachdb#37055, the test passes.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/testStuckProp branch from f14c931 to 32e7d43 Compare April 24, 2019 04:55
@nvanbenschoten
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Apr 24, 2019
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten

This PR adds a regression test for #37056. In doing so, it also confirms
the theory that #37055 is the proper fix for that bug.

Before #37055, the test would get stuck with the following logs repeatedly
firing:
```
I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
```

After #37055, the test passes.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 24, 2019

Build succeeded

@craig craig bot merged commit 32e7d43 into cockroachdb:master Apr 24, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 24, 2019
This commit speeds up the test added in cockroachdb#37058 by parallelizing its
failed writes.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 24, 2019
37079: storage: speed up TestSnapshotAfterTruncationWithUncommittedTail r=nvanbenschoten a=nvanbenschoten

This commit speeds up the test added in #37058 by parallelizing its
failed writes.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 24, 2019
This commit speeds up the test added in cockroachdb#37058 by parallelizing its
failed writes.

Release note: None
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/testStuckProp branch April 24, 2019 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants