-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: create new TestSnapshotAfterTruncationWithUncommittedTail test #37058
storage: create new TestSnapshotAfterTruncationWithUncommittedTail test #37058
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):
if len(req.Heartbeats)+len(req.HeartbeatResps) > 0 { reqCpy := *req *req = reqCpy
Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy
?
pkg/storage/client_raft_test.go, line 867 at r2 (raw file):
ctx := context.Background() mtc := &multiTestContext{ // This test was written before the multiTestContext started creating many
Not quite true in this case, but OK.
pkg/storage/client_raft_test.go, line 941 at r2 (raw file):
for i := 0; i < 32; i++ { cCtx, cancel := context.WithTimeout(ctx, 25*time.Millisecond) defer cancel()
These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?
pkg/storage/client_raft_test.go, line 961 at r2 (raw file):
testutils.SucceedsSoon(t, func() error { mtc.advanceClock(ctx) err := newLeaderRepl.AdminTransferLease(ctx, newLeaderRepl.StoreID())
I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.
pkg/storage/client_raft_test.go, line 993 at r2 (raw file):
} // The partitioned replica should catch up after a snapshot.
Can we assert that a snapshot was used? (maybe via a metric?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nothing to add. Thanks for typing this up!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/client_raft_test.go, line 936 at r2 (raw file):
cancelled,
gets me every time
00f5876
to
4c93815
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)
pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be
req = &reqCpy
?
Done.
pkg/storage/client_raft_test.go, line 936 at r2 (raw file):
Previously, ajwerner wrote…
cancelled,
gets me every time
Done.
pkg/storage/client_raft_test.go, line 941 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?
Done.
pkg/storage/client_raft_test.go, line 961 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.
I did it like this to be explicit, but I'm fine removing the AdminTransferLease
. In that case, I can just put the increment in the SucceedsSoon
loop. Done.
pkg/storage/client_raft_test.go, line 993 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Can we assert that a snapshot was used? (maybe via a metric?)
Done.
Release note: None
4c93815
to
f14c931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
f14c931
to
32e7d43
Compare
bors r+ |
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten This PR adds a regression test for #37056. In doing so, it also confirms the theory that #37055 is the proper fix for that bug. Before #37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After #37055, the test passes. Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
This commit speeds up the test added in cockroachdb#37058 by parallelizing its failed writes. Release note: None
37079: storage: speed up TestSnapshotAfterTruncationWithUncommittedTail r=nvanbenschoten a=nvanbenschoten This commit speeds up the test added in #37058 by parallelizing its failed writes. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
This commit speeds up the test added in cockroachdb#37058 by parallelizing its failed writes. Release note: None
This PR adds a regression test for #37056. In doing so, it also confirms
the theory that #37055 is the proper fix for that bug.
Before #37055, the test would get stuck with the following logs repeatedly
firing:
After #37055, the test passes.