-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: deflake TestV3WatchRestoreSnapshotUnsync #15667
tests: deflake TestV3WatchRestoreSnapshotUnsync #15667
Conversation
@@ -69,7 +73,7 @@ func TestV3WatchRestoreSnapshotUnsync(t *testing.T) { | |||
t.Fatal(errW) | |||
} | |||
if err := wStream.Send(&pb.WatchRequest{RequestUnion: &pb.WatchRequest_CreateRequest{ | |||
CreateRequest: &pb.WatchCreateRequest{Key: []byte("foo"), StartRevision: 5}}}); err != nil { | |||
CreateRequest: &pb.WatchCreateRequest{Key: []byte("foo"), StartRevision: 25}}}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is watching from 25 revision counted as "old revision that were created in synced watcher group in the first place"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Background is a 5 years old issue #9281
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link! Not sure I understand it correctly: we don't defrag compact it so the revision is still available in kv store. Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, from my understanding, as long as the requested revision is not compacted, the key value is still available.
Here is a simple diagram. /cc @ahrtr @serathius in case the understanding or direction is wrong ==
Great find! |
// T2: L(snapshot-index: 22, compacted-index: 17), F_m0(index:8, out of date) | ||
// T3: L(snapshot-index: 33, compacted-index: 28), F_mo(index:8, out of date) | ||
// | ||
// Since the snapshot is handled in GoAttach, we need to trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key part for leader to send snapshot is to to get leader to compact index 8. If you change SnapshotCatchUpEntries to 1, this can be achieved in 1 call of EtcdServer.snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. It was my first patch. But I run into other error.
diff --git a/tests/integration/v3_watch_restore_test.go b/tests/integration/v3_watch_restore_test.go
index bdebeacfc..552f4aecf 100644
--- a/tests/integration/v3_watch_restore_test.go
+++ b/tests/integration/v3_watch_restore_test.go
@@ -57,7 +57,7 @@ func TestV3WatchRestoreSnapshotUnsync(t *testing.T) {
clus := integration.NewCluster(t, &integration.ClusterConfig{
Size: 3,
SnapshotCount: 10,
- SnapshotCatchUpEntries: 5,
+ SnapshotCatchUpEntries: 1,
})
defer clus.Terminate(t)
➜ integration git:(main) ✗ taskset -c 0,1 go test -v -count=1000 -timeout=700m -failfast -run TestV3WatchRestoreSnapshotUnsync ./
...
logger.go:130: 2023-04-08T09:24:52.907+0800 INFO m1 saved snapshot {"member": "m1", "snapshot-index": 22}
logger.go:130: 2023-04-08T09:24:52.907+0800 INFO m1 compacted Raft logs {"member": "m1", "compact-index": 21}
logger.go:130: 2023-04-08T09:24:52.907+0800 INFO m0.raft d6a69b975d0a7658 became follower at term 2 {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.907+0800 INFO m0.raft raft.node: d6a69b975d0a7658 elected leader c0c978b536d663c2 at term 2 {"member": "m0"}
{"level":"warn","ts":"2023-04-08T09:24:52.90773+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00233d860/localhost:m0","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m2 saved snapshot {"member": "m2", "snapshot-index": 22}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m2 skip compaction since there is an inflight snapshot {"member": "m2"}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m2 sending database snapshot {"member": "m2", "snapshot-index": 22, "remote-peer-id": "d6a69b975d0a7658", "bytes": 33666, "size": "34 kB"}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m2 sending merged snapshot {"member": "m2", "from": "c0c978b536d663c2", "to": "d6a69b975d0a7658", "bytes": 33666, "size": "34 kB"}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m2 sent database snapshot to writer {"member": "m2", "bytes": 24576, "size": "25 kB"}
logger.go:130: 2023-04-08T09:24:52.911+0800 INFO m0 receiving database snapshot {"member": "m0", "local-member-id": "d6a69b975d0a7658", "remote-snapshot-sender-id": "c0c978b536d663c2", "incoming-snapshot-index": 22, "incoming-snapshot-message-size-bytes": 9090, "incoming-snapshot-message-size": "9.1 kB"}
...
logger.go:130: 2023-04-08T09:24:52.915+0800 INFO m0 restoring lease store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 restored lease store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 restoring mvcc store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 kvstore restored {"member": "m0", "current-rev": 15}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 restored mvcc store {"member": "m0", "consistent-index": 22}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 restoring alarm store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.917+0800 INFO m0 closing old backend file {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.918+0800 INFO m0 restored alarm store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.918+0800 INFO m0 restoring auth store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.918+0800 INFO m0 restored auth store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.918+0800 INFO m0 restoring v2 store {"member": "m0"}
logger.go:130: 2023-04-08T09:24:52.919+0800 INFO m0 restored v2 store {"member": "m0"}
...
v3_watch_restore_test.go:128: sleeping for 2 seconds DONE
v3_watch_restore_test.go:153: wStream.Recv error: expected 12 events, got [kv:<key:"foo" create_revision:2 mod_revision:5 version:4 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:6 version:5 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:7 version:6 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:8 version:7 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:9 version:8 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:10 version:9 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:11 version:10 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:12 version:11 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:13 version:12 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:14 version:13 value:"bar" > kv:<key:"foo" create_revision:2 mod_revision:15 version:14 value:"bar" > ]
m0 should recover to have (consistent-index:23, current-rev:16). But it got (consisten-index:22, curent-rev:15).
It is timing issue.
etcd/server/etcdserver/server.go
Line 938 in 7153a8f
merged := s.createMergedSnapshotMessage(m, ep.appliedt, ep.appliedi, ep.confState) |
The server uses old index which doesn't match the snapshot content. Not sure that it is bug or fault-tolerant. Finally, it can catch-up. However, the watch event is not expected. cc @ahrtr
So I try to make this patch to let it has time to handle memory compact.
I am still testing it with limit core by taskset -c
. Will update it later.
(If it doesn't work, I will switch to watch the log...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the watch event is not expected. cc @ahrtr
Sorry, I do not get the point, and also did not get time dig into it for now. Please file a separate issue if you think it's an issue or something you can't explain or understand.
_, err := kvc.Put(context.TODO(), &pb.PutRequest{Key: []byte("foo"), Value: []byte("bar")}) | ||
if err != nil { | ||
t.Errorf("#%d: couldn't put key (%v)", i, err) | ||
} | ||
} | ||
// The 33 is latest snapshot index. | ||
ensureTriggerSnapshot(t, clus.Members[initialLead], 33, 5*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally this should wait for leader to compact index 8, but there is no way to check for that.
Writing snapshot files isn't enough to trigger snapshot send. For example if you set SnapshotCatchUpEntries to smth high, new version of the test will still fail. Maybe you can add a comment that implicitly it's also waiting for compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switch to a solution which monitors testing log to ensure cluster has compacted the raft log. please take a look. Thanks
c179c95
to
4632d29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some nitpicks
} | ||
|
||
// testingLogfMonitor is to monitor t.Logf output. | ||
type testingLogfMonitor struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice implementation, didn't know it's possible to get to all logs in integration (there is AssertProcessLogs for e2e).
It seams like this is implemented to be reused in other tests. Should this be moved to util_test.go? cc @serathius
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
etcd/tests/framework/integration/cluster.go
Line 746 in f7af6b6
func memberLogger(t testutil.TB, name string) *zap.Logger { |
I think we can extend the zaptest.Core with tee function. It can be aligned with AssertProcessLogs for process expecter~
193eead
to
df83241
Compare
// NOTE: In 3 members cluster, after initial lead has been elected, | ||
// there are 3xConfChange + 3xMemberAttrSet + 1xClusterVersionSet logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct. Note that when starting a new cluster with 3 members, each member will apply 3 ConfChange directly at the beginning before a leader is elected.
Please read https://github.com/etcd-io/raft/blob/918455b897764d8e4d12af70657280365ffaaa04/bootstrap.go#L76-L78
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will update it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
if lead != initialLead { | ||
t.Fatalf("expected leader index (%v), but got (%v)", initialLead, lead) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, the leader might change, although usually it will not. So suggest to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revert this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated! Please take a look. Thanks!
Thanks @fuweid The But my immediate feeling is it's a little over complicated, can we just increase the timeout (e.g. to 20s)? And also wait more time when receiving events: (1) add a timeout, e.g. 15s, and (2) cache the partial event. etcd/tests/integration/v3_watch_restore_test.go Lines 135 to 147 in 3393d13
|
Hi @ahrtr, I can do it in the follow-up. This is to fix the flaky test. I want to keep it small. And I also want to add function helper for in-process member so that we can capture the log for a given member, instead of test case level. Does it make senses to you?
Basically, the problem is that after recover network partition, the leader hasn't compacted the raft log yet. The member 0 receives the heartbeat but it will refuse the appendEntries and return hint (Term 2, Index: 8). And then leader still can find the (index:8) so that leader applies the index:(9,10,...) to member 0 instead of snapshot. We can sleep few seconds before recover network partition. But it might be flaky somehow and the case takes more time. |
OK.
Got it. The reason is etcdserver creates snapshot & compact raft log asynchronously. |
Overall looks good to me. Great finding. @fuweid Please resolve the minor comments. |
The TestV3WatchRestoreSnapshotUnsync setups three members' cluster. Before serving any update requests from client, after leader elected, each member will have index 8 log: 3 x ConfChange + 3 x ClusterMemberAttrSet + 1 x ClusterVersionSet. Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to file update requests to trigger snapshot at least twice. T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8) T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date) After member0 recovers from network partition, it will reject leader's request and return hint (index:8, term:x). If it happens after second snapshot, leader will find out the index:8 is out of date and force to transfer snapshot. However, the client only files 15 update requests and leader doesn't finish the process of snapshot in time. Since the last of compacted-index is 6, leader can still replicate index:9 to member0 instead of snapshot. ```bash cd tests/integration CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./ ... INFO m2.raft 3da8ba707f1a21a4 became leader at term 2 {"member": "m2"} ... INFO m2 triggering snapshot {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false} ... cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036 cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4 cluster.go:416: WaitMembersForLeader INFO m0.raft 99626fe5001fde8b became follower at term 2 {"member": "m0"} INFO m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2 {"member": "m0"} DEBUG m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23 {"member": "m2"} DEBUG m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15] {"member": "m2"} DEBUG m0 Applying entries {"member": "m0", "num-entries": 15} DEBUG m0 Applying entry {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"} .... INFO m2 saved snapshot {"member": "m2", "snapshot-index": 22} INFO m2 compacted Raft logs {"member": "m2", "compact-index": 17} ``` To fix this issue, the patch uses log monitor to watch "compacted Raft log" and expect that two members should compact log twice. Fixes: etcd-io#15545 Signed-off-by: Wei Fu <[email protected]>
df83241
to
536953e
Compare
LGTM. FYI: Robustness test triggering snapshot can also be merged into raft log compaction log assertion style if the 100 milliseconds wait turns out to be flaky. etcd/tests/robustness/failpoints.go Lines 422 to 438 in 1227754
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple of minor comments, which can be resolved in a followup PR.
} | ||
} | ||
|
||
func (m *testingLogfMonitor) addSubscriber(id string, sub testingLogfSubscriber) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment: change addSubscriber
to register
, and delSubscriber
to deregister
delete(m.subscribers, id) | ||
} | ||
|
||
func (m *testingLogfMonitor) Logf(format string, args ...interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a generic solution, probably we should implementa Log
as well.
Thanks @ahrtr @chaochn47 @lavacat for the review. I will file a pr for the followup. |
It's followup for etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. By default, it's disable for reducing memory cost. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. By default, it's disable for reducing memory cost. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. By default, it's disable for reducing memory cost. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. Before change ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz /usr/bin/time -v taskset -c 0,1,2 go test -count=1 ./integration/... Elapsed (wall clock) time (h:mm:ss or m:ss): 6:53.11 Maximum resident set size (kbytes): 331736 ``` After change ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz /usr/bin/time -v taskset -c 0,1,2 go test -count=1 ./integration/... Elapsed (wall clock) time (h:mm:ss or m:ss): 6:59.73 Maximum resident set size (kbytes): 325832 ``` It won't increase integration run-time too much. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. Before change ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz /usr/bin/time -v taskset -c 0,1,2 go test -count=1 ./integration/... Elapsed (wall clock) time (h:mm:ss or m:ss): 6:53.11 Maximum resident set size (kbytes): 331736 ``` After change ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz /usr/bin/time -v taskset -c 0,1,2 go test -count=1 ./integration/... Elapsed (wall clock) time (h:mm:ss or m:ss): 6:59.73 Maximum resident set size (kbytes): 325832 ``` Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. The test env ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz mkdir /sys/fs/cgroup/etcd-followup-15667 echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores ``` Before change: * memory.peak: ~ 681 MiB * Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04 After change: * memory.peak: ~ 671 MiB * Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07 Based on the test result, I think it's safe to be enabled by default. Signed-off-by: Wei Fu <[email protected]>
It's followup of etcd-io#15667. This patch is to use zaptest/observer as base to provide a similar function to pkg/expect.Expect. The test env ```bash 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz mkdir /sys/fs/cgroup/etcd-followup-15667 echo 0-2 | tee /sys/fs/cgroup/etcd-followup-15667/cpuset.cpus # three cores ``` Before change: * memory.peak: ~ 681 MiB * Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14.04 After change: * memory.peak: ~ 671 MiB * Elapsed (wall clock) time (h:mm:ss or m:ss): 6:13.07 Based on the test result, I think it's safe to be enabled by default. Signed-off-by: Wei Fu <[email protected]>
tests: make log monitor as common helper (followup #15667
The TestV3WatchRestoreSnapshotUnsync setups three members' cluster. Before serving any update requests from client, after leader elected, each member will have index 8 log: 3 x ConfChange + 3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.
Based on the config (SnapshotCount: 10, CatchUpCount: 5), we need to file update requests to trigger snapshot at least twice.
T1: L(snapshot-index: 11, compacted-index: 6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)
After member0 recovers from network partition, it will reject leader's request and return hint (index:8, term:x). If it happens after second snapshot, leader will find out the index:8 is out of date and force to transfer snapshot.
However, the client only files 15 update requests and leader doesn't finish the process of snapshot in time. Since the last of compacted-index is 6, leader can still replicate index:9 to member0 instead of snapshot.
To fix this issue, the patch uses log monitor to watch "compacted Raft log" and
expect that two members should compact log twice.
Fixes: #15545
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.