
Duplicated watch event detected in robustness test #17247

Closed

serathius opened this issue Jan 15, 2024 · 31 comments
Labels
area/robustness-testing, priority/important-soon, type/bug

Comments

@serathius
Member

serathius commented Jan 15, 2024

Bug report criteria

What happened?

2 robustness test failures:

Common elements:

  • Both failures have a duplicated watch event
  • Both on Kubernetes traffic
  • Both on 3 node cluster

Differences:

  • Failpoints: raftAfterSaveSnap=panic() and MemberReplace

Both failures contain a duplicated watch event:

2024-01-13T09:50:27.4782520Z     validate.go:41: Failed merging watch history to create event history, skipping further validation, err: events between clients 11 and 9 don't match, revision: 800, diff:   []model.WatchEvent{
2024-01-13T09:50:27.4786636Z           	{Event: {Type: "put-operation", Key: "/registry/pods/default/76Z51", Value: {Value: "1061"}}, Revision: 800},
2024-01-13T09:50:27.4788612Z         - 	{
2024-01-13T09:50:27.4789499Z         - 		Event: model.Event{
2024-01-13T09:50:27.4790382Z         - 			Type:  "put-operation",
2024-01-13T09:50:27.4791612Z         - 			Key:   "/registry/pods/default/76Z51",
2024-01-13T09:50:27.4792744Z         - 			Value: model.ValueOrHash{Value: "1061"},
2024-01-13T09:50:27.4793619Z         - 		},
2024-01-13T09:50:27.4794433Z         - 		Revision: 800,
2024-01-13T09:50:27.4795136Z         - 	},
2024-01-13T09:50:27.4795687Z           }
2024-01-14T10:24:58.3514126Z     validate.go:41: Failed merging watch history to create event history, skipping further validation, err: events between clients 10 and 7 don't match, revision: 1583, diff:   []model.WatchEvent{
2024-01-14T10:24:58.3518620Z           	{Event: {Type: "put-operation", Key: "/registry/pods/default/rMWB7", Value: {Value: "1915"}}, Revision: 1583},
2024-01-14T10:24:58.3520322Z         + 	{
2024-01-14T10:24:58.3529383Z         + 		Event: model.Event{
2024-01-14T10:24:58.3530681Z         + 			Type:  "put-operation",
2024-01-14T10:24:58.3532196Z         + 			Key:   "/registry/pods/default/rMWB7",
2024-01-14T10:24:58.3533517Z         + 			Value: model.ValueOrHash{Value: "1915"},
2024-01-14T10:24:58.3534544Z         + 		},
2024-01-14T10:24:58.3535281Z         + 		Revision: 1583,
2024-01-14T10:24:58.3536055Z         + 	},
2024-01-14T10:24:58.3536676Z           }

What did you expect to happen?

No robustness test failure

How can we reproduce it (as minimally and precisely as possible)?

No repro yet; I don't have access to an arm64 machine to debug.

cc @ahrtr @jmhbnz

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

serathius added a commit to serathius/etcd that referenced this issue Jan 15, 2024
Change-Id: Ieed5f16ded627ac27f6acae99fea23211e6f718d
@serathius
Member Author

serathius commented Jan 15, 2024

serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2024
Change-Id: Ieed5f16ded627ac27f6acae99fea23211e6f718d
@serathius added the priority/important-soon and area/robustness-testing labels on Jan 16, 2024
@serathius
Member Author

serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2024
Change-Id: Ieed5f16ded627ac27f6acae99fea23211e6f718d
Signed-off-by: Marek Siarkowicz <[email protected]>
@serathius
Member Author

Got one local repro.

@serathius changed the title from "[arm64] Duplicated watch event detected in robustness test" to "Duplicated watch event detected in robustness test" on Jan 16, 2024
@serathius
Member Author

serathius commented Jan 16, 2024

Watch request-response with a duplicated revision 882, interesting part:

        {
          "Type": "put-operation",
          "Key": "/registry/pods/default/PSvk1",
          "Value": {
            "Value": "1014",
            "Hash": 0
          },
          "Revision": 881
        },
        {
          "Type": "put-operation",
          "Key": "/registry/pods/default/gqmYo",
          "Value": {
            "Value": "1015",
            "Hash": 0
          },
          "Revision": 882
        },
        {
          "Type": "put-operation",
          "Key": "/registry/pods/default/gqmYo",
          "Value": {
            "Value": "1015",
            "Hash": 0
          },
          "Revision": 882
        },
        {
          "Type": "put-operation",
          "Key": "/registry/pods/default/llCqr",
          "Value": {
            "Value": "1017",
            "Hash": 0
          },
          "Revision": 883
        },

Full response

``` { "Request": { "Key": "/registry/pods/", "Revision": 854, "WithPrefix": true, "WithProgressNotify": true }, "Responses": [ { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/zlqKy", "Value": { "Value": "980", "Hash": 0 }, "Revision": 854 }, { "Type": "put-operation", "Key": "/registry/pods/default/CPWLs", "Value": { "Value": "982", "Hash": 0 }, "Revision": 855 }, { "Type": "put-operation", "Key": "/registry/pods/default/n5T9S", "Value": { "Value": "983", "Hash": 0 }, "Revision": 856 }, { "Type": "put-operation", "Key": "/registry/pods/default/llCqr", "Value": { "Value": "984", "Hash": 0 }, "Revision": 857 }, { "Type": "put-operation", "Key": "/registry/pods/default/Gvvm7", "Value": { "Value": "985", "Hash": 0 }, "Revision": 858 }, { "Type": "put-operation", "Key": "/registry/pods/default/qalpx", "Value": { "Value": "986", "Hash": 0 }, "Revision": 859 }, { "Type": "put-operation", "Key": "/registry/pods/default/EQPt4", "Value": { "Value": "987", "Hash": 0 }, "Revision": 860 }, { "Type": "put-operation", "Key": "/registry/pods/default/158WY", "Value": { "Value": "988", "Hash": 0 }, "Revision": 861 }, { "Type": "put-operation", "Key": "/registry/pods/default/WAGPJ", "Value": { "Value": "991", "Hash": 0 }, "Revision": 862 }, { "Type": "put-operation", "Key": "/registry/pods/default/gqmYo", "Value": { "Value": "992", "Hash": 0 }, "Revision": 863 }, { "Type": "put-operation", "Key": "/registry/pods/default/lwMud", "Value": { "Value": "993", "Hash": 0 }, "Revision": 864 }, { "Type": "put-operation", "Key": "/registry/pods/default/bbfGl", "Value": { "Value": "994", "Hash": 0 }, "Revision": 865 }, { "Type": "put-operation", "Key": "/registry/pods/default/e9AyQ", "Value": { "Value": "996", "Hash": 0 }, "Revision": 866 }, { "Type": "put-operation", "Key": "/registry/pods/default/mpvvr", "Value": { "Value": "997", "Hash": 0 }, "Revision": 867 }, { "Type": "put-operation", "Key": "/registry/pods/default/F3TRD", "Value": { "Value": "998", "Hash": 0 }, "Revision": 868 }, { "Type": "put-operation", "Key": "/registry/pods/default/OVXTy", "Value": { "Value": "1000", "Hash": 0 }, "Revision": 869 }, { "Type": "put-operation", "Key": "/registry/pods/default/uZBI1", "Value": { "Value": "1002", "Hash": 0 }, "Revision": 870 }, { "Type": "put-operation", "Key": "/registry/pods/default/qbvoO", "Value": { "Value": "1003", "Hash": 0 }, "Revision": 871 }, { "Type": "put-operation", "Key": "/registry/pods/default/4b7V4", "Value": { "Value": "1005", "Hash": 0 }, "Revision": 872 }, { "Type": "put-operation", "Key": "/registry/pods/default/WPXbR", "Value": { "Value": "1006", "Hash": 0 }, "Revision": 873 }, { "Type": "put-operation", "Key": "/registry/pods/default/F3Ro3", "Value": { "Value": "1007", "Hash": 0 }, "Revision": 874 }, { "Type": "put-operation", "Key": "/registry/pods/default/s2DPx", "Value": { "Value": "1008", "Hash": 0 }, "Revision": 875 }, { "Type": "put-operation", "Key": "/registry/pods/default/Gvvm7", "Value": { "Value": "1009", "Hash": 0 }, "Revision": 876 }, { "Type": "put-operation", "Key": "/registry/pods/default/XcZU8", "Value": { "Value": "1010", "Hash": 0 }, "Revision": 877 }, { "Type": "put-operation", "Key": "/registry/pods/default/553WP", "Value": { "Value": "1011", "Hash": 0 }, "Revision": 878 }, { "Type": "put-operation", "Key": "/registry/pods/default/Ft0M7", "Value": { "Value": "1012", "Hash": 0 }, "Revision": 879 }, { "Type": "put-operation", "Key": "/registry/pods/default/EQPt4", "Value": { "Value": "1013", "Hash": 0 }, "Revision": 880 }, { "Type": "put-operation", 
"Key": "/registry/pods/default/PSvk1", "Value": { "Value": "1014", "Hash": 0 }, "Revision": 881 }, { "Type": "put-operation", "Key": "/registry/pods/default/gqmYo", "Value": { "Value": "1015", "Hash": 0 }, "Revision": 882 }, { "Type": "put-operation", "Key": "/registry/pods/default/gqmYo", "Value": { "Value": "1015", "Hash": 0 }, "Revision": 882 }, { "Type": "put-operation", "Key": "/registry/pods/default/llCqr", "Value": { "Value": "1017", "Hash": 0 }, "Revision": 883 }, { "Type": "put-operation", "Key": "/registry/pods/default/vSBGf", "Value": { "Value": "1019", "Hash": 0 }, "Revision": 884 }, { "Type": "put-operation", "Key": "/registry/pods/default/n5T9S", "Value": { "Value": "1020", "Hash": 0 }, "Revision": 885 } ], "IsProgressNotify": false, "Revision": 885, "Time": 6124853237 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/OVXTy", "Value": { "Value": "1021", "Hash": 0 }, "Revision": 886 } ], "IsProgressNotify": false, "Revision": 886, "Time": 6125394945 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/XcZU8", "Value": { "Value": "1023", "Hash": 0 }, "Revision": 887 } ], "IsProgressNotify": false, "Revision": 887, "Time": 6134684586 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/mxBK7", "Value": { "Value": "1024", "Hash": 0 }, "Revision": 888 } ], "IsProgressNotify": false, "Revision": 888, "Time": 6137375281 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/bbfGl", "Value": { "Value": "1025", "Hash": 0 }, "Revision": 889 } ], "IsProgressNotify": false, "Revision": 889, "Time": 6145422960 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/lRR85", "Value": { "Value": "1026", "Hash": 0 }, "Revision": 890 } ], "IsProgressNotify": false, "Revision": 890, "Time": 6147789266 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/v0Wg5", "Value": { "Value": "1027", "Hash": 0 }, "Revision": 891 } ], "IsProgressNotify": false, "Revision": 891, "Time": 6157548521 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/PI7s9", "Value": { "Value": "1028", "Hash": 0 }, "Revision": 892 } ], "IsProgressNotify": false, "Revision": 892, "Time": 6162857744 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/v0Wg5", "Value": { "Value": "1029", "Hash": 0 }, "Revision": 893 } ], "IsProgressNotify": false, "Revision": 893, "Time": 6173448149 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/ZsJkQ", "Value": { "Value": "1030", "Hash": 0 }, "Revision": 894 } ], "IsProgressNotify": false, "Revision": 894, "Time": 6177937264 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/eikdd", "Value": { "Value": "1031", "Hash": 0 }, "Revision": 895 } ], "IsProgressNotify": false, "Revision": 895, "Time": 6182406700 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/fNa1r", "Value": { "Value": "1032", "Hash": 0 }, "Revision": 896 } ], "IsProgressNotify": false, "Revision": 896, "Time": 6188104553 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/EQPt4", "Value": { "Value": "1033", "Hash": 0 }, "Revision": 897 } ], "IsProgressNotify": false, "Revision": 897, "Time": 6198847616 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/WAGPJ", "Value": { "Value": "1035", "Hash": 0 }, "Revision": 898 } ], "IsProgressNotify": false, "Revision": 898, "Time": 6212652771 }, { "Events": [ { "Type": "put-operation", "Key": 
"/registry/pods/default/HdoK9", "Value": { "Value": "1036", "Hash": 0 }, "Revision": 899 } ], "IsProgressNotify": false, "Revision": 899, "Time": 6219512628 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/bbfGl", "Value": { "Value": "1037", "Hash": 0 }, "Revision": 900 } ], "IsProgressNotify": false, "Revision": 900, "Time": 6225460171 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/s2DPx", "Value": { "Value": "1038", "Hash": 0 }, "Revision": 901 } ], "IsProgressNotify": false, "Revision": 901, "Time": 6231087364 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/4t7Cm", "Value": { "Value": "1039", "Hash": 0 }, "Revision": 902 } ], "IsProgressNotify": false, "Revision": 902, "Time": 6237653749 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/rwstr", "Value": { "Value": "1040", "Hash": 0 }, "Revision": 903 } ], "IsProgressNotify": false, "Revision": 903, "Time": 6242222081 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/vs80D", "Value": { "Value": "1041", "Hash": 0 }, "Revision": 904 } ], "IsProgressNotify": false, "Revision": 904, "Time": 6252938003 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/63GF1", "Value": { "Value": "1042", "Hash": 0 }, "Revision": 905 } ], "IsProgressNotify": false, "Revision": 905, "Time": 6259062598 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/zlqKy", "Value": { "Value": "1043", "Hash": 0 }, "Revision": 906 } ], "IsProgressNotify": false, "Revision": 906, "Time": 6279987619 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/PSvk1", "Value": { "Value": "1044", "Hash": 0 }, "Revision": 907 } ], "IsProgressNotify": false, "Revision": 907, "Time": 6280049124 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/7hGI0", "Value": { "Value": "1045", "Hash": 0 }, "Revision": 908 } ], "IsProgressNotify": false, "Revision": 908, "Time": 6284565610 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/9nMUa", "Value": { "Value": "1046", "Hash": 0 }, "Revision": 909 } ], "IsProgressNotify": false, "Revision": 909, "Time": 6287166957 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/ZVcyt", "Value": { "Value": "1047", "Hash": 0 }, "Revision": 910 } ], "IsProgressNotify": false, "Revision": 910, "Time": 6292679653 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/5uhDY", "Value": { "Value": "1048", "Hash": 0 }, "Revision": 911 } ], "IsProgressNotify": false, "Revision": 911, "Time": 6297197079 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/qalpx", "Value": { "Value": "1049", "Hash": 0 }, "Revision": 912 } ], "IsProgressNotify": false, "Revision": 912, "Time": 6307661469 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/PI7s9", "Value": { "Value": "1050", "Hash": 0 }, "Revision": 913 } ], "IsProgressNotify": false, "Revision": 913, "Time": 6313212397 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/jOxQ8", "Value": { "Value": "1051", "Hash": 0 }, "Revision": 914 } ], "IsProgressNotify": false, "Revision": 914, "Time": 6324605471 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/edTMh", "Value": { "Value": "1052", "Hash": 0 }, "Revision": 915 } ], "IsProgressNotify": false, "Revision": 915, "Time": 6328803116 }, { "Events": [ { "Type": "put-operation", "Key": 
"/registry/pods/default/rwZnc", "Value": { "Value": "1053", "Hash": 0 }, "Revision": 916 } ], "IsProgressNotify": false, "Revision": 916, "Time": 6333627710 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/UvmJr", "Value": { "Value": "1054", "Hash": 0 }, "Revision": 917 } ], "IsProgressNotify": false, "Revision": 917, "Time": 6340956298 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/qalpx", "Value": { "Value": "1055", "Hash": 0 }, "Revision": 918 } ], "IsProgressNotify": false, "Revision": 918, "Time": 6349607460 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/6Z4sD", "Value": { "Value": "1056", "Hash": 0 }, "Revision": 919 } ], "IsProgressNotify": false, "Revision": 919, "Time": 6352693006 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/qbvoO", "Value": { "Value": "1057", "Hash": 0 }, "Revision": 920 } ], "IsProgressNotify": false, "Revision": 920, "Time": 6363659529 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/F3TRD", "Value": { "Value": "1058", "Hash": 0 }, "Revision": 921 } ], "IsProgressNotify": false, "Revision": 921, "Time": 6371889239 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/pMgix", "Value": { "Value": "1059", "Hash": 0 }, "Revision": 922 } ], "IsProgressNotify": false, "Revision": 922, "Time": 6375499923 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/rwZnc", "Value": { "Value": "1060", "Hash": 0 }, "Revision": 923 } ], "IsProgressNotify": false, "Revision": 923, "Time": 6386067716 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/v0Wg5", "Value": { "Value": "1061", "Hash": 0 }, "Revision": 924 } ], "IsProgressNotify": false, "Revision": 924, "Time": 6388870782 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/0AUzn", "Value": { "Value": "1062", "Hash": 0 }, "Revision": 925 } ], "IsProgressNotify": false, "Revision": 925, "Time": 6392359146 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/lwMud", "Value": { "Value": "1063", "Hash": 0 }, "Revision": 926 } ], "IsProgressNotify": false, "Revision": 926, "Time": 6407253095 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/6Z4sD", "Value": { "Value": "1064", "Hash": 0 }, "Revision": 927 } ], "IsProgressNotify": false, "Revision": 927, "Time": 6417900820 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/Ft0M7", "Value": { "Value": "1066", "Hash": 0 }, "Revision": 928 } ], "IsProgressNotify": false, "Revision": 928, "Time": 6438074579 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/OVXTy", "Value": { "Value": "1068", "Hash": 0 }, "Revision": 929 } ], "IsProgressNotify": false, "Revision": 929, "Time": 6452060313 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/SOhlK", "Value": { "Value": "1070", "Hash": 0 }, "Revision": 930 } ], "IsProgressNotify": false, "Revision": 930, "Time": 6469929313 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/pUZix", "Value": { "Value": "1071", "Hash": 0 }, "Revision": 931 } ], "IsProgressNotify": false, "Revision": 931, "Time": 6477969346 }, { "Events": [ { "Type": "put-operation", "Key": "/registry/pods/default/HdoK9", "Value": { "Value": "1072", "Hash": 0 }, "Revision": 932 } ], "IsProgressNotify": false, "Revision": 932, "Time": 6482229480 } ] } ```

@serathius
Member Author

Trying to bisect to find the PR responsible.

Last passing nightly run: https://github.com/etcd-io/etcd/actions/runs/7486757727, with commit b3bf59a as the bisect start. However, it's not guaranteed to be the correct commit; due to the flakes in #17246 we cannot be 100% sure about the error rate of this issue.

@serathius
Member Author

We should also investigate why this issue only reproduces on arm64 in CI, even though locally it also appears on amd64. Issues independent of platform should be detected on both. I expect this is due to two reasons:

  • arm64 has lower performance, as observed in #17246 (arm64 instances not having enough power to run robustness tests), making the issue more likely to surface.
  • We don't run this combination frequently enough. A single run of the combination KubernetesTraffic (1/1), ClusterOfSize3 (1/1), raftAfterSaveSnap=panic() (1/35), snapshotCount=50 (1/3) takes around 5-10 minutes, and the combination is selected in only about 1 in 105 runs. So during robustness tests we hit this combination only about once per 13 days, while reproducing the issue takes around 50 repeated runs (see the back-of-envelope sketch below).

I think we should consider increasing the number of runs of exploratory tests (TestRobustness/Kubernetes, TestRobustness/Etcd), while reducing the number of regression tests (TestRobustness/Issue*).
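A back-of-envelope sketch of the arithmetic behind the estimates above; the runs-per-day figure and the 7.5-minute average are illustrative assumptions, not measured CI numbers:

```
package main

import "fmt"

func main() {
	// Selection probabilities stated above; traffic and cluster size are fixed (1/1).
	pFailpoint := 1.0 / 35 // raftAfterSaveSnap=panic()
	pSnapshot := 1.0 / 3   // snapshotCount=50
	pCombo := pFailpoint * pSnapshot

	// Assumed for illustration only: roughly 8 exploratory robustness runs per day.
	runsPerDay := 8.0

	fmt.Printf("combination selected in ~1 of %.0f runs\n", 1/pCombo)             // ~105
	fmt.Printf("expected ~%.0f days between hits in CI\n", 1/(pCombo*runsPerDay)) // ~13
	fmt.Printf("~50 dedicated runs at ~7.5 min each is ~%.1f hours\n", 50*7.5/60.0)
}
```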

@serathius
Member Author

Bisection pointed to ed99424 as the first broken commit; since reproduction isn't 100% reliable, we need more testing to confirm.

Will continue working on confirming the results, but in the meantime we need to block backports.

@serathius
Member Author

Reproduced the issue enough times to be confident that this is caused by #17228.

@siyuanfoundation Do you have time to look into this soon, or would you prefer to revert and fix later?

@siyuanfoundation
Contributor

I will take a look today

siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 16, 2024
might be the cause of etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
might be the cause of etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
might be the cause of etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
@ahrtr
Member

ahrtr commented Jan 17, 2024

#17263 (comment) is a valid point. The key point is that dedupe always increases bb.used by at least 1, even when there are no writes in the key bucket.

bb.used = widx + 1
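To make the point concrete, here is a minimal standalone sketch of a dedupe-style compaction loop (not the actual etcd tx_buffer.go code), showing how the trailing used = widx + 1 claims one slot even when nothing was written:

```
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type kv struct{ key, val []byte }

// bucketBuffer is a stripped-down stand-in for the buffer type under
// discussion: buf may be longer than the portion actually in use.
type bucketBuffer struct {
	buf  []kv
	used int
}

func (bb *bucketBuffer) Len() int           { return bb.used }
func (bb *bucketBuffer) Less(i, j int) bool { return bytes.Compare(bb.buf[i].key, bb.buf[j].key) < 0 }
func (bb *bucketBuffer) Swap(i, j int)      { bb.buf[i], bb.buf[j] = bb.buf[j], bb.buf[i] }

// dedupe sorts the in-use portion and collapses runs of equal keys, keeping
// the newest write. The last line always sets used to widx+1, so a buffer
// with used == 0 (no writes at all) comes out with used == 1, exposing
// whatever stale entry happens to sit in buf[0].
func (bb *bucketBuffer) dedupe() {
	sort.Stable(bb)
	widx := 0
	for ridx := 1; ridx < bb.used; ridx++ {
		if !bytes.Equal(bb.buf[ridx].key, bb.buf[widx].key) {
			widx++
		}
		bb.buf[widx] = bb.buf[ridx]
	}
	bb.used = widx + 1
}

func main() {
	// Stale capacity left over from earlier use, but nothing written this time.
	bb := &bucketBuffer{buf: make([]kv, 4), used: 0}
	bb.dedupe()
	fmt.Println(bb.used) // prints 1, even though no write happened
}
```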

Actually, there shouldn't be any duplicated items:

  • for the key bucket, the key is the revision, which is monotonically increasing;
  • for non-key buckets, if clients insert the same key (e.g. user, role, etc.), etcd will return an *AlreadyExist error. The only possible case is deleting and then adding an object (e.g. a user) again. But etcd will commit the TXN immediately if there is any deletion operation.

Proposed actions:

siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 17, 2024
to fix etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 18, 2024
to fix etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 18, 2024
to fix etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
siyuanfoundation added a commit to siyuanfoundation/etcd that referenced this issue Jan 18, 2024
to fix etcd-io#17247.

Signed-off-by: Siyuan Zhang <[email protected]>
@siyuanfoundation
Contributor

siyuanfoundation commented Jan 18, 2024

#17263 (comment) is a valid point. The key point is that dedupe always increases bb.used by at least 1, even when there are no writes in the key bucket.

bb.used = widx + 1

Actually, there shouldn't be any duplicated items:

  • for the key bucket, the key is the revision, which is monotonically increasing;
  • for non-key buckets, if clients insert the same key (e.g. user, role, etc.), etcd will return an *AlreadyExist error. The only possible case is deleting and then adding an object (e.g. a user) again. But etcd will commit the TXN immediately if there is any deletion operation.

Proposed actions:

I disagree with the proposal.

  1. At the buffer level, it is hard to predict what new use cases we will need in the future. It is better to fix the issue than to double down on the no-duplicates assumption. The existing merge function does not make such an assumption.
  2. Most of dedupe (the sorting) is still needed even if there are no duplicates; see TestRangeAfterUnorderedKeyWriteMatch in #17263 (Fix tx buffer inconsistency if there are unordered key writes in one tx). A toy illustration follows below.
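A toy, self-contained illustration (made-up names; not the etcd buffer code) of why the merge into the read buffer still needs the stable sort plus a newest-wins collapse: writes can arrive unordered within a batch, and a later transaction can write the same key again:

```
package main

import (
	"fmt"
	"sort"
)

type entry struct{ key, val string }

// mergeNewestWins stably sorts by key, so within a run of equal keys the later
// write stays last, then collapses duplicates keeping the newest value. This is
// the same job that sort.Stable plus dedupe perform on the real read buffer.
func mergeNewestWins(entries []entry) []entry {
	sort.SliceStable(entries, func(i, j int) bool { return entries[i].key < entries[j].key })
	out := entries[:0]
	for _, e := range entries {
		if n := len(out); n > 0 && out[n-1].key == e.key {
			out[n-1] = e // later write wins
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	// txn 1 wrote zeta before alpha (unordered within the batch), and txn 2
	// re-wrote alpha afterwards; the merged view must be sorted and must keep
	// only the newest alpha.
	buf := []entry{
		{"auth/zeta", "v1"},
		{"auth/alpha", "v1"},
		{"auth/alpha", "v2"},
	}
	for _, e := range mergeNewestWins(buf) {
		fmt.Println(e.key, e.val)
	}
	// Output:
	// auth/alpha v2
	// auth/zeta v1
}
```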

@ahrtr
Member

ahrtr commented Jan 18, 2024

at the buffer level, it is unnecessary and confusing to assume no duplicates in the replace path when the actual merging code does not make such assumptions.

Where is the assumption? I didn't make any assumption. Always remember to simplify the repo as much as possible! In this case, the dedupe is unnecessary, so let's remove it to reduce the complexity. Just as I mentioned above, we also need to add a verification for it to ensure we do not make any wrong decision. (Search verify.Verify in the repo)

Most of dedupe (sorting) is still needed even if there is no duplicate

  • Copied from my previous comment: we need to keep the sort.Stable(bb) after merging the write and read buffers.
  • Again, for the key bucket, it's always sorted (the revision is monotonically increasing); for non-key buckets, there is no need to sort, since range operations aren't allowed on non-key buckets. Also, we need to add a verification as mentioned above.

Anyway, let's revert #17228 first so the workflow stops failing.

@siyuanfoundation
Contributor

siyuanfoundation commented Jan 18, 2024

at the buffer level, it is unnecessary and confusing to assume no duplicates in the replace path when the actual merging code does not make such assumptions.

Where is the assumption? I didn't make any assumption. Always remember to simplify the repo as much as possible! In this case, the dedupe is unnecessary, so let's remove it to reduce the complexity. Just as I mentioned above, we also need to add a verification for it to ensure we do not make any wrong decision. (Search verify.Verify in the repo)

"there shouldn't be any duplicated items" is an assumption. Or if you prefer, a constraint. This might be true on the etcd level because there is yet no use cases of putting duplicate keys between lock and unlock. But at the tx/buffer level, if UnsafePut is a public method, my natural assumption is that it should be possible put duplicate keys.

Yes, adding verification could safeguard it. But it is just strange to have such a constraint.

In addition, it does not make the code simpler, because func (bb *bucketBuffer) merge(bbsrc *bucketBuffer) still need the dedupe function (because there could be duplicate keys between 2 transactions).
The complexity of verifying no duplicates is O(n), the same as dedupe.

Most of dedupe (sorting) is still needed even if there is no duplicate

  • Copied from my previous comment: we need to keep the sort.Stable(bb) after merging the write and read buffers.
  • Again, for the key bucket, it's always sorted (the revision is monotonically increasing); for non-key buckets, there is no need to sort, since range operations aren't allowed on non-key buckets. Also, we need to add a verification as mentioned above.

Anyway, let's revert #17228 first so the workflow stops failing.

#17263 will fix the workflow.

@wenjiaswe
Contributor

Anyway, let's revert #17228 first so the workflow stops failing.

Looks like we already have a fix for the failing workflow; why do we still need to revert in this case? #17263 is LGTMed.

ahrtr added a commit to ahrtr/etcd that referenced this issue Jan 19, 2024
Actually pull/17228 was fixing a non-existent issue, because there will never be duplicated items. Please refer to etcd-io#17247 (comment).

The PR also introduces a bug (issues/17247).

Signed-off-by: Benjamin Wang <[email protected]>
@ahrtr
Member

ahrtr commented Jan 19, 2024

"there shouldn't be any duplicated items" is an assumption. Or if you prefer, a constraint.

It isn't an assumption, nor a constraint. It's an invariant property, at least for the key space/bucket. Note that etcd applies MVCC to the key space; as I mentioned above multiple times, the keys are monotonically increasing revisions. There is no overwrite of any existing key in the Key bucket. That is the key point that supports concurrent and consistent reads.
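A small sketch of what that invariant looks like at the byte level; this roughly mirrors how mvcc encodes a (main, sub) revision into the key-bucket key, and is meant as an illustration rather than the exact etcd code:

```
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// revToKey sketches how an MVCC revision becomes the bbolt key in the key
// bucket: big-endian main revision, a separator byte, big-endian sub revision.
// Because revisions only ever grow, every new write lands on a key strictly
// greater than all existing ones, so nothing in the key bucket is overwritten.
func revToKey(main, sub int64) []byte {
	k := make([]byte, 17)
	binary.BigEndian.PutUint64(k[0:8], uint64(main))
	k[8] = '_'
	binary.BigEndian.PutUint64(k[9:17], uint64(sub))
	return k
}

func main() {
	k881 := revToKey(881, 0)
	k882 := revToKey(882, 0)
	fmt.Println(bytes.Compare(k881, k882)) // -1: a later revision always sorts after an earlier one
}
```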

We should add a verification to ensure this invariant property. Refer to the verification added in the second commit of #17281; it has already successfully identified some test issues.

@ahrtr
Member

ahrtr commented Jan 19, 2024

Looks like we already have a fix for the failing workflow; why do we still need to revert in this case? #17263 is LGTMed.

This is the most important part of etcd. Let's not blindly make any decision (e.g. just because it fixes the workflow failure) before it's well understood and discussed.

Reverting PR #17228 first, and afterwards delivering a clean PR to resolve the issue, is the most prudent approach.

@fuweid
Member

fuweid commented Jan 20, 2024

IMO there are two potential problems:

  • the buffer could contain duplicate keys
  • the buffer contains some keys which are already committed into bbolt and accessible by the read transaction

I think #17263 can be considered to fix the first one. We don't need to introduce a map for the duplicate-key check.
If dedupe changes the used value, that should not happen for the Key bucket. But in my view, reviewers need to pay more attention to ensure the upper-layer components do not use UnsafePut on the Key bucket. So we should consider re-activating #17263.

For the second one, if there is overlap between the buffer cache and the read transaction, we should panic.
For the baseReadTx.UnsafeRange function, we can add more comments about why we should only allow unsafe range on the Key bucket. It's helpful for new members or contributors revisiting the history.

Just my two cents.

@siyuanfoundation
Contributor

For the second one, if there is overlap between the buffer cache and the read transaction, we should panic. For the baseReadTx.UnsafeRange function, we can add more comments about why we should only allow unsafe range on the Key bucket. It's helpful for new members or contributors revisiting the history.

For the second one, baseReadTx.UnsafeRange already panics if limit > 1 for non-Key buckets:

    if limit > 1 && !bucketType.IsSafeRangeBucket() {

Since the buffer always has the latest value for the other buckets, there should be no problem with overlap between the buffer cache and the read transaction.

@ahrtr
Member

ahrtr commented Jan 22, 2024

Solution 1: #17281 (comment)
Solution 2: #17290

We need to run benchmarks to compare the performance of main vs #17281 vs #17290. If there is not much performance difference, then we should prefer #17290; at the least, it's much safer to backport.

Anyone, please feel free to continue working on this issue.

@siyuanfoundation
Contributor

Is there a standard benchmark process?

@ahrtr
Member

ahrtr commented Jan 22, 2024

benchmark is what we have so far. @jmhbnz may have some thoughts on this.

@jmhbnz
Member

jmhbnz commented Jan 23, 2024

benchmark is what we have so far. @jmhbnz may have some thoughts on this.

Long story short, we don't currently have a standard or easily democratised way of benchmarking. We do have two tools available, however: rw-heatmaps and benchmark.

For a quick comparison I would suggest running benchmark with something like:

# Server
GOMAXPROCS=10 ./bin/etcd --quota-backend-bytes=17179869184 --listen-client-urls http://0.0.0.0:23790 --advertise-client-urls http://127.0.0.1:23790

# Benchmark
GOMAXPROCS=6 go run ./tools/benchmark put --conns=64 --clients=64 --key-size=256 --val-size=4096 --key-space-size=50000 --total=2000000 --endpoints "http://127.0.0.1:23790"

# make clean && make build between each run

For the results to have any validity, please run the benchmark several times and ensure all runs for each branch being tested happen on the same hardware and, as much as possible, under the same system load.

@fuweid
Member

fuweid commented Jan 23, 2024

Since the buffer always has the latest value for the other buckets, there should be no problem with overlap between the buffer cache and the read transaction

Hi @siyuanfoundation, sorry for the late reply. This issue is caused by committed key/values in the buffer cache.
There is no guard between the buffer cache and the read transaction; we need more eyes during review.

@siyuanfoundation
Contributor

I have run the benchmark 6 times as @jmhbnz suggested:

# Server
GOMAXPROCS=10 ./bin/etcd --quota-backend-bytes=17179869184 --listen-client-urls http://0.0.0.0:23790 --advertise-client-urls http://127.0.0.1:23790

# Benchmark
GOMAXPROCS=6 go run ./tools/benchmark put --conns=64 --clients=64 --key-size=256 --val-size=4096 --key-space-size=50000 --total=2000000 --endpoints "http://127.0.0.1:23790"

# make clean && make build between each run

This is the result of the total time of each benchmark run:
[screenshot: total time of each benchmark run for the compared branches]

It is pretty noisy. But there does not seem to be much difference between #17281 and #17290.

@ahrtr
Member

ahrtr commented Jan 25, 2024

Thanks for the testing. Then let's go for #17290; the main reason is that it's simpler and easier to understand and maintain.

We also need to backport #17325 to 3.5 and 3.4.

@serathius
Member Author

serathius commented Jan 26, 2024

Just to double check: did we expect the lack of performance difference between #17281 and #17290, or is it a surprise? If it was unexpected, can you double-check the benchmark method?

For example, you used ./tools/benchmark put, which benchmarks only put requests; are we sure the change we are making would influence put performance? You could validate it by adding some dummy sleeps in the code and seeing whether they show up in the benchmark results.

@ahrtr
Member

ahrtr commented Jan 26, 2024

Did we expect the lack of performance difference between #17281 and #17290, or is it a surprise?

In theory, #17281 should have the best performance, #17290 the worst, and the main branch should be in between:

#17281 > main > #17290

So the previous test result provided by @siyuanfoundation doesn't match that expectation. Probably there indeed isn't much performance difference, but it would still be better if we could double-confirm.

@siyuanfoundation
Contributor

The dedupe() call in this path:

if !ok {
	// The bucket exists only in the write buffer: remove it from the write
	// buffer map, dedupe it, and install it into the read buffer wholesale.
	delete(txw.buckets, k)
	wb.dedupe()
	txr.buckets[k] = wb
	continue
}

is only called when the bucket does not exist yet in the read buffer, which is pretty rare.
So I don't really expect any noticeable differences between main and #17290.

@ahrtr
Member

ahrtr commented Jan 29, 2024

is only called when the bucket does not exist yet in the read buffer, which is pretty rare

Why is it rare? Please read #17247 (comment) and try the two proposed ways to double-confirm the performance impact.

@serathius
Member Author

Can we close the issue?

@serathius
Member Author

Meant to ask if there is anything else that needs to be addressed before I close the issue, but that works too :P.
