membership: prevent quorum loss from membership change #6420
@sinsharat Try specifying […]
@gyuho My issue is that if someone mistakenly adds a new member with an invalid URL, the entire cluster becomes unusable. Everything I try fails, even removing the newly added member, because the single member that is still up keeps starting elections and does not handle any requests.
Now I understand your issue. Yes, I think it makes sense to have some timeout for membership changes in the case you mentioned. In most cases people still have quorum (e.g. adding 1 member to a 2-node cluster), so they can revert the membership change. But adding 1 member to a 1-node cluster can be problematic. Defer to @xiang90 @heyitsanthony
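The quorum arithmetic behind the 1 -> 2 and 2 -> 3 cases can be sketched in Go. This is a minimal illustration, not etcd code: `quorum` and `canCommitAfterAdd` are hypothetical helpers, and the sketch assumes the added members never come up (for example, because of a mistyped peer URL) while the original members stay healthy.

```go
package main

import "fmt"

// quorum returns the number of voting members that must agree before Raft
// can commit an entry: a strict majority of n voters.
func quorum(n int) int {
	return n/2 + 1
}

// canCommitAfterAdd reports whether `healthy` reachable voters can still
// commit entries after `added` new voters join the configuration but never
// start (hypothetical helper for illustration only).
func canCommitAfterAdd(healthy, added int) bool {
	return healthy >= quorum(healthy+added)
}

func main() {
	for _, c := range []struct{ healthy, added int }{
		{1, 1}, // 1 -> 2: quorum becomes 2, only 1 member is up
		{2, 1}, // 2 -> 3: quorum is 2, both original members are up
		{1, 2}, // 1 -> 3: quorum is 2, only 1 member is up
	} {
		total := c.healthy + c.added
		fmt.Printf("%d -> %d members: quorum=%d, can commit=%v\n",
			c.healthy, total, quorum(total), canCommitAfterAdd(c.healthy, c.added))
	}
}
```

Only the 2 -> 3 case keeps a working majority, which is why that change can still be reverted while the 1 -> 2 and 1 -> 3 cases leave the surviving member unable to commit anything, including the removal of the bad member.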
@sinsharat Autoremove for 1->2 means the original node would have to make a membership change without quorum. It is too easy to end up in a split-brain mess. Could etcdserver probe the peer in the 1->2 case to see whether the new member is up before submitting the membership change?
@heyitsanthony Yes, I totally agree. It would be fine if the member were not added blindly, but instead the existing member kept retrying and only added the new member to the cluster once it could contact it. That would prevent the cluster from becoming unusable. Even though I described a two-member cluster, the same scenario can happen when a three-member cluster is intended and two new members are added to a single-member cluster.
@heyitsanthony Do you have an easy solution in mind? A member must first be added to the existing node in order to pass […]. So, in the current implementation, probing a new member before committing the membership change is impossible.
@gyuho No quick fix in mind. The bootstrap path would have to change a little for my suggestion to work. I believe (not 100% sure) the new node could force-add itself to the remote peer list on validate, then start running without corrupting the cluster even if it hasn't been added yet.
I can see an easy fix for the 1 -> 2 case. A one-member cluster can commit any proposal locally and immediately. So we can let the one-member cluster buffer the conf change and commit it once the second member contacts it. One problem is that the member might forget the configuration change request after a restart. To fix that, we would need to persist the buffer somehow, which makes the problem more complicated. But I think we can start with the first step.
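A minimal sketch of the buffering idea, assuming (as described in the thread) that the new member contacts the single existing member to fetch the member list. `pendingAdd`, `RequestAdd` and `OnContact` are hypothetical names, and membership is modeled as a plain list of peer URLs. The buffer is in memory only, so, as noted above, it would be lost on restart.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// pendingAdd is a hypothetical buffer for the 1 -> 2 case: the single
// member records a requested member-add instead of committing it, and
// commits it only when the new member actually contacts it.
type pendingAdd struct {
	mu       sync.Mutex
	requests map[string]bool // buffered peer URLs awaiting first contact
	members  []string        // committed membership, as peer URLs
}

func newPendingAdd(self string) *pendingAdd {
	return &pendingAdd{requests: map[string]bool{}, members: []string{self}}
}

// RequestAdd buffers the conf change rather than committing it.
func (p *pendingAdd) RequestAdd(peerURL string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.requests[peerURL] = true
}

// OnContact is called when a new member asks for the member list. If its
// peer URL matches a buffered request, the change is committed and a copy
// of the updated membership is returned.
func (p *pendingAdd) OnContact(peerURL string) ([]string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.requests[peerURL] {
		return nil, errors.New("no pending add for " + peerURL)
	}
	delete(p.requests, peerURL)
	p.members = append(p.members, peerURL)
	return append([]string(nil), p.members...), nil
}

// Members returns a copy of the committed membership.
func (p *pendingAdd) Members() []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	return append([]string(nil), p.members...)
}

func main() {
	p := newPendingAdd("http://localhost:2380")
	p.RequestAdd("http://localhost:22380")
	fmt.Println("before contact:", p.Members())
	if _, err := p.OnContact("http://localhost:22381"); err != nil {
		fmt.Println("rejected:", err)
	}
	members, err := p.OnContact("http://localhost:22380")
	fmt.Println("after contact:", members, err)
}
```

With this design a request carrying a mistyped URL simply never commits, so the single member keeps its quorum of one and stays usable.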
How would we know this? Raft progress?
No. The new member contacts one existing member to get the list of existing members. In the one-member case, it can only contact that member.
This does not block our release. Moving to v3.3.
I'm likely misunderstanding, but don't Raft's cluster membership rules prevent this from happening? In the example given originally, we wouldn't have a strict majority of the new (2-member) cluster, and thus the cluster membership change must fail.
Correct. Once you add a new member to a single-node cluster, the quorum size becomes 2. That is why the […] We want to prevent such […]
Moving to v3.5. We still have many other things planned for v3.4.
Would another viable solution to the 1->2 problem be to have server 1 first add server 2 as a non-voting member, replicate log entries to server 2 until it is up to date, and only then complete the 1->2 quorum transition? This is the approach the Raft paper recommends for the problem of new servers not having the full log and thus being unable to accept new entries. I think the same approach would also solve the present issue of a new server being unavailable due to misconfiguration.
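The non-voting (learner) approach can be modeled in a few lines. This is a simplified sketch, not etcd's or the Raft library's implementation: `member`, `cluster`, `addLearner` and `promote` are hypothetical, and "caught up" is reduced to a single comparison of the learner's match index against the leader's commit index. The key property it shows is that a learner does not count toward quorum, so a misconfigured new server cannot stall the cluster, and it only becomes a voter once it has actually received the log.

```go
package main

import (
	"errors"
	"fmt"
)

// member is a simplified view of a cluster member for this sketch.
type member struct {
	name  string
	voter bool   // false while the member is a non-voting learner
	match uint64 // highest log index known to be replicated to it
}

type cluster struct {
	members []*member
}

// voters counts only voting members; learners do not affect quorum.
func (c *cluster) voters() int {
	n := 0
	for _, m := range c.members {
		if m.voter {
			n++
		}
	}
	return n
}

func (c *cluster) quorum() int { return c.voters()/2 + 1 }

// addLearner adds a non-voting member; the quorum size is unchanged.
func (c *cluster) addLearner(name string) *member {
	m := &member{name: name}
	c.members = append(c.members, m)
	return m
}

// promote makes a learner a voter only if it has caught up to commitIndex;
// otherwise the request is refused and the quorum is left untouched.
func (c *cluster) promote(name string, commitIndex uint64) error {
	for _, m := range c.members {
		if m.name != name {
			continue
		}
		if m.voter {
			return fmt.Errorf("%s is already a voter", name)
		}
		if m.match < commitIndex {
			return fmt.Errorf("%s not caught up: match=%d, commit=%d", name, m.match, commitIndex)
		}
		m.voter = true
		return nil
	}
	return errors.New("unknown member " + name)
}

func main() {
	c := &cluster{members: []*member{{name: "s1", voter: true, match: 100}}}
	s2 := c.addLearner("s2") // e.g. misconfigured, receives no entries yet
	fmt.Println("quorum with learner:", c.quorum())
	fmt.Println("promote lagging learner:", c.promote("s2", 100))
	s2.match = 100 // pretend replication has caught up
	fmt.Println("promote caught-up learner:", c.promote("s2", 100), "quorum:", c.quorum())
}
```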
@maxenglander I agree. A non-voting member should solve this issue. @gyuho @xiang90 Can we close this issue? Or do we want to fix this issue without using the non-voting member feature?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Steps to reproduce:
This started etcd listening on localhost:2379 for client URLs and on localhost:2380 for peer URLs.
.\etcd.exe --initial-advertise-peer-urls http://localhost:22380 --listen-peer-urls http://localhost:22380 --advertise-client-urls http://localhost:22379 --listen-client-urls http://localhost:22379
Got the output:
Member 6cba1075ac9d26b5 added to cluster cdf818194e3a8c32
./etcdctl member update 6cba1075ac9d26b5 --peer-urls=https://localhost:22380
But I get the following error:
Error: context deadline exceeded
Let me know if I have misunderstood the feature, and what I need to do to make it work.
Thanks.