
Adding replacement member before removing #6114

Closed
michael-px opened this issue Aug 5, 2016 · 9 comments
Comments

@michael-px

etcd 2.3.7

We had an outage and were able to resurrect the dead node and have it rejoin the 3-node cluster. But it's the "leader" and is still having problems. This is production and we can't take the cluster down to migrate to a new one.

How can I change the "isLeader" designation to another node in the cluster that's more healthy?

@heyitsanthony
Contributor

Leadership transfer is in the raft machinery, but it's not yet supported at the etcd level; #6038 is a related PR for this functionality. Cycling the etcd leader would force an election with the chance of selecting a different leader, but the cluster will reject proposals until the election completes.

@xiang90
Contributor

xiang90 commented Aug 9, 2016

@michael-px Does @heyitsanthony's reply answer your question?

@michael-px
Author

Not satisfactorily, no.

It seems I can't change the leader. So when it goes down (or I take it down), how do I maintain a healthy cluster if I have to remove a node first instead of adding one? This whole approach is bass-ackwards.


@xiang90
Contributor

xiang90 commented Aug 9, 2016

@michael-px

Please keep the discussion technical here.

If a leader goes down, the cluster AUTOMATICALLY elects a new one, and the old leader rejoins as a follower; that is how raft works. Transferring leadership from one healthy member to another is not supported right now. There is an ongoing effort to make it possible, but it would be great to know why you want to do it.

@michael-px
Author

Because I want to remove the leader from the cluster. It's a sick puppy and I can't add a node first. I have to remove the leader, creating an unhealthy cluster, then try to add a node to that unhealthy cluster.

As I said: bass-ackwards.


@heyitsanthony
Contributor

@michael-px OK, here's the reasoning behind removing a member before adding its replacement. I believe it is straightforward, and hardly backwards, but please correct me if I'm wrong.

etcd employs distributed consensus based on a quorum model; a strict majority of the members, ⌊n/2⌋+1 out of n, must agree on a proposal before it can be committed to the cluster. These proposals include key-value updates and membership changes. This model entirely avoids any possibility of split-brain inconsistency. The downside is that permanent quorum loss is catastrophic.
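The quorum arithmetic above can be sketched as follows (a minimal illustration in Python, not etcd code; the function names are my own):

```python
# Quorum for an n-member cluster is a strict majority: floor(n/2) + 1.
def quorum(n: int) -> int:
    return n // 2 + 1

# Members that can fail while the remainder still reaches quorum.
def fault_tolerance(n: int) -> int:
    return n - quorum(n)

for n in range(1, 6):
    print(f"members={n} quorum={quorum(n)} tolerable failures={fault_tolerance(n)}")
```

Note how going from 3 to 4 members raises the quorum from 2 to 3 without improving fault tolerance (still 1), while going from 4 to 5 keeps the quorum at 3 and raises fault tolerance to 2; this is the parity effect mentioned below.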

How this applies to membership:

If a 3-member cluster has 1 downed member, it can still make forward progress because the quorum is 2 and 2 members are still live. However, adding a new member to a 3-member cluster will increase the quorum to 3 because 3 votes are required for a majority of 4 members. Since the quorum increased, this extra member buys nothing in terms of fault tolerance; the cluster is still one node failure away from being unrecoverable.

Additionally, that new member is risky because it may turn out to be misconfigured or incapable of joining the cluster. In that case, there's no way to recover quorum because the cluster has two members down and two members up, but needs three votes to change membership to undo the botched membership addition. etcd will by default (as of last week) reject member add attempts that could take down the cluster in this manner.
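The failure mode described in the two paragraphs above can be worked through numerically (an illustrative sketch, not etcd code):

```python
# "Add before remove" on a 3-member cluster with one member already down.
def quorum(n: int) -> int:
    return n // 2 + 1

members, live = 3, 2
assert live >= quorum(members)   # 2 >= 2: the cluster still has quorum

# Adding a 4th member raises the quorum to 3 immediately.
members += 1
print(quorum(members))           # 3 votes now needed

# If the new member is misconfigured and never joins, only 2 members are
# live, which is below the new quorum of 3, so the cluster cannot commit
# the membership change needed to undo the botched add.
stuck = live < quorum(members)
print(stuck)
```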

On the other hand, if the downed member is removed from cluster membership first, the number of members becomes 2 and the quorum remains at 2. Following that removal by adding a new member will also keep the quorum steady at 2. So, even if the new node can't be brought up, it's still possible to remove the new member through quorum on the remaining live members.
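The remove-first ordering can be checked the same way (again an illustrative sketch, not etcd code):

```python
# "Remove before add" on a 3-member cluster with one member down.
def quorum(n: int) -> int:
    return n // 2 + 1

members, live = 3, 2             # one member is down
# Remove the downed member: 2 members remain, quorum stays at 2, both live.
members = 2
assert quorum(members) == 2 and live >= quorum(members)

# Add the replacement: 3 members, quorum is still 2.
members = 3
assert quorum(members) == 2
# Even if the new member never comes up, the 2 live members still form a
# quorum and can remove it again, so the cluster is always repairable.
print("quorum held at", quorum(members), "throughout")
```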

Some remarks on general cluster health:

How does removing a flaky member from the cluster membership make the cluster unhealthy? The cluster will automatically elect a new leader so long as there's a quorum of members active. At worst, the cluster will not be able to accept proposals for the second or two it takes to run an election cycle (non-quorum reads would still work). Regardless, that brief leadership loss will eventually be seamless as per the pending PR I mentioned.

Is the problem that removing a member before adding a new one will temporarily reduce the number of tolerable failures? If all cluster members are active, it should be fine to add a new member; the number of tolerable failures will remain the same or increase by one, depending on cluster parity. If the new member can't be brought up, that counts as a failure, but there will be enough active members so it can still be repaired by removing the member from the cluster.

I'm surprised that the current leader member is always being re-elected after its etcd server is cycled; the elections are randomized. I could help more if I could take a look at the server logs.

@michael-px
Author

This is good stuff. It should be added to the documentation.

If the developers had read this first, maybe they wouldn’t have said “I don’t care what the web site says, add a member first.”


@xiang90
Contributor

xiang90 commented Aug 12, 2016

@michael-px Anything more we can help on this issue?

@michael-px
Author

No, close it.


@xiang90 xiang90 closed this as completed Aug 12, 2016
@gyuho gyuho changed the title change isLeader node adding replacement member before removing Sep 29, 2016
@gyuho gyuho changed the title adding replacement member before removing Adding replacement member before removing Sep 29, 2016
gyuho added a commit to gyuho/etcd that referenced this issue Dec 16, 2016
gyuho added a commit to gyuho/etcd that referenced this issue Dec 16, 2016