Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How about make configuration changes effective on received instead of applied #11284

Closed
hicqu opened this issue Oct 22, 2019 · 4 comments
Closed

Comments

@hicqu
Copy link
Contributor

hicqu commented Oct 22, 2019

Hi, I'm currently writting some code to make configuration changes effective
on received instead of applied in raft-rs.
The main reason is to fix some potential bugs. For example,
The reason we do it is:

  1. Suppose there is a Raft cluster with peers A, B, C and A is the leader.
    So we can propose RemoveNode(C) and RemoveNode(B) one by oen on A,
    and both of them can be committed by the Raft cluster, and committed by A.
    However both B and C could have not applied 2 configuration changes,
    so that they still think the cluster contains A, B and C. So B and C
    may elect a new leader beside A, which means there are 2 leader in 1 term.

  2. Suppose there is a Raft cluster with peers A, B, C and D, and A is the leader.
    So we an propose RemoveNode(D), then waiting for the proposal is committed (or applied)
    to ACK the proposer. After the proposer is ACKed, he will think the cluster currently
    only has 3 nodes, so the cluster should be available when A is isolated. However
    B and C could still think the cluster contains A, B, C and D because the configuration
    change has not been applied, in which case B and C can't elect a new leader forever.

And we are pleasure if we can port the behavior to etcd. Do you think it's necessary?

@hicqu
Copy link
Contributor Author

hicqu commented Oct 22, 2019

@tbg PTAL thanks!

@tbg
Copy link
Contributor

tbg commented Oct 22, 2019

Hi @hicqu, thanks for your proposal. This was discussed on #7625. In the end we decided that sticking with apply-time configuration changes (i.e. etcd today) was the more prudent option. Due to this library's use in various applications it is prohibitively difficult to make the change you're suggesting at this point (you need interop between the old and the new versions, plus forklift API changes into all of the apps importing etcd/raft), though I agree that it's desirable when considered in isolation. Your example 1) is addressed by the discussion in the thread (never ack a second conf change until syncing the application of the first - not enforced yet though it's really hard to get this counter-example to work in practice) and 2) unfortunately is a real limitation that I have no good workaround for.
Let's continue the discussion in #11284.

@tbg tbg closed this as completed Oct 22, 2019
@hicqu
Copy link
Contributor Author

hicqu commented Oct 22, 2019

@tbg thanks for your replay! Seems you worry about rolling update a Raft cluster. How about making the new behavior as an option? Only if it's enabled configuration changes become effective on received, otherwize on applied, just like now. And we can add a rolling update interface on Storage interface to document how to update a cluster to enable the feature. According to our research, the update is safe, what applications need to do is just read all received but not applied Raft logs and read configuration changes in them when restart.

@BusyJay
Copy link
Contributor

BusyJay commented Jul 1, 2020

Another limitation of taking effect on applied is it prevents practical rollback of joint consensus when only quorum are alive. For example, when removing C from ABCDE, and DE are isolated, the removal can't succeed when using joint consensus apparently. But there is no way to rollback the change as no logs can be committed any more, the cluster is stuck.

Related discussion tikv/raft-rs#192

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants