Unstable leadership when running server is demoted/removed without its participation #524
Thank you for reporting this and for the detailed reproduction @szechuen ! Going through those logs, I see this log line:
which I presume is autopilot adding the server back to the cluster as a non-voter. The cluster then becomes unstable when we try to replicate to it, which happens over and over again and keeps the cluster unstable. I'm wondering if it would be sufficient to add a check similar to the one we added in #477 when checking the term in the replication routine here. That would prevent the election from happening, as it does not make sense to start an election when a non-voter is encountered with a higher term; that non-voter can't participate in the election anyway.
Update: Also see lines 1378 to 1381 in 1075246.
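As a rough illustration of the suggested check, here is a minimal Go sketch. The types and helpers below (`peerState`, `stepDown`) are hypothetical stand-ins, not hashicorp/raft internals; only `raft.Server` and `raft.Nonvoter` are real library identifiers.

```go
// Hypothetical sketch of the suggested check in the leader's replication
// routine. peerState and stepDown stand in for the library's internals
// and are not hashicorp/raft API.
package sketch

import "github.com/hashicorp/raft"

type peerState struct {
	server      raft.Server // the peer we are replicating to
	currentTerm uint64      // the leader's current term
}

// handleAppendEntriesResponse decides what the leader should do when a peer
// reports a higher term in its AppendEntries response.
func handleAppendEntriesResponse(p *peerState, respTerm uint64) {
	if respTerm <= p.currentTerm {
		return // nothing unusual, carry on replicating
	}
	// Proposed check, analogous to #477 on the vote path: a non-voter cannot
	// win an election, so its higher term should not push the leader out.
	if p.server.Suffrage == raft.Nonvoter {
		// Ignore the higher term (or, per the later discussion, reset the
		// non-voter's term) instead of destabilizing the cluster.
		return
	}
	stepDown(respTerm) // a voter with a higher term: defer to it as usual
}

func stepDown(newTerm uint64) { /* transition leader -> follower at newTerm */ }
```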
Confirmed that Nomad OSS/Enterprise >= v1.3.3 is also affected and updated https://github.com/szechuen/raft-524 to reproduce the issue. Its preconditions are similar to Consul's: https://www.nomadproject.io/docs/configuration/autopilot
@szechuen Hi 👋 After reading your description of how Vault behaves, I believe hashicorp/vault#17019 might mitigate this somewhat, at least from the perspective that …
Yep, hashicorp/vault#17019 mitigates the issue for Vault, or you could toggle …
We have an open issue in the private raft-autopilot-enterprise repo (https://github.com/hashicorp/raft-autopilot-enterprise/issues/5) that relates to this. I intend to tackle it later, once I've got the quick fix for Vault merged. I don't claim to have fully digested everything in this issue, so I'm not saying it covers all the concerns raised; I just wanted to add this detail for now.
Yes, the term should be reset, I think. The idea is that non-voters should accept the term that the cluster agrees upon: since they don't have voting rights, they should not be able to destabilize the cluster, as non-voters can't win an election anyway. I've put up a PR that has a test reproducing the same behaviour and proposes a fix (#525).
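To illustrate that idea (and only that; the names below are hypothetical and this is not the actual change in #525), a node that knows, or is told, that it is a non-voter could simply adopt the cluster's agreed term instead of insisting on the higher term it accumulated while partitioned:

```go
// Illustrative sketch only: how a node acting as a non-voter might handle an
// AppendEntries request from the current leader whose term is lower than its
// own inflated term. Not hashicorp/raft code.
package sketch

type nodeState struct {
	currentTerm uint64
	isNonvoter  bool // per the latest cluster configuration
}

// onAppendEntries returns true if the request should be accepted.
func (n *nodeState) onAppendEntries(leaderTerm uint64) bool {
	if leaderTerm >= n.currentTerm {
		n.currentTerm = leaderTerm // normal Raft rule: adopt higher terms
		return true
	}
	if n.isNonvoter {
		// The cluster has agreed on leaderTerm; a non-voter cannot win an
		// election, so reset to the cluster's term rather than rejecting
		// the leader and forcing it to step down.
		n.currentTerm = leaderTerm
		return true
	}
	return false // a voter with a higher term still rejects stale leaders
}
```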
Verified that #526 resolves the behavior in Consul and Nomad (can't test Vault since the primary behavior is only present on Enterprise).
Consul
Nomad
@dhiaayachi Do you think we should keep the ticket open until pre-vote is done, or track that separately? |
Includes part of the fix for hashicorp/raft#524
* deps: update hashicorp/raft to v1.3.11 (includes part of the fix for hashicorp/raft#524) * cl: add changelog entry
* (manual cherry-pick of a0071f4) deps: update hashicorp/raft to v1.3.11 (includes part of the fix for hashicorp/raft#524) * cl: add changelog entry (Co-authored-by: Seth Hoenig <[email protected]>)
Closing this as we have a workaround in place. Pre-vote could be tracked in the relevant open PR (#530).
Hi 👋
We have discovered an issue with the library that results in unstable leadership when a running server is demoted/removed without its participation, which we have managed to reasonably reproduce in Consul OSS/Enterprise >= v1.13.0 and Vault Enterprise >= v1.11.0. We believe the livelock behavior was uncovered by the logically correct check introduced in #477 and released as part of Raft v1.3.7.
Reproduction: https://github.com/szechuen/raft-524
Precondition
A running server is demoted or removed from the cluster without its participation. Realistically, this is likely to occur when a server is temporarily disconnected from the rest of the cluster and, before being reconnected, automatically demoted/removed by the remaining servers via a process like raft-autopilot, which both Consul and Vault use.
Consul
As of Consul v1.13.x, the behavior is tied to the `CleanupDeadServers` option (default: `true`) offered by raft-autopilot, which when invoked removes all failed servers subject to the configured `MinQuorum` (default: `0`). This is triggered once `LastContactThreshold` (default: `200ms`) is exceeded and the server is considered unhealthy.
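For reference, a minimal Go sketch of reading and adjusting these knobs through Consul's `github.com/hashicorp/consul/api` client, assuming its operator autopilot endpoints (`AutopilotGetConfiguration`/`AutopilotSetConfiguration`). The values set below simply mirror the defaults described above plus an example `MinQuorum`, not a recommendation:

```go
// Minimal sketch, assuming the Consul api client's operator autopilot
// endpoints; error handling kept deliberately terse.
package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}
	op := client.Operator()

	conf, err := op.AutopilotGetConfiguration(nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("CleanupDeadServers=%v LastContactThreshold=%s MinQuorum=%d\n",
		conf.CleanupDeadServers, conf.LastContactThreshold.Duration(), conf.MinQuorum)

	// The settings referenced above: servers unreachable for longer than
	// LastContactThreshold are considered unhealthy, and CleanupDeadServers
	// removes failed servers subject to MinQuorum.
	conf.CleanupDeadServers = true
	conf.LastContactThreshold = api.NewReadableDuration(200 * time.Millisecond)
	conf.MinQuorum = 3 // example value; the default is 0

	if err := op.AutopilotSetConfiguration(conf, nil); err != nil {
		panic(err)
	}
}
```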
Vault
As of Vault v1.11.x, the `cleanup_dead_servers` option is `false` by default and, even when enabled, defaults to the much higher `dead_server_last_contact_threshold` of `24h`, hence it is less likely to occur with a reasonable configuration.
However, the Automated Upgrades Autopilot feature was introduced in the Enterprise variant, which is enabled by default (`disable_upgrade_migration`, default: `false`) and automatically demotes any server with an `UpgradeVersionTag` not matching the majority of the servers to a non-voter. There must be a leadership transfer during the loss of connectivity for the last known information to be reset and thus trigger the behavior. This is ensured when the server cut off is the existing leader.
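To make that demotion rule concrete, here is a small, purely illustrative Go sketch of the described decision (hypothetical types; this is not raft-autopilot's actual implementation): voters whose upgrade version does not match the version held by the majority of servers are marked for demotion to non-voters.

```go
// Illustrative only: mimics the described "demote servers whose upgrade
// version differs from the majority" behavior. Not autopilot source code.
package sketch

type serverInfo struct {
	ID             string
	UpgradeVersion string // e.g. the UpgradeVersionTag value
	Voter          bool
}

// demotionCandidates returns the IDs of voters whose upgrade version does
// not match the version reported by the majority of servers.
func demotionCandidates(servers []serverInfo) []string {
	counts := make(map[string]int)
	for _, s := range servers {
		counts[s.UpgradeVersion]++
	}
	majorityVersion, best := "", 0
	for v, c := range counts {
		if c > best {
			majorityVersion, best = v, c
		}
	}
	var demote []string
	for _, s := range servers {
		if s.Voter && s.UpgradeVersion != majorityVersion {
			demote = append(demote, s.ID)
		}
	}
	return demote
}
```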
Behavior
All log samples reference the Consul and Vault issue reproduction linked above, with a 3-node (`node0`, `node1`, `node2`) cluster setup where `node0` is the initial node bootstrapped (hence the existing leader) and is disconnected from the remaining nodes for a period of 30s after the cluster has been formed.
First, ensure that the cluster has stabilized and the remaining nodes have been promoted to voters if the `ServerStabilizationTime` Autopilot config is set (default: `10s` for both Consul and Vault).
Next, `node0` is disconnected from the cluster for 30s, during which it would be either demoted to a non-voter (Vault) or removed as a server entirely (Consul). Since `node0` can't learn about the configuration change or hear from the leader, it continuously attempts to start an election and requests votes as a candidate with progressively higher terms.
Once `node0` is reconnected, we see that it continues to attempt starting an election and requesting votes as a candidate, since it does not learn about the configuration change. The check introduced in #477 makes the remaining nodes ignore the vote request, but because `node0` has a higher term, they abandon their quorum and step down to followers before restarting elections. However, the terms on `node1`/`node2` are not updated to the highest term seen (from `node0`), and the next round of elections continues to be held with a lower term than that of `node0`, causing the process to repeat all over again.
Note that prior to #477, `node1`/`node2` would have accepted the vote request from `node0` despite it not being a voter and elected it as the new leader, which, while incorrect, would have allowed the cluster to make forward progress.
The result is a cluster that is henceforth trapped in a livelock and never able to regain stable leadership until `node0` is reset.
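For anyone who wants to poke at the failure mode directly against the library rather than through Consul or Vault, below is a rough, self-contained Go sketch of the same sequence using hashicorp/raft's in-memory transport and stores. It is not the reproduction repo linked above; the timings are arbitrary and the helper names (`newInmemNode`, `noopFSM`, `node`) are made up for this illustration.

```go
// Sketch of the reproduction shape: bootstrap a 3-node cluster, partition
// node0, demote it without its participation, then heal the partition and
// watch the terms. On library versions without the fix discussed above,
// node0's inflated term is never reset and elections can repeat indefinitely.
package main

import (
	"fmt"
	"io"
	"time"

	"github.com/hashicorp/raft"
)

// noopFSM satisfies raft.FSM; this sketch never applies real state.
type noopFSM struct{}

func (noopFSM) Apply(*raft.Log) interface{}         { return nil }
func (noopFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, nil } // never called here
func (noopFSM) Restore(io.ReadCloser) error         { return nil }

type node struct {
	r     *raft.Raft
	trans *raft.InmemTransport
	id    raft.ServerID
	addr  raft.ServerAddress
}

func newInmemNode(id string) *node {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID(id)
	addr, trans := raft.NewInmemTransport("")
	store := raft.NewInmemStore() // serves as both LogStore and StableStore
	snaps := raft.NewInmemSnapshotStore()
	r, err := raft.NewRaft(conf, noopFSM{}, store, store, snaps, trans)
	if err != nil {
		panic(err)
	}
	return &node{r: r, trans: trans, id: conf.LocalID, addr: addr}
}

func main() {
	nodes := []*node{newInmemNode("node0"), newInmemNode("node1"), newInmemNode("node2")}

	// Fully connect the in-memory transports.
	for _, a := range nodes {
		for _, b := range nodes {
			if a != b {
				a.trans.Connect(b.addr, b.trans)
			}
		}
	}

	// Bootstrap from node0 with all three servers as voters; node1/node2
	// pick up the configuration through replication.
	var servers []raft.Server
	for _, n := range nodes {
		servers = append(servers, raft.Server{ID: n.id, Address: n.addr, Suffrage: raft.Voter})
	}
	nodes[0].r.BootstrapCluster(raft.Configuration{Servers: servers}).Error()
	time.Sleep(3 * time.Second) // let node0 win the initial election

	// Partition node0: it can neither reach nor hear from the others.
	nodes[0].trans.DisconnectAll()
	nodes[1].trans.Disconnect(nodes[0].addr)
	nodes[2].trans.Disconnect(nodes[0].addr)
	time.Sleep(3 * time.Second) // node1/node2 elect a new leader; node0 keeps campaigning

	// Demote node0 without its participation (RemoveServer would work too),
	// mimicking what autopilot does for an unhealthy server.
	for _, n := range nodes[1:] {
		if n.r.State() == raft.Leader {
			n.r.DemoteVoter(nodes[0].id, 0, time.Second).Error()
		}
	}

	// Heal the partition and observe whether leadership stabilizes.
	for _, n := range nodes[1:] {
		nodes[0].trans.Connect(n.addr, n.trans)
		n.trans.Connect(nodes[0].addr, nodes[0].trans)
	}

	for i := 0; i < 10; i++ {
		for _, n := range nodes {
			fmt.Printf("%s: state=%s term=%s\n", n.id, n.r.State(), n.r.Stats()["term"])
		}
		time.Sleep(time.Second)
	}
}
```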