-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv/raft: force campaigning on leader removal should respect fortification #129796
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
A-leader-leases
Related to the introduction of leader leases
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
nvanbenschoten
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-replication
Relating to Raft, consensus, and coordination.
T-kv
KV Team
A-leader-leases
Related to the introduction of leader leases
labels
Aug 28, 2024
nvanbenschoten
changed the title
kv/raft: handle fortified leader step down
kv/raft: force campaigning on leader removal should respect fortification
Sep 11, 2024
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Sep 13, 2024
Informs cockroachdb#129796. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that the ChangeReplicasTrigger code does not even know how to remove a voter replica, it only knows how to demote them. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Sep 13, 2024
Informs cockroachdb#129796. This commit makes it so that the confChangeImpl code will never create a raftpb.ConfChange(,V2) with a direct voter removal. Instead, it will always demote the voter replica to a learner replica first. Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 18, 2024
130660: kv: don't remove voter replicas in ChangeReplicasTrigger r=nvanbenschoten a=nvanbenschoten Informs #129796. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that the ChangeReplicasTrigger code does not even know how to remove a voter replica, it only knows how to demote them. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Sep 26, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Sep 26, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. Perform the same assertion as we have in pkg/raft, but for cases where DisableConfChangeValidation is set to true. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 7, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 7, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. Perform the same assertion as we have in pkg/raft, but for cases where DisableConfChangeValidation is set to true. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 8, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 8, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. Perform the same assertion as we have in pkg/raft, but for cases where DisableConfChangeValidation is set to true. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 8, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 8, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. Perform the same assertion as we have in pkg/raft, but for cases where DisableConfChangeValidation is set to true. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 14, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 14, 2024
Informs cockroachdb#129796. Informs cockroachdb#125355. Perform the same assertion as we have in pkg/raft, but for cases where DisableConfChangeValidation is set to true. Release note: None
craig bot
pushed a commit
that referenced
this issue
Oct 15, 2024
131445: raft: forbid direct voter removal without demotion r=nvanbenschoten a=nvanbenschoten Informs #129796. Informs #125355. This commit is a natural continuation of 7f37eaa (from 5 years ago). The commit made it so that the replicateQueue would never issue a ChangeReplicas request that removed a voter replica. Instead, the replicate queue would always demote the voter replica to a learner replica first. This commit makes it so that raft will no longer allow direct voter removal in a proposed configuration change. Instead, it requires the voter (who may be the leader) to first be demoted to a learner. This prevents abuse and strengthens the guarantee that we will never perform such unsafe configuration changes. Release note: None 132349: server: Fix NPE in spanStatsFanOut r=kyle-a-wong a=kyle-a-wong a NPE was surfaced when doing a spanstats request on an unfinalized (mixed version) cluster. To fix, defensive checks are added to defend against potential nil responses and references to non-existant map entries for a request span stat Fixes: #132130 Release note (bug fix): Fixes a bug where a span stats request on a mixed version cluster resulted in an NPE 132574: tablemetadatacache: fix TestTableMetadataUpdateJobProgressAndMetrics r=kyle-a-wong a=kyle-a-wong This test has a high degree of variability in run time, likely due to the generation for so many tables. To fix, new tables are no longer generated in the test. Instead, the batch size for the table metadata update job has is lowered to effectively trigger the same behavior Epic: [CRDB-37558](https://cockroachlabs.atlassian.net/browse/CRDB-37558) Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Kyle Wong <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 23, 2024
Informs cockroachdb#129796. This commit addresses a TODO to transfer raft leadership away when the leader steps down to a learner through the application of a configuration change. The leader now ask the follower with the largest Match to campaign immediately with a TimeoutNow to avoid interruption. This comes as a safer, easier to understand alternative to "force campaigning" on the leaseholder after applying a configuration change which has demoted the leader to a learner. The force campaign approach is more complex, is a layering violation, may not work if the leaseholder isn't told about the larger applied index before the leader steps down, and poses problems for fortification and leader leases. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
A-leader-leases
Related to the introduction of leader leases
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Extracted from #125254.
The remaining use of manual campaigning is in
forceCampaignLocked
. However, this use is special in that it uses a "forced" campaign throughMsgTimeoutNow
. It does so because it's trying to handle the case where the raft leader has stepped down or been removed from a range.In practice, this may be ok today because we will only call
forceCampaignLocked
as the leaseholder-not-leader. Still, if a leader is fortified, we probably want to handle it a different way. Since we only demote leaders to voters and never remove them immediately, there may be a better way to handle this inside of raft, around this logic:cockroach/pkg/raft/raft.go
Lines 2176 to 2191 in e243814
Jira issue: CRDB-41715
Epic CRDB-37522
The text was updated successfully, but these errors were encountered: