Only primary with slots has the right to mark a node as failed #634
Conversation
In markNodeAsFailingIfNeeded we count needed_quorum and failures. needed_quorum is half of cluster->size plus one, and cluster->size is the number of primary nodes that serve slots, but when counting failures we did not check whether the reporting primary has slots. Signed-off-by: Binbin <[email protected]>
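To make the described fix concrete, here is a minimal, self-contained sketch of the quorum check; the struct and field names are invented for illustration and do not match the actual cluster.c code.

```c
#include <stdio.h>

/* Illustrative stand-in for a cluster node; not the real clusterNode struct. */
typedef struct {
    int is_primary;
    int has_slots;      /* non-zero if this primary serves at least one slot */
    int reported_pfail; /* non-zero if this node reported the target as PFAIL */
} node_t;

/* cluster_size plays the role of cluster->size: the number of primaries
 * that serve slots. */
static int should_mark_failed(const node_t *nodes, int num_nodes, int cluster_size) {
    int needed_quorum = cluster_size / 2 + 1;
    int failures = 0;

    for (int i = 0; i < num_nodes; i++) {
        /* The fix: only count reports from primaries that serve slots,
         * i.e. the same set of nodes counted in cluster_size. */
        if (!nodes[i].is_primary || !nodes[i].has_slots) continue;
        if (nodes[i].reported_pfail) failures++;
    }
    return failures >= needed_quorum;
}

int main(void) {
    /* 3 primaries with slots and 2 empty primaries; one voting primary and
     * both empty primaries report PFAIL. Without the slot check failures
     * would be 3 (>= quorum of 2); with it failures is 1, so no FAIL flag. */
    node_t nodes[] = {
        {1, 1, 1}, {1, 1, 0}, {1, 1, 0}, /* primaries with slots */
        {1, 0, 1}, {1, 0, 1},            /* empty primaries, reports ignored */
    };
    printf("mark FAIL: %d\n", should_mark_failed(nodes, 5, 3)); /* prints 0 */
    return 0;
}
```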
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@             Coverage Diff              @@
##            unstable     #634     +/-  ##
============================================
- Coverage      70.19%   70.16%   -0.03%
============================================
  Files            110      110
  Lines          60049    60050       +1
============================================
- Hits           42149    42135      -14
- Misses         17900    17915      +15
This makes sense. Thanks!
The cluster spec mentions that the majority of primaries need to flag a node as PFAIL for it to become FAIL. We need to update that text. It needs to count only the primaries with the right to vote, which are the ones counted in the cluster size, as you mentioned.
I think this bug can cause a minority of voting primaries with help from empty primaries to trigger a failover election. If this minority is in a netsplit, the failover can't succeed and the replicas will try to get elected over and over.
Technically, failure detection by nodes with or without slots is equally effective, so I don't think it is a good idea to just ignore the votes of nodes with no slots. Perhaps we can change the definition of needed_quorum to half of the total count of primary nodes plus one. In that case we don't need to make sure slots are distributed among an odd number of nodes; we can distribute slots among an even number of nodes and add one extra node to ensure voting is effective.
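As a rough, assumed illustration of the difference between the two quorum definitions discussed here (this is not code from the project; the topology is just an example):

```c
#include <stdio.h>

int main(void) {
    /* Example topology: 3 primaries serving slots plus 2 empty primaries. */
    int primaries_with_slots = 3;
    int total_primaries = 5;

    /* Current definition: quorum over primaries that serve slots. */
    int quorum_current = primaries_with_slots / 2 + 1;  /* 2 */

    /* Suggested alternative: quorum over all primaries, so empty primaries
     * can be added purely to make the voting group effective. */
    int quorum_suggested = total_primaries / 2 + 1;     /* 3 */

    printf("current: %d, suggested: %d\n", quorum_current, quorum_suggested);
    return 0;
}
```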
I remember @zuiderkwast had a voting-replicas idea, if I remember it correctly?
In Tencent Cloud we have a similar design: we flag a node as an arbiter node, which is an empty primary (no slots) that has the right to vote. So in a one-shard cluster (or in non-cluster mode, where we converted the primary-replica setup into a single-shard cluster with 16384 slots) we do this: one shard (16384 slots) plus two arbiter nodes (0 slots, with the right to vote), so the quorum can be met to do the failover. I think I can publish it if you guys are interested in adding it, or we could just add a new configuration that gives an empty primary the right to vote.
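Purely as a hypothetical sketch of the arbiter idea described in this comment (none of these names exist in the Valkey code base; the flag, struct, and helper are invented):

```c
#include <stdbool.h>
#include <stdio.h>

/* Invented node representation for illustration only. */
typedef struct {
    bool is_primary;
    int numslots;
    bool is_arbiter; /* hypothetical flag set via a new configuration option */
} node_t;

/* A node may vote if it is a primary that either serves slots or is
 * explicitly flagged as an arbiter. */
static bool node_may_vote(const node_t *n) {
    return n->is_primary && (n->numslots > 0 || n->is_arbiter);
}

int main(void) {
    node_t shard_primary = {true, 16384, false};
    node_t arbiter = {true, 0, true};
    node_t plain_empty_primary = {true, 0, false};
    printf("%d %d %d\n", node_may_vote(&shard_primary), node_may_vote(&arbiter),
           node_may_vote(&plain_empty_primary)); /* prints 1 1 0 */
    return 0;
}
```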
Once a cluster is in steady state, nodes can be added without consensus, which means that nodes would disagree about the quorum size at a given epoch while nodes are being added. We need a way to identify whether a node is part of the cluster config at a given epoch, which today happens when we migrate a slot into the new primary.
I would rather have some type of config that indicates a node is able to participate in the quorum and is attached to the epoch. I don't like the idea of having primaries without slots just arbitrarily participate. It's really no different than just letting everyone vote.
Agree with Viktor's minor suggestion, but the general change seems good to me.
Signed-off-by: Binbin <[email protected]>
I don't think we currently consult the epochs when computing the quorum size. The dynamic quorum issue is a real problem of the current cluster design/implementation (even if we count only non-empty primaries).
+1. I like this idea. We should explore it further next :-)
LGTM in general.
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]> Co-authored-by: Ping Xie <[email protected]>
Good fix! Good test case!
What to mention in release notes? Shall we mark it as a bugfix?
We do. We include a node in the quorum if it has slots and has an epoch greater than our own. On the flip side, the primary receiving the request will compare epochs to see if it can vote.
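Taken literally, the two checks described in this comment could be sketched as follows; all names are invented for illustration and the real cluster.c logic is more involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented peer representation; not the real clusterNode struct. */
typedef struct {
    bool is_primary;
    int numslots;          /* slots served by this primary */
    uint64_t config_epoch; /* epoch advertised by this primary */
} peer_t;

/* Failure-detection side: per the comment, a peer is included in the
 * quorum if it serves slots and its epoch is not behind our own. */
static bool included_in_quorum(const peer_t *peer, uint64_t my_epoch) {
    return peer->is_primary && peer->numslots > 0 && peer->config_epoch >= my_epoch;
}

/* Voting side: the primary receiving a failover-auth request compares
 * epochs before deciding whether it may vote. */
static bool may_vote_on_request(uint64_t request_epoch, uint64_t highest_seen_epoch) {
    return request_epoch >= highest_seen_epoch;
}

int main(void) {
    peer_t p = {true, 100, 7};
    printf("%d %d\n", included_in_quorum(&p, 7), may_vote_on_request(6, 7)); /* 1 0 */
    return 0;
}
```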
Mention that it's a bugfix where nodes not in the quorum group might spuriously mark nodes as failed.
In markNodeAsFailingIfNeeded we count needed_quorum and failures.
needed_quorum is half of cluster->size plus one, and cluster->size
is the number of primary nodes that serve slots, but when counting
failures we did not check whether the reporting primary has slots.
Only a primary that has slots has the right to vote; this adds a new
clusterNodeIsVotingPrimary helper to formalize that concept.
Release notes: bugfix where nodes not in the quorum group might spuriously mark nodes as failed
---------
Signed-off-by: Binbin <[email protected]>
Co-authored-by: Ping Xie <[email protected]>
Signed-off-by: Ping Xie <[email protected]>
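For reference, a helper of the kind named in the commit message would plausibly look something like the following; this is a sketch of the concept with invented types, not the exact macro added by the PR.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified node representation for illustration; the real code operates
 * on clusterNode with flags and a slot bitmap. */
typedef struct {
    bool is_primary;
    int numslots;
} clusterNodeLike;

/* A "voting primary" is a primary that serves at least one slot, i.e. a
 * node counted in cluster->size and therefore entitled to contribute to
 * the failure quorum and to failover votes. */
static bool clusterNodeIsVotingPrimaryLike(const clusterNodeLike *n) {
    return n->is_primary && n->numslots > 0;
}

int main(void) {
    clusterNodeLike empty_primary = {true, 0};
    clusterNodeLike serving_primary = {true, 5461};
    printf("%d %d\n", clusterNodeIsVotingPrimaryLike(&empty_primary),
           clusterNodeIsVotingPrimaryLike(&serving_primary)); /* prints 0 1 */
    return 0;
}
```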