
Only primary with slots has the right to mark a node as failed #634

Merged
merged 7 commits into unstable from mark_fail on Jun 17, 2024

Conversation

enjoy-binbin
Member

@enjoy-binbin enjoy-binbin commented Jun 12, 2024

In markNodeAsFailingIfNeeded we count needed_quorum and failures;
needed_quorum is half of cluster->size plus one, and cluster->size
is the number of primary nodes that own slots. But when counting
failures, we did not check whether the reporting primary owns slots.

Only primaries that own slots have the right to vote. This PR adds a new
clusterNodeIsVotingPrimary helper to formalize that concept.
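
For illustration, here is a minimal sketch of the new helper and the corrected counting; helper and field names (clusterNodeIsMaster, numslots, fail_reports, clusterNodeFailReport) are assumptions and may differ from the merged code in src/cluster_legacy.c:

    /* Sketch only, not the merged implementation. A "voting primary" is a
     * primary that owns at least one slot, i.e. a node counted in
     * cluster->size. */
    static inline int clusterNodeIsVotingPrimary(clusterNode *n) {
        return clusterNodeIsMaster(n) && n->numslots > 0;
    }

    /* Inside markNodeAsFailingIfNeeded: count only reports from voting
     * primaries toward the quorum. */
    int needed_quorum = server.cluster->size / 2 + 1;
    int failures = 0;
    listIter li;
    listNode *ln;
    listRewind(node->fail_reports, &li);
    while ((ln = listNext(&li)) != NULL) {
        clusterNodeFailReport *report = ln->value;
        if (!clusterNodeIsVotingPrimary(report->node)) continue; /* the fix */
        failures++;
    }
    if (clusterNodeIsVotingPrimary(myself)) failures++; /* our own PFAIL flag */
    if (failures < needed_quorum) return; /* quorum not reached; stay PFAIL */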

Release notes:

bugfix where nodes not in the quorum group might spuriously mark nodes as failed

Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin requested review from PingXie and madolson June 12, 2024 08:00

codecov bot commented Jun 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.16%. Comparing base (5d9d418) to head (acfdbb8).
Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #634      +/-   ##
============================================
- Coverage     70.19%   70.16%   -0.03%     
============================================
  Files           110      110              
  Lines         60049    60050       +1     
============================================
- Hits          42149    42135      -14     
- Misses        17900    17915      +15     
Files                   Coverage Δ
src/cluster_legacy.c    85.80% <100.00%> (+0.17%) ⬆️

... and 7 files with indirect coverage changes

Contributor

@zuiderkwast zuiderkwast left a comment


This makes sense. Thanks!

The cluster spec says that a majority of primaries need to flag a node as PFAIL for it to become FAIL. We need to update that text: it should count only the primaries with the right to vote, which are the ones counted in the cluster size, as you mentioned.

I think this bug can let a minority of voting primaries, with help from empty primaries, trigger a failover election. If this minority is in a netsplit, the failover can't succeed and the replicas will try to get elected over and over.
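
As a worked illustration of that scenario (hypothetical numbers, not taken from the PR):

    /* Hypothetical cluster: 3 slot-owning primaries + 2 empty primaries.
     * cluster->size = 3, so needed_quorum = 3/2 + 1 = 2.
     * Pre-fix, a partition holding 1 voting primary and 2 empty primaries
     * could reach failures = 2 reports + its own PFAIL flag = 3 >= 2 and
     * mark a node FAIL. Replicas then start elections, but winning a
     * failover vote still requires a majority of voting primaries, so the
     * elections retry indefinitely until the partition heals. */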

@CharlesChen888
Member

Technically, nodes with or without slots are equally effective at failure detection, so I don't think it is a good idea to just ignore the votes of nodes with no slots.

Perhaps we can change the definition of needed_quorum to half of the total count of primary nodes plus one. In that case, we don't need to make sure slots are distributed among an odd number of nodes; we can distribute slots among an even number of nodes and add one extra node to ensure voting is effective.
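
A quick arithmetic sketch of that alternative definition (hypothetical numbers):

    /* Proposed alternative: count ALL primaries, not only slot owners.
     * Hypothetical: 4 primaries share the slots evenly, plus 1 empty
     * primary -> 5 primaries total, needed_quorum = 5/2 + 1 = 3.
     * The even slot distribution still yields an odd-sized voter set,
     * so no odd-numbered slot distribution is required. */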

@enjoy-binbin
Member Author

I remember @zuiderkwast had a voting-replicas idea, if I remember it correctly. It is a good idea for an empty primary to have the right to vote, but I think that is another topic; in this PR I just want to fix the definition (count against the cluster size).

we don't need to make sure slots are distributed among an odd number of nodes; we can distribute slots among an even number of nodes and add one extra node to ensure voting is effective.

In Tencent Cloud we have a similar design: we flag a node as an arbiter node, an empty primary (no slots) that has the right to vote. So in a one-shard cluster (or in non-cluster mode, where we converted the primary-replica setup into a single-shard cluster with 16384 slots) we do this: one shard (16384 slots) plus two arbiter nodes (0 slots, with the right to vote), so the quorum can be met to do the failover. I can publish it if you are interested in adding it, or we can just add a new configuration that gives empty primaries the right to vote.

@madolson
Member

Perhaps we can change the definition of needed_quorum to half of the total count of primary nodes plus one. In that case, we don't need to make sure slots are distributed among an odd number of nodes; we can distribute slots among an even number of nodes and add one extra node to ensure voting is effective.

Once a cluster is in steady state, nodes can be added without consensus, which means nodes could disagree about the quorum size at a given epoch while nodes are being added. We need a way to identify that a node is part of the cluster config at a given epoch, which is done today when we migrate a slot into the new primary.

I can publish it if you are interested in adding it, or we can just add a new configuration that gives empty primaries the right to vote.

I would rather have some type of config that indicates a node is able to participate in the quorum and it's attached to the epoch. I don't like the idea of having primaries without slots just arbitrarily participate. It's really no different than just letting everyone vote.

Member

@madolson madolson left a comment


Agree with Viktor's minor suggestion, but the general change seems good to me.

@madolson madolson added the release-notes This issue should get a line item in the release notes label Jun 13, 2024
Signed-off-by: Binbin <[email protected]>
@PingXie
Member

PingXie commented Jun 13, 2024

We need a way to identify that a node is part of the cluster config at a given epoch, which is done today when we migrate a slot into the new primary.

I don't think we currently consult the epochs when computing the quorum size. The dynamic-quorum issue is a real problem in the current cluster design/implementation (even if we count only non-empty primaries).

I would rather have some type of config that indicates a node is able to participate in the quorum and it's attached to the epoch. I don't like the idea of having primaries without slots just arbitrarily participate. It's really no different than just letting everyone vote.

+1. I like this idea. We should explore further next :-)

Member

@PingXie PingXie left a comment


LGTM in general.

Signed-off-by: Binbin <[email protected]>

Co-authored-by: Ping Xie <[email protected]>
Contributor

@zuiderkwast zuiderkwast left a comment


Good fix! Good test case!

What to mention in release notes? Shall we mark it as a bugfix?

@madolson
Member

I don't think we currently consult the epochs when computing the quorum size.

We do. We include a node in the quorum if it has slots and has an epoch greater than our own. On the flip side, the primary receiving the request will compare epochs to see if it can vote.
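
For context, a simplified sketch of the epoch checks a primary performs before granting a failover vote, modeled on clusterSendFailoverAuthIfNeeded; the names and exact conditions here are assumptions and may differ from the current source:

    /* Only voting primaries grant votes. */
    if (nodeIsSlave(myself) || myself->numslots == 0) return;
    /* Refuse requests with a stale epoch. */
    if (requestCurrentEpoch < server.cluster->currentEpoch) return;
    /* Never vote twice in the same epoch. */
    if (server.cluster->lastVoteEpoch == server.cluster->currentEpoch) return;
    /* ...further checks (slot config epochs, vote delay), then grant... */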

@madolson
Member

What to mention in release notes? Shall we mark it as a bugfix?

Mention that it's a bugfix where nodes not in the quorum group might spuriously mark nodes as failed.

@madolson madolson merged commit db6d3c1 into valkey-io:unstable Jun 17, 2024
18 checks passed
@enjoy-binbin enjoy-binbin deleted the mark_fail branch June 17, 2024 03:49
PingXie added a commit to PingXie/valkey that referenced this pull request Jul 8, 2024
PingXie added a commit to PingXie/valkey that referenced this pull request Jul 9, 2024
PingXie added a commit to PingXie/valkey that referenced this pull request Jul 9, 2024
PingXie added a commit that referenced this pull request Jul 10, 2024
Labels
cluster release-notes This issue should get a line item in the release notes
Projects
Status: Backported
Status: Done
5 participants