Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, the remaining two
can still elect a master among themselves. This only requires one of the two remaining nodes to
be capable of acting as master, but both need voting powers. This means that one of
the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.
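The HA argument above comes down to two conditions: the surviving nodes must still form a majority of the voting configuration, and at least one survivor must be allowed to become master. A minimal Java sketch, assuming a three-node cluster with one voting-only node; `Node`, `isMajority` and `canElectMaster` are hypothetical names, not Elasticsearch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the three-node topology described above; the types
// are illustrative and are not Elasticsearch classes.
public class VotingOnlyQuorumSketch {

    static final class Node {
        final String name;
        final boolean votingOnly; // may vote, but may never be elected master

        Node(String name, boolean votingOnly) {
            this.name = name;
            this.votingOnly = votingOnly;
        }
    }

    // Electing a master needs a strict majority of the voting configuration.
    static boolean isMajority(int votes, int configSize) {
        return votes * 2 > configSize;
    }

    // The survivors (assumed to be a subset of the voting configuration) must
    // form a majority AND include at least one node allowed to become master.
    static boolean canElectMaster(List<Node> votingConfig, List<Node> alive) {
        boolean hasFullMaster = alive.stream().anyMatch(n -> !n.votingOnly);
        return isMajority(alive.size(), votingConfig.size()) && hasFullMaster;
    }

    static List<Node> threeNodeCluster() {
        return List.of(new Node("m1", false), new Node("m2", false), new Node("v1", true));
    }

    public static void main(String[] args) {
        List<Node> config = threeNodeCluster();
        // Take each node down in turn and check whether the other two can still elect a master.
        for (Node failed : config) {
            List<Node> alive = new ArrayList<>(config);
            alive.remove(failed);
            System.out.println(failed.name + " down -> master electable: " + canElectMaster(config, alive));
        }
    }
}
```

Every single-node failure still leaves a majority (2 of 3) that includes at least one of `m1`/`m2`. This is why only one of the three nodes can safely be voting-only.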

Closes #14340

Co-authored-by: David Turner <[email protected]>
ywelsch and DaveCTurner committed Jun 26, 2019
1 parent 11f41c4 commit 2049f71
Showing 41 changed files with 2,533 additions and 1,576 deletions.
Empty file added A
Empty file.
18 changes: 12 additions & 6 deletions docs/reference/cluster.asciidoc
@@ -22,12 +22,14 @@ one of the following:
* an IP address or hostname, to add all matching nodes to the subset.
* a pattern, using `*` wildcards, which adds all nodes to the subset
whose name, address or hostname matches the pattern.
* `master:true`, `data:true`, `ingest:true` or `coordinating_only:true`, which
respectively add to the subset all master-eligible nodes, all data nodes,
all ingest nodes, and all coordinating-only nodes.
* `master:false`, `data:false`, `ingest:false` or `coordinating_only:false`,
which respectively remove from the subset all master-eligible nodes, all data
nodes, all ingest nodes, and all coordinating-only nodes.
* `master:true`, `data:true`, `ingest:true`, `voting_only:true` or
`coordinating_only:true`, which respectively add to the subset all
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, and all coordinating-only nodes.
* `master:false`, `data:false`, `ingest:false`, `voting_only:false` or
`coordinating_only:false`, which respectively remove from the subset all
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a custom node attribute whose name
and value match the respective patterns. Custom node attributes are
@@ -46,6 +48,9 @@ means that filters such as `master:false` which remove nodes from the chosen
subset are only useful if they come after some other filters. When used on its
own, `master:false` selects no nodes.
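The order-dependent filter semantics above can be sketched in Java for role filters only. This is a hypothetical sketch that starts from an empty subset and applies each comma-separated filter in turn. The class and method names are illustrative, and treating "coordinating-only" as "no master, data or ingest role" is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of order-dependent node filter resolution for role
// filters only: start from an empty subset and apply each comma-separated
// filter in turn, adding or removing the nodes it matches. Node names, address
// patterns and custom attributes are deliberately not modelled.
public class NodeFilterSketch {

    static final class Node {
        final String name;
        final Set<String> roles; // e.g. "master", "data", "ingest", "voting_only"

        Node(String name, Set<String> roles) {
            this.name = name;
            this.roles = roles;
        }
    }

    static Predicate<Node> matcherFor(String role) {
        if (role.equals("coordinating_only")) {
            // Assumption: coordinating-only means no master, data or ingest role
            return n -> !n.roles.contains("master") && !n.roles.contains("data") && !n.roles.contains("ingest");
        }
        return n -> n.roles.contains(role);
    }

    static Set<String> resolve(List<Node> nodes, String filters) {
        Set<String> selected = new LinkedHashSet<>();
        for (String filter : filters.split(",")) {
            if (filter.equals("_all")) {
                nodes.forEach(n -> selected.add(n.name));
                continue;
            }
            String[] parts = filter.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("unsupported filter in this sketch: " + filter);
            }
            Predicate<Node> matches = matcherFor(parts[0]);
            boolean add = Boolean.parseBoolean(parts[1]);
            for (Node n : nodes) {
                if (matches.test(n)) {
                    if (add) {
                        selected.add(n.name);
                    } else {
                        selected.remove(n.name);
                    }
                }
            }
        }
        return selected;
    }

    static List<Node> sampleNodes() {
        return List.of(
            new Node("m1", Set.of("master", "data", "ingest")),
            new Node("m2", Set.of("master")),
            new Node("v1", Set.of("master", "voting_only")), // voting-only nodes are also master-eligible
            new Node("c1", Set.of()));                        // coordinating-only
    }

    public static void main(String[] args) {
        List<Node> nodes = sampleNodes();
        System.out.println(resolve(nodes, "master:true,voting_only:false")); // [m1, m2]
        System.out.println(resolve(nodes, "master:false"));                  // []
        System.out.println(resolve(nodes, "_all,master:false"));             // [c1]
    }
}
```

Because a voting-only node also has the master role, `master:false` removes it as well. `master:true,voting_only:false` is the way to select only the nodes that can actually become master.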

NOTE: The `voting_only` role requires the {default-dist} of Elasticsearch and
is not supported in the {oss-dist}.

Here are some examples of the use of node filters with the
<<cluster-nodes-info,Nodes Info>> APIs.

@@ -69,6 +74,7 @@ GET /_nodes/10.0.0.*
GET /_nodes/_all,master:false
GET /_nodes/data:true,ingest:true
GET /_nodes/coordinating_only:true
GET /_nodes/master:true,voting_only:false
# Select nodes by custom attribute (e.g. with something like `node.attr.rack: 2` in the configuration file)
GET /_nodes/rack:2
GET /_nodes/ra*:2
9 changes: 7 additions & 2 deletions docs/reference/cluster/stats.asciidoc
@@ -109,7 +109,8 @@ Will return, for example:
"data": 1,
"coordinating_only": 0,
"master": 1,
"ingest": 1
"ingest": 1,
"voting_only": 0
},
"versions": [
"{version}"
@@ -207,6 +208,7 @@ Will return, for example:
// TESTRESPONSE[s/"plugins": \[[^\]]*\]/"plugins": $body.$_path/]
// TESTRESPONSE[s/"network_types": \{[^\}]*\}/"network_types": $body.$_path/]
// TESTRESPONSE[s/"discovery_types": \{[^\}]*\}/"discovery_types": $body.$_path/]
// TESTRESPONSE[s/"count": \{[^\}]*\}/"count": $body.$_path/]
// TESTRESPONSE[s/"packaging_types": \[[^\]]*\]/"packaging_types": $body.$_path/]
// TESTRESPONSE[s/: true|false/: $body.$_path/]
// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
@@ -217,7 +219,10 @@ Will return, for example:
// see an exhaustive list anyway.
// 2. Similarly, ignore the contents of `network_types`, `discovery_types`, and
// `packaging_types`.
// 3. All of the numbers and strings on the right hand side of *every* field in
// 3. Ignore the contents of the (nodes) count object, as what's shown here
// depends on the license. For example, voting-only nodes are only shown when
// this test runs with a basic license.
// 4. All of the numbers and strings on the right-hand side of *every* field in
// the response are ignored. So we're really only asserting things about the
// shape of this response, not the values in it.

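A per-role `count` object like the one in the example response could be tallied as follows. This is a hypothetical Java sketch, not the actual cluster stats code. It assumes that each node reports a set of role names and that a node with no roles counts as coordinating-only. It also assumes that only registered roles (`knownRoles`, an illustrative parameter) get a counter, which is one plausible way `voting_only` could be absent in some setups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of tallying a per-role "count" object. Assumptions: each
// node reports a set of role names, a node with no roles counts as
// coordinating-only, and only roles listed in knownRoles get a counter.
public class NodeCountsSketch {

    static Map<String, Integer> counts(List<Set<String>> nodeRoles, List<String> knownRoles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("total", 0);
        counts.put("coordinating_only", 0);
        knownRoles.forEach(role -> counts.put(role, 0));

        for (Set<String> roles : nodeRoles) {
            counts.merge("total", 1, Integer::sum);
            if (roles.isEmpty()) {
                counts.merge("coordinating_only", 1, Integer::sum);
            }
            // Roles without a registered counter are ignored rather than added.
            roles.forEach(role -> counts.computeIfPresent(role, (name, n) -> n + 1));
        }
        return counts;
    }

    public static void main(String[] args) {
        // A single node with the master, data and ingest roles, as in the example response above.
        List<Set<String>> nodes = List.of(Set.of("master", "data", "ingest"));
        System.out.println(counts(nodes, List.of("data", "master", "ingest", "voting_only")));
        // {total=1, coordinating_only=0, data=1, master=1, ingest=1, voting_only=0}
    }
}
```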
46 changes: 44 additions & 2 deletions docs/reference/modules/node.asciidoc
@@ -85,8 +85,9 @@ creating or deleting an index, tracking which nodes are part of the cluster,
and deciding which shards to allocate to which nodes. It is important for
cluster health to have a stable master node.

Any master-eligible node (all nodes by default) may be elected to become the
master node by the <<modules-discovery,master election process>>.
Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
be elected to become the master node by the <<modules-discovery,master election
process>>.

IMPORTANT: Master nodes must have access to the `data/` directory (just like
`data` nodes) as this is where the cluster state is persisted between node restarts.
@@ -135,6 +136,47 @@ cluster.remote.connect: false <4>
<3> Disable the `node.ingest` role (enabled by default).
<4> Disable {ccs} (enabled by default).

[float]
[[voting-only-node]]
==== Voting-only master-eligible node

A voting-only master-eligible node is a node that participates in
<<modules-discovery,master elections>> but which will not act as the cluster's
elected master node. In particular, a voting-only node can serve as a tiebreaker
in elections.

It may seem confusing to use the term "master-eligible" to describe a
voting-only node since such a node is not actually eligible to become the master
at all. This terminology is an unfortunate consequence of history:
master-eligible nodes are those nodes that participate in elections and perform
certain tasks during cluster state publications, and voting-only nodes have the
same responsibilities even if they can never become the elected master.

To configure a master-eligible node as a voting-only node, use the following
setting:

[source,yaml]
-------------------
node.voting_only: true <1>
-------------------
<1> The default for `node.voting_only` is `false`.

IMPORTANT: The `voting_only` role requires the {default-dist} of Elasticsearch
and is not supported in the {oss-dist}. If you use the {oss-dist} and set
`node.voting_only`, the node will fail to start. Also note that only
master-eligible nodes can be marked as voting-only.
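The two startup rules above can be expressed as a small check. This is a hypothetical Java sketch: the setting names `node.voting_only` and `node.master` come from the docs, and the `true` default for `node.master` follows from "all nodes by default" being master-eligible. The validation logic and error messages are illustrative only, not Elasticsearch's settings infrastructure.

```java
import java.util.Map;

// Hypothetical sketch of the two startup rules stated above. The setting names
// come from the documentation; the check and its messages are illustrative and
// are not Elasticsearch's actual settings-validation code.
public class VotingOnlySettingsCheck {

    static void validate(Map<String, String> settings, boolean defaultDistribution) {
        if (settings.containsKey("node.voting_only") && !defaultDistribution) {
            // Assumed behaviour: the OSS distribution does not recognise the setting at all
            throw new IllegalArgumentException("node.voting_only is not supported in this distribution");
        }
        boolean votingOnly = Boolean.parseBoolean(settings.getOrDefault("node.voting_only", "false"));
        // Nodes are master-eligible by default
        boolean masterEligible = Boolean.parseBoolean(settings.getOrDefault("node.master", "true"));
        if (votingOnly && !masterEligible) {
            throw new IllegalArgumentException("only master-eligible nodes can be marked as voting-only");
        }
    }

    static boolean isValid(Map<String, String> settings, boolean defaultDistribution) {
        try {
            validate(settings, defaultDistribution);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(Map.of("node.voting_only", "true"), true));                        // true
        System.out.println(isValid(Map.of("node.voting_only", "true", "node.master", "false"), true)); // false
        System.out.println(isValid(Map.of("node.voting_only", "true"), false));                       // false
    }
}
```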

High availability (HA) clusters require at least three master-eligible nodes, at
least two of which are not voting-only nodes. Such a cluster will be able to
elect a master node even if one of the nodes fails.

Since voting-only nodes never act as the cluster's elected master, they may
require less heap and a less powerful CPU than the true master nodes.
However, all master-eligible nodes, including voting-only nodes, require
reasonably fast persistent storage and a reliable and low-latency network
connection to the rest of the cluster, since they are on the critical path for
<<cluster-state-publishing,publishing cluster state updates>>.

[float]
[[data-node]]
=== Data Node
4 changes: 4 additions & 0 deletions docs/reference/rest-api/info.asciidoc
@@ -111,6 +111,10 @@ Example response:
"available" : true,
"enabled" : true
},
"voting_only" : {
"available" : true,
"enabled" : true
},
"watcher" : {
"available" : true,
"enabled" : true
@@ -122,14 +122,16 @@ static class ClusterFormationState {
private final List<TransportAddress> resolvedAddresses;
private final List<DiscoveryNode> foundPeers;
private final long currentTerm;
private final ElectionStrategy electionStrategy;

ClusterFormationState(Settings settings, ClusterState clusterState, List<TransportAddress> resolvedAddresses,
List<DiscoveryNode> foundPeers, long currentTerm) {
List<DiscoveryNode> foundPeers, long currentTerm, ElectionStrategy electionStrategy) {
this.settings = settings;
this.clusterState = clusterState;
this.resolvedAddresses = resolvedAddresses;
this.foundPeers = foundPeers;
this.currentTerm = currentTerm;
this.electionStrategy = electionStrategy;
}

String getDescription() {
@@ -188,7 +190,9 @@ String getDescription() {
final VoteCollection voteCollection = new VoteCollection();
foundPeers.forEach(voteCollection::addVote);
final String isQuorumOrNot
= CoordinationState.isElectionQuorum(voteCollection, clusterState) ? "is a quorum" : "is not a quorum";
= electionStrategy.isElectionQuorum(clusterState.nodes().getLocalNode(), currentTerm, clusterState.term(),
clusterState.version(), clusterState.getLastCommittedConfiguration(), clusterState.getLastAcceptedConfiguration(),
voteCollection) ? "is a quorum" : "is not a quorum";

return String.format(Locale.ROOT,
"master not discovered or elected yet, an election requires %s, have discovered %s which %s; %s",
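The removed `CoordinationState.isElectionQuorum(VoteCollection, ClusterState)` shows the default quorum rule that `ElectionStrategy` now wraps: the votes must form a quorum of both the last-committed and the last-accepted voting configurations. A minimal Java sketch follows. It assumes configurations can be modelled as plain sets of node IDs and that "quorum" means a strict majority; the class and method names are hypothetical.

```java
import java.util.Set;

// Sketch of the default election-quorum rule visible in the removed
// CoordinationState.isElectionQuorum code: the votes must be a quorum of BOTH
// the last-committed and the last-accepted voting configurations.
// Configurations are modelled as plain sets of node IDs for brevity.
public class ElectionQuorumSketch {

    // A strict majority of the configuration's node IDs must be among the votes.
    static boolean isQuorum(Set<String> votes, Set<String> config) {
        long voted = config.stream().filter(votes::contains).count();
        return voted * 2 > config.size();
    }

    static boolean isElectionQuorum(Set<String> votes, Set<String> lastCommitted, Set<String> lastAccepted) {
        return isQuorum(votes, lastCommitted) && isQuorum(votes, lastAccepted);
    }

    // Same wording that ClusterFormationState.getDescription() uses in its message.
    static String describe(Set<String> votes, Set<String> lastCommitted, Set<String> lastAccepted) {
        return isElectionQuorum(votes, lastCommitted, lastAccepted) ? "is a quorum" : "is not a quorum";
    }

    public static void main(String[] args) {
        Set<String> committed = Set.of("a", "b", "c");
        Set<String> accepted = Set.of("c", "d", "e"); // a reconfiguration that is not yet committed
        System.out.println(describe(Set.of("a", "b", "c"), committed, accepted));      // is not a quorum
        System.out.println(describe(Set.of("a", "b", "d", "e"), committed, accepted)); // is a quorum
    }
}
```

Requiring both quorums means that an election in the middle of a reconfiguration cannot succeed with the support of only the old or only the new set of voters.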
@@ -24,13 +24,14 @@
import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfiguration;
import org.elasticsearch.cluster.metadata.MetaData;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.settings.Settings;

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

import static org.elasticsearch.cluster.coordination.Coordinator.ZEN1_BWC_TERM;

@@ -44,6 +45,8 @@ public class CoordinationState {

private final DiscoveryNode localNode;

private final ElectionStrategy electionStrategy;

// persisted state
private final PersistedState persistedState;

@@ -55,11 +58,12 @@ public class CoordinationState {
private VotingConfiguration lastPublishedConfiguration;
private VoteCollection publishVotes;

public CoordinationState(Settings settings, DiscoveryNode localNode, PersistedState persistedState) {
public CoordinationState(DiscoveryNode localNode, PersistedState persistedState, ElectionStrategy electionStrategy) {
this.localNode = localNode;

// persisted state
this.persistedState = persistedState;
this.electionStrategy = electionStrategy;

// transient state
this.joinVotes = new VoteCollection();
@@ -106,13 +110,9 @@ public boolean electionWon() {
return electionWon;
}

public boolean isElectionQuorum(VoteCollection votes) {
return isElectionQuorum(votes, getLastAcceptedState());
}

static boolean isElectionQuorum(VoteCollection votes, ClusterState lastAcceptedState) {
return votes.isQuorum(lastAcceptedState.getLastCommittedConfiguration())
&& votes.isQuorum(lastAcceptedState.getLastAcceptedConfiguration());
public boolean isElectionQuorum(VoteCollection joinVotes) {
return electionStrategy.isElectionQuorum(localNode, getCurrentTerm(), getLastAcceptedTerm(), getLastAcceptedVersion(),
getLastCommittedConfiguration(), getLastAcceptedConfiguration(), joinVotes);
}

public boolean isPublishQuorum(VoteCollection votes) {
@@ -123,6 +123,11 @@ public boolean containsJoinVoteFor(DiscoveryNode node) {
return joinVotes.containsVoteFor(node);
}

// used for tests
boolean containsJoin(Join join) {
return joinVotes.getJoins().contains(join);
}

public boolean joinVotesHaveQuorumFor(VotingConfiguration votingConfiguration) {
return joinVotes.isQuorum(votingConfiguration);
}
@@ -249,7 +254,7 @@ public boolean handleJoin(Join join) {
throw new CoordinationStateRejectedException("rejecting join since this node has not received its initial configuration yet");
}

boolean added = joinVotes.addVote(join.getSourceNode());
boolean added = joinVotes.addJoinVote(join);
boolean prevElectionWon = electionWon;
electionWon = isElectionQuorum(joinVotes);
assert !prevElectionWon || electionWon; // we cannot go from won to not won
@@ -503,18 +508,28 @@ default void markLastAcceptedStateAsCommitted() {
}

/**
* A collection of votes, used to calculate quorums.
* A collection of votes, used to calculate quorums. Optionally records the Joins as well.
*/
public static class VoteCollection {

private final Map<String, DiscoveryNode> nodes;
private final Set<Join> joins;

public boolean addVote(DiscoveryNode sourceNode) {
return nodes.put(sourceNode.getId(), sourceNode) == null;
}

public boolean addJoinVote(Join join) {
final boolean added = addVote(join.getSourceNode());
if (added) {
joins.add(join);
}
return added;
}

public VoteCollection() {
nodes = new HashMap<>();
joins = new HashSet<>();
}

public boolean isQuorum(VotingConfiguration configuration) {
@@ -533,24 +548,31 @@ public Collection<DiscoveryNode> nodes() {
return Collections.unmodifiableCollection(nodes.values());
}

public Set<Join> getJoins() {
return Collections.unmodifiableSet(joins);
}

@Override
public String toString() {
return "VoteCollection{" + String.join(",", nodes.keySet()) + "}";
return "VoteCollection{votes=" + nodes.keySet() + ", joins=" + joins + "}";
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
if (!(o instanceof VoteCollection)) return false;

VoteCollection that = (VoteCollection) o;

return nodes.equals(that.nodes);
if (!nodes.equals(that.nodes)) return false;
return joins.equals(that.joins);
}

@Override
public int hashCode() {
return nodes.hashCode();
int result = nodes.hashCode();
result = 31 * result + joins.hashCode();
return result;
}
}
}
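With this change, the quorum decision is delegated to a pluggable `ElectionStrategy`, and `VoteCollection` also records the `Join`s behind each vote. The following hypothetical, simplified Java sketch shows how such a strategy could layer a voting-only restriction on top of the default rule. `SimpleNode`, `SimpleJoin`, `Votes`, `ElectionRule` and `VOTING_ONLY_AWARE` are invented for illustration. The rule "a voting-only node never treats an election as won" is an assumed simplification, not the plugin's exact logic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified sketch of a pluggable election rule. SimpleNode,
// SimpleJoin, Votes and ElectionRule are invented for illustration; the real
// ElectionStrategy and VoteCollection signatures are those in the diff above.
public class ElectionStrategySketch {

    static final class SimpleNode {
        final String id;
        final boolean votingOnly;

        SimpleNode(String id, boolean votingOnly) {
            this.id = id;
            this.votingOnly = votingOnly;
        }
    }

    static final class SimpleJoin {
        final SimpleNode source;
        final long term; // carried along as real joins are, but unused by the rules below

        SimpleJoin(SimpleNode source, long term) {
            this.source = source;
            this.term = term;
        }
    }

    // Mirrors VoteCollection.addJoinVote: the join is recorded only if its
    // source node had not already voted.
    static final class Votes {
        final Map<String, SimpleNode> nodes = new HashMap<>();
        final Set<SimpleJoin> joins = new HashSet<>();

        boolean addVote(SimpleNode node) {
            return nodes.put(node.id, node) == null;
        }

        boolean addJoinVote(SimpleJoin join) {
            boolean added = addVote(join.source);
            if (added) {
                joins.add(join);
            }
            return added;
        }

        boolean isQuorum(Set<String> config) {
            long voted = config.stream().filter(nodes::containsKey).count();
            return voted * 2 > config.size();
        }
    }

    interface ElectionRule {
        boolean isElectionQuorum(SimpleNode localNode, Set<String> lastCommitted, Set<String> lastAccepted, Votes votes);
    }

    // Default rule: a quorum of both voting configurations.
    static final ElectionRule DEFAULT = (local, committed, accepted, votes) ->
        votes.isQuorum(committed) && votes.isQuorum(accepted);

    // Simplified voting-only rule (an assumption, not the plugin's exact logic):
    // a voting-only node never treats an election as won by itself.
    static final ElectionRule VOTING_ONLY_AWARE = (local, committed, accepted, votes) ->
        !local.votingOnly && DEFAULT.isElectionQuorum(local, committed, accepted, votes);

    public static void main(String[] args) {
        SimpleNode m2 = new SimpleNode("m2", false);
        SimpleNode v1 = new SimpleNode("v1", true);
        Set<String> config = Set.of("m1", "m2", "v1");

        Votes votes = new Votes();
        votes.addJoinVote(new SimpleJoin(m2, 5));
        votes.addJoinVote(new SimpleJoin(v1, 5));
        System.out.println(votes.addJoinVote(new SimpleJoin(v1, 5))); // false: v1 has already voted
        System.out.println(votes.joins.size());                       // 2

        System.out.println(VOTING_ONLY_AWARE.isElectionQuorum(m2, config, config, votes)); // true
        System.out.println(VOTING_ONLY_AWARE.isElectionQuorum(v1, config, config, votes)); // false
    }
}
```

Keeping the joins alongside the votes lets a strategy inspect who voted and with which term, rather than only counting node IDs.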