Fix testFollowerCheckerDetectsUnresponsiveNodeAfterMasterReelection #84200

DaveCTurner · 2022-02-21T20:00:20Z

This test would fail if we introduce the network partition while the
master is still publishing a cluster state update and hasn't received
the ack from the victim node. In this case the default publish timeout
means that the master will wait for 30s before completing the stalled
publication and moving on to the node-left one, but
ensureStableCluster also times out after 30s, which leaves little
time for the master to remove the victim node.

This commit reduces the publish timeout to 10s so that the master
recovers well before ensureStableCluster times out.

Closes #84172
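
The timing argument above can be checked with a little arithmetic. The following is only a sketch under a simplifying worst-case assumption: the partition starts just as the publication begins, and the `ensureStableCluster` wait starts at the same moment. The class and method names are hypothetical and are not part of Elasticsearch or of this change.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only: not Elasticsearch code.
public class PublishTimeoutBudget {

    /**
     * Time left for the master to remove the victim node after a stalled
     * publication has timed out, assuming both clocks start together.
     */
    static Duration remainingAfterStalledPublication(Duration publishTimeout,
                                                     Duration stableClusterTimeout) {
        Duration remaining = stableClusterTimeout.minus(publishTimeout);
        // A negative budget means the wait expires before the publication does.
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        Duration stableClusterTimeout = Duration.ofSeconds(30);
        for (long publishSeconds : new long[] {30, 10}) {
            Duration left = remainingAfterStalledPublication(
                Duration.ofSeconds(publishSeconds), stableClusterTimeout);
            System.out.println("publish timeout " + publishSeconds + "s leaves "
                + left.getSeconds() + "s");
        }
        // prints:
        // publish timeout 30s leaves 0s
        // publish timeout 10s leaves 20s
    }
}
```

Under this assumption the default 30s publish timeout leaves no slack at all, while 10s leaves roughly 20s for the node-left update to be applied.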


elasticmachine · 2022-02-21T20:00:23Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine · 2022-02-22T08:31:46Z

💔 Backport failed

The backport operation could not be completed due to the following error:
An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 84200`


DaveCTurner added the >test, :Distributed Coordination/Cluster Coordination, v7.17.1, v8.2.0, v8.1.1, and v8.0.2 labels Feb 21, 2022

elasticmachine added the Team:Distributed (Obsolete) label Feb 21, 2022

idegtiarenko approved these changes Feb 22, 2022


DaveCTurner added the auto-backport-and-merge label Feb 22, 2022

DaveCTurner merged commit e853bf5 into elastic:master Feb 22, 2022

DaveCTurner deleted the 2022-02-21-fix-testFollowerCheckerDetectsUnresponsiveNodeAfterMasterReelection branch February 22, 2022 08:21
