Scale Down doesn't work with multiple ZK Clusters #103

maddisondavid · 2019-12-04T11:12:21Z

During Scale Down the ConfigMap is updated with a new CLUSTER_SIZE, there's a wait (6 reconcile loops), and then the StatefulSet replicas are updated. This triggers the zookeeperTeardown.sh script when the StatefulSet attempts to remove the pod, which in turn allows the dying node to be gracefully removed from the ensemble.

The problem is that the number of reconcile loops is stored in Operator state, so if the operator is managing multiple clusters, every cluster's reconcile counts against the same value. Some clusters will therefore trigger prematurely, which results in pods being destroyed without being gracefully removed.
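To make the premature trigger concrete, here is a simplified model rather than the operator's actual code. It assumes each reconcile pass of a downscaling cluster increments one shared counter and that the StatefulSet is updated once the counter exceeds 6. The `sharedReconciler` type, `reconcileScaleDown`, `loopsUntilUpdate`, and the cluster names are all hypothetical.

```go
package main

import "fmt"

// waitLoops mirrors the six-loop wait described in this issue.
const waitLoops = 6

// sharedReconciler is a hypothetical stand-in for a controller that keeps
// the wait counter on the reconciler itself, so every cluster shares it.
type sharedReconciler struct {
	skipSTSReconcile int
}

// reconcileScaleDown models one reconcile pass for a downscaling cluster.
// It returns true when the StatefulSet replicas would be updated.
func (r *sharedReconciler) reconcileScaleDown() bool {
	r.skipSTSReconcile++
	if r.skipSTSReconcile > waitLoops {
		r.skipSTSReconcile = 0
		return true
	}
	return false
}

// loopsUntilUpdate reconciles the clusters round-robin and returns how many
// of its own passes the watched cluster gets before its StatefulSet is
// updated. The watched name must be present in clusters.
func loopsUntilUpdate(clusters []string, watch string) int {
	r := &sharedReconciler{}
	own := 0
	for {
		for _, name := range clusters {
			if name == watch {
				own++
			}
			if r.reconcileScaleDown() && name == watch {
				return own
			}
		}
	}
}

func main() {
	// One cluster: the update happens on its 7th pass.
	fmt.Println(loopsUntilUpdate([]string{"zk-a"}, "zk-a"))
	// Two downscaling clusters: zk-a is updated after only 4 of its own passes.
	fmt.Println(loopsUntilUpdate([]string{"zk-a", "zk-b"}, "zk-a"))
}
```

In this model a second downscaling cluster roughly halves the wait the first one gets, which is the premature teardown described above.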

Problem Location
skipSTSReconcile should not be stored in controller state (operators should really be stateless)

zookeeper-operator/pkg/controller/zookeepercluster/zookeepercluster_controller.go

Line 117 in cea93ee

skipSTSReconcile int


maddisondavid · 2019-12-04T12:33:39Z

Another problem is that any clusters that are NOT downscaling will reset skipSTSReconcile at least every 30 seconds. This stops any cluster that is attempting to downscale from updating its StatefulSet, because skipSTSReconcile keeps being reset to 0 and never becomes > 6.
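The reset problem can be shown with the same kind of simplified model. Again, this is an illustration and not the operator's code: it assumes a non-downscaling cluster zeroes the shared counter on every one of its passes, and that each round reconciles the downscaling cluster once followed by the steady clusters. `reconcile` and `updatesWithin` are hypothetical names.

```go
package main

import "fmt"

// waitLoops mirrors the six-loop wait described in this issue.
const waitLoops = 6

// sharedReconciler is a hypothetical controller holding one counter for all clusters.
type sharedReconciler struct {
	skipSTSReconcile int
}

// reconcile models one pass. A cluster that is not scaling down resets the
// shared counter; a downscaling cluster increments it and reports whether
// the StatefulSet would now be updated.
func (r *sharedReconciler) reconcile(scalingDown bool) bool {
	if !scalingDown {
		r.skipSTSReconcile = 0
		return false
	}
	r.skipSTSReconcile++
	return r.skipSTSReconcile > waitLoops
}

// updatesWithin returns how many StatefulSet updates one downscaling cluster
// achieves in the given number of rounds when `steady` non-downscaling
// clusters are also reconciled each round.
func updatesWithin(rounds, steady int) int {
	r := &sharedReconciler{}
	updates := 0
	for i := 0; i < rounds; i++ {
		if r.reconcile(true) {
			updates++
			r.skipSTSReconcile = 0
		}
		for j := 0; j < steady; j++ {
			r.reconcile(false)
		}
	}
	return updates
}

func main() {
	fmt.Println(updatesWithin(100, 0)) // alone: 14 updates in 100 rounds
	fmt.Println(updatesWithin(100, 1)) // with one steady cluster: 0 updates
}
```

With even one steady cluster in the mix, the counter in this model never gets past 1, so the downscaling cluster waits forever.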

maddisondavid changed the title ~~ScaleDown doesn't work with multiple ZK Clusters~~ Scale Down doesn't work with multiple ZK Clusters Dec 4, 2019

EronWright mentioned this issue Dec 5, 2019

Issue 92: Zookeeper pods fail to restart when quorum not available #93

Merged

pbelgundi mentioned this issue Dec 19, 2019

Changes for zk operator to be stateless #111

Closed

pbelgundi mentioned this issue Jan 28, 2020

Issue Changes for handling zk scale down #120

Merged

pbelgundi closed this as completed in #120 Feb 28, 2020
