During scale down, the ConfigMap is updated with a new `CLUSTER_SIZE`, the operator waits 6 reconcile loops, and then the StatefulSet replicas are updated. When the StatefulSet removes the pod, this triggers the `zookeeperTeardown.sh` script, which allows the dying node to be removed gracefully from the ensemble.
The problem is that the number of reconcile loops is stored in operator state, so the count is shared by every cluster the operator manages. If the operator is managing multiple clusters, some of them will trigger the StatefulSet update prematurely, which results in pods being destroyed without being gracefully removed.
Problem location: `skipSTSReconcile` should not be stored in controller state (operators should really be stateless).
Another problem is that any cluster that is NOT downscaling resets `skipSTSReconcile` at least every 30 seconds. This stops any cluster that is attempting to downscale from updating its StatefulSet, because `skipSTSReconcile` keeps being reset to 0 and never becomes > 6.
zookeeper-operator/pkg/controller/zookeepercluster/zookeepercluster_controller.go, line 117 in cea93ee