Scale Down doesn't work with multiple ZK Clusters #103

Closed
maddisondavid opened this issue Dec 4, 2019 · 1 comment · Fixed by #120
Comments

@maddisondavid
Contributor

During Scale Down the ConfigMap is updated with a new CLUSTER_SIZE, there is a wait of 6 reconcile loops, and then the StatefulSet replicas are updated. This triggers the zookeeperTeardown.sh script when the StatefulSet attempts to remove the POD, which in turn allows the dying node to be gracefully removed from the ensemble.

The problem is that the number of reconcile loops is stored in the operator's own state, so if the operator is managing multiple clusters, some of them will trigger prematurely, resulting in PODs being destroyed without being gracefully removed from the ensemble.

Problem Location
skipSTSReconcile should not be stored in controller state (operators should really be stateless)
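To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above (type and field names such as Reconciler and DesiredReplicas are illustrative, not the operator's actual source): a single skipSTSReconcile counter lives on the reconciler and is therefore shared by every cluster the operator watches.

```go
package main

import "fmt"

// Illustrative stand-ins for the operator's types (not the actual source).
type ZookeeperCluster struct {
	Name            string
	DesiredReplicas int
	CurrentReplicas int
}

// The reconciler holds a single counter for every cluster it manages.
type Reconciler struct {
	skipSTSReconcile int // shared across ALL clusters: the source of the bug
}

func (r *Reconciler) Reconcile(zk *ZookeeperCluster) {
	if zk.DesiredReplicas < zk.CurrentReplicas {
		// Every reconcile of ANY downscaling cluster advances the same counter,
		// so one cluster's wait is consumed by another cluster's loops.
		r.skipSTSReconcile++
		if r.skipSTSReconcile < 6 {
			fmt.Printf("%s: waiting (%d/6) before shrinking StatefulSet\n", zk.Name, r.skipSTSReconcile)
			return
		}
	}
	// The counter is reset for every cluster, including ones that are not
	// scaling down at all (see the follow-up comment below).
	r.skipSTSReconcile = 0
	fmt.Printf("%s: updating StatefulSet replicas to %d\n", zk.Name, zk.DesiredReplicas)
	zk.CurrentReplicas = zk.DesiredReplicas
}

func main() {
	r := &Reconciler{}
	a := &ZookeeperCluster{Name: "cluster-a", DesiredReplicas: 3, CurrentReplicas: 5}
	b := &ZookeeperCluster{Name: "cluster-b", DesiredReplicas: 5, CurrentReplicas: 7}

	// Interleaved reconciles: each cluster waits only ~3 of its own loops
	// before the shared counter reaches 6 and its pods are torn down early.
	for i := 0; i < 4; i++ {
		r.Reconcile(a)
		r.Reconcile(b)
	}
}
```

Running this, cluster-b's StatefulSet is shrunk after only three of its own reconciles because cluster-a's reconciles advanced the shared counter.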

maddisondavid changed the title from "ScaleDown doesn't work with multiple ZK Clusters" to "Scale Down doesn't work with multiple ZK Clusters" on Dec 4, 2019
@maddisondavid
Contributor Author

Another problem is that if there are any clusters NOT downscaling, they will be resetting skipSTSReconcile at least every 30 seconds. This will stop any cluster attempting to downscale from ever updating the StatefulSet, as skipSTSReconcile will keep being reset to 0 and will never be > 6.
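For what it's worth, one possible stateless direction (a sketch only, not necessarily what #120 does; names like ScaleDownStarted are hypothetical) would be to derive the wait from the cluster resource itself, e.g. a timestamp recorded when the scale-down begins, so no cluster can interfere with another's countdown:

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative type; in practice this information would live on the
// ZookeeperCluster resource itself (a status field or annotation).
type ZookeeperCluster struct {
	Name             string
	DesiredReplicas  int
	CurrentReplicas  int
	ScaleDownStarted *time.Time
}

const gracefulScaleDownWait = 3 * time.Minute // roughly 6 reconcile loops at 30s each

func reconcileStatefulSet(zk *ZookeeperCluster, now time.Time) {
	if zk.DesiredReplicas < zk.CurrentReplicas {
		if zk.ScaleDownStarted == nil {
			// Record the start of the scale-down on the resource, so the wait
			// survives operator restarts and is never shared between clusters.
			zk.ScaleDownStarted = &now
		}
		if now.Sub(*zk.ScaleDownStarted) < gracefulScaleDownWait {
			fmt.Printf("%s: waiting before shrinking the StatefulSet\n", zk.Name)
			return
		}
	}
	zk.ScaleDownStarted = nil
	fmt.Printf("%s: updating StatefulSet replicas to %d\n", zk.Name, zk.DesiredReplicas)
	zk.CurrentReplicas = zk.DesiredReplicas
}

func main() {
	start := time.Now()
	zk := &ZookeeperCluster{Name: "cluster-a", DesiredReplicas: 3, CurrentReplicas: 5}
	reconcileStatefulSet(zk, start)                    // begins the per-cluster wait
	reconcileStatefulSet(zk, start.Add(4*time.Minute)) // wait elapsed, safe to proceed
}
```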
