Bug Report
The casskop operator has the following logic to reject a user request that scales the cluster down to 0; in that case, the cluster's replica count is determined by the previous cluster state stored in the operator instance's annotations:
// Global scaleDown to 0 is forbidden
if cc.Spec.NodesPerRacks == 0 {
    logrus.WithFields(logrus.Fields{"cluster": cc.Name}).
        Warningf("The Operator has refused the change on NodesPerRack=0 restore to OldValue[%d]",
            oldCRD.Spec.NodesPerRacks)
    cc.Spec.NodesPerRacks = oldCRD.Spec.NodesPerRacks
    needUpdate = true
}
Here, oldCRD refers to the last-applied-configuration of the CRD, which is dumped and stored in the operator's annotations during every reconcile.
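For readers unfamiliar with this pattern, here is a minimal, self-contained sketch of how such an oldCRD can be rebuilt from an annotation written during the previous reconcile. It is not casskop's actual code: the annotation key, type names, and helper functions are illustrative stand-ins.

package main

import (
    "encoding/json"
    "fmt"
)

// Minimal stand-ins for the CassandraCluster CR; the NodesPerRacks field
// follows the issue, everything else is illustrative.
type CassandraClusterSpec struct {
    NodesPerRacks int32 `json:"nodesPerRacks"`
}

type CassandraCluster struct {
    Annotations map[string]string
    Spec        CassandraClusterSpec
}

// Hypothetical annotation key; casskop uses its own key for the
// last-applied-configuration dump.
const lastAppliedAnnotation = "cassandraclusters.example.com/last-applied-configuration"

// loadOldCRD rebuilds the previously applied spec (the oldCRD above)
// from the annotation written during the last reconcile.
func loadOldCRD(cc *CassandraCluster) (CassandraClusterSpec, error) {
    raw, ok := cc.Annotations[lastAppliedAnnotation]
    if !ok {
        return cc.Spec, nil // first reconcile: nothing stored yet
    }
    var old CassandraClusterSpec
    err := json.Unmarshal([]byte(raw), &old)
    return old, err
}

// storeLastApplied dumps the current spec into the annotation at the end
// of a reconcile, so the next reconcile can compare against it.
func storeLastApplied(cc *CassandraCluster) error {
    raw, err := json.Marshal(cc.Spec)
    if err != nil {
        return err
    }
    if cc.Annotations == nil {
        cc.Annotations = map[string]string{}
    }
    cc.Annotations[lastAppliedAnnotation] = string(raw)
    return nil
}

func main() {
    cc := &CassandraCluster{Spec: CassandraClusterSpec{NodesPerRacks: 2}}
    _ = storeLastApplied(cc) // a reconcile runs while NodesPerRacks is 2

    cc.Spec.NodesPerRacks = 0 // the spec later reaches 0 before the next reconcile
    old, _ := loadOldCRD(cc)
    fmt.Printf("oldCRD.Spec.NodesPerRacks = %d\n", old.NodesPerRacks) // prints 2
}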
We ran a workload that changes the replica count (NodesPerRacks) from 2 -> 1 -> 0. Ideally, the replicas should remain at 1 since scaling down to 0 is not allowed. However, we observed 2 replicas at the end.
The reason is that the goroutine performing reconciliation (g1) and the goroutine updating the cassandraCluster object (g2) run concurrently, and certain interleavings of the two goroutines can lead to the unexpected scaling behavior.
Ideally, the following interleaving will lead to the correct result:
g2: sets NodesPerRacks (from 2) to 1
g1: reads NodesPerRacks and scales the cluster from 2 to 1
g2: sets NodesPerRacks (from 1) to 0
g1: refuses the scale to 0, reads the oldCRD state (NodesPerRacks: 1), and sets NodesPerRacks (and the cluster) back to 1
However, we found that the following interleaving can lead to the unexpected scaling behavior mentioned above:
g2: sets NodesPerRacks (from 2) to 1
g2: sets NodesPerRacks (from 1) to 0
g1: reads the oldCRD state (NodesPerRacks: 2) and, since scaling to 0 is of course not allowed, sets NodesPerRacks (and the cluster) back to 2
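To make the difference concrete, the following self-contained sketch (again, not casskop code) replays the two interleavings against the restore logic quoted above, collapsing each reconcile into the single decision that matters here.

package main

import "fmt"

// reconcile applies the guard from the snippet above: a request for 0
// NodesPerRacks is refused and the value recorded by the previous
// reconcile (oldCRD) is restored. It returns the value the cluster is
// scaled to, which also becomes oldCRD for the next reconcile.
func reconcile(oldCRD, desired int32) int32 {
    if desired == 0 {
        fmt.Printf("refused NodesPerRacks=0, restoring OldValue[%d]\n", oldCRD)
        return oldCRD
    }
    return desired
}

func main() {
    // Expected interleaving: g1 reconciles after each update made by g2.
    old := int32(2)
    old = reconcile(old, 1) // g2: 2 -> 1; g1 scales the cluster to 1
    old = reconcile(old, 0) // g2: 1 -> 0; g1 refuses and restores 1
    fmt.Println("expected interleaving ends with", old, "replica(s)") // 1

    // Observed interleaving: both updates by g2 land before g1 runs, so
    // g1 only ever sees desired=0 while oldCRD is still 2.
    old = 2
    old = reconcile(old, 0) // g1 refuses and restores 2
    fmt.Println("observed interleaving ends with", old, "replica(s)") // 2
}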
What did you do?
As mentioned above, we changed the replica (NodesPerRacks) from 2 -> 1 -> 0.
What did you expect to see?
There should be 1 replica at the end.
What did you see instead? Under which circumstances?
We observed that it ended up with 2 replicas.
Environment
casskop version:
f87c8e0 (master branch)
Kubernetes version information:
1.18.9
Cassandra version:
3.11
Possible Solution
This issue is probably hard to fix from the operator side: how the operator reads events from users (e.g., scaling up/down) and how the different goroutines interleave with each other are decided by the controller-runtime package, not by the operator code. We have not found a good way to solve this issue so far. We also observed similar behavior in other controllers. We are still working on a good solution to this kind of issue and will send a PR for it.
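As a rough illustration of this point, the toy sketch below (not the real controller-runtime API) shows why a reconcile may never observe an intermediate spec value: requests for the same object coalesce in the work queue, and the reconcile that eventually runs only sees the latest state of the object.

package main

import "fmt"

// A toy coalescing queue in the spirit of controller-runtime's work queue:
// enqueueing the same key twice before it is processed results in a single
// reconcile, and that reconcile reads whatever the object's latest state
// happens to be. This illustrates the timing issue only; it is not the
// real API.
type coalescingQueue struct {
    pending map[string]bool
    order   []string
}

func (q *coalescingQueue) add(key string) {
    if q.pending == nil {
        q.pending = map[string]bool{}
    }
    if !q.pending[key] {
        q.pending[key] = true
        q.order = append(q.order, key)
    }
}

func (q *coalescingQueue) drain(process func(key string)) {
    for _, key := range q.order {
        process(key)
    }
    q.order, q.pending = nil, nil
}

func main() {
    // latestSpec stands in for the NodesPerRacks value as seen by the
    // operator's cache; "cassandra-demo" is a made-up object name.
    latestSpec := map[string]int32{"cassandra-demo": 2}
    q := &coalescingQueue{}

    // g2 updates the object twice before g1 gets scheduled; both updates
    // enqueue the same key, which coalesces into one pending item.
    latestSpec["cassandra-demo"] = 1
    q.add("cassandra-demo")
    latestSpec["cassandra-demo"] = 0
    q.add("cassandra-demo")

    // g1 finally runs: it reconciles the key once and only sees the latest
    // spec value (0); the intermediate value (1) is never observed.
    q.drain(func(key string) {
        fmt.Printf("reconcile %s with NodesPerRacks=%d\n", key, latestSpec[key])
    })
}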
This issue might be hard to reproduce, as it only manifests when this particular interleaving happens (e.g., when the reconcile goroutine runs slowly). We have recently been building a tool, https://github.com/sieve-project/sieve, to reliably detect and reproduce bugs like the one above in various k8s controllers. This bug can be automatically reproduced with just a few commands using our tool. Please give it a try if you also want to reproduce the bug.
To use the tool, please first run the environment check:
python3 check-env.py
If the checker passes, please build the Kubernetes and controller images using the commands below as our tool needs to set up a kind cluster to reproduce the bug:
Here, -d DOCKER_REPO_NAME should be a Docker repo that you have write access to, and -s f87c8e05c1a2896732fc5f3a174f1eb99e936907 is the commit where the bug was found.
It will try to scale the Cassandra cluster from 2 -> 1 -> 0 and also manipulate the goroutine interleaving of the operator to trigger the corner-case interleaving described above.
The bug is successfully reproduced if you see the message below:
[ERROR] pod SIZE inconsistency: 2 seen after learning run, but 3 seen after testing run
Ideally, there should be 2 pods in the end (1 operator pod and 1 Cassandra replica), but after the test run we observe 3 pods (1 operator pod and 2 Cassandra replicas, as mentioned above).