Controller is not scaling up degraded control plane #352
Labels
kind/bug: Something isn't working
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
What happened:
On a degraded cluster, 2 of the 3 control-plane nodes were unhealthy. One CP machine was deleted, but the control plane controller did not re-create it. We observed the following log:
"Scaling up control plane" "Desired"=3 "Existing"=2 "RKE2ControlPlane
Right after that, however, the control plane was not scaled up because of the following preflight check failure:
"Waiting for control plane to pass preflight checks" [...] "failures"="machine management-cluster-control-plane-dklp7 reports AgentHealthy condition is false (Error, Missing node)"
Is this intentional? Wouldn't it be legitimate to scale up the control plane anyway in such cases? Even if one node is unhealthy, wouldn't it be worth creating a new machine to match the requested number of replicas?
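To make the behaviour described above concrete, here is a minimal Go sketch of the gating logic as it appears from the logs: scale-up is attempted only when fewer machines exist than desired and every existing machine passes preflight. This is not the actual RKE2ControlPlane controller code; the `Machine` type, `preflightFailures`, `canScaleUp`, and the machine name `cp-healthy` are all hypothetical simplifications. Only the log messages and `management-cluster-control-plane-dklp7` come from this report.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine is a hypothetical, simplified stand-in for a control-plane Machine.
// It is NOT the real Cluster API type.
type Machine struct {
	Name         string
	AgentHealthy bool
	Reason       string // e.g. "Error, Missing node"
}

// preflightFailures returns one message per machine whose AgentHealthy
// condition is false, mimicking the failure message seen in the logs.
func preflightFailures(machines []Machine) []string {
	var failures []string
	for _, m := range machines {
		if !m.AgentHealthy {
			failures = append(failures, fmt.Sprintf(
				"machine %s reports AgentHealthy condition is false (%s)", m.Name, m.Reason))
		}
	}
	return failures
}

// canScaleUp models the observed behaviour (assumption): a scale-up is needed
// when existing < desired, but it only proceeds if no preflight check fails.
func canScaleUp(desired int, machines []Machine) (bool, string) {
	if len(machines) >= desired {
		return false, "no scale-up needed"
	}
	if f := preflightFailures(machines); len(f) > 0 {
		return false, "Waiting for control plane to pass preflight checks: " + strings.Join(f, "; ")
	}
	return true, "Scaling up control plane"
}

func main() {
	// State from the report: one healthy CP machine (hypothetical name)
	// and one unhealthy CP machine remain after one was deleted.
	degraded := []Machine{
		{Name: "cp-healthy", AgentHealthy: true},
		{Name: "management-cluster-control-plane-dklp7", AgentHealthy: false, Reason: "Error, Missing node"},
	}
	ok, msg := canScaleUp(3, degraded)
	fmt.Println(ok, msg) // blocked by the unhealthy machine

	// Once the second unhealthy machine is deleted, only the healthy one
	// remains and the preflight checks pass.
	ok, msg = canScaleUp(3, degraded[:1])
	fmt.Println(ok, msg)
}
```

Under this model, the first call is blocked by the preflight failure and the second succeeds, which matches the sequence reported below. The question above amounts to asking whether the preflight condition should be relaxed when existing < desired; this sketch does not evaluate that trade-off.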
Here is a more complete log. At the end, it shows that a new machine was created as soon as the second unhealthy CP node was deleted:
Environment:
sylva 1.1.0 rke2 + capo