Controller is not scaling-up degraded control plane #352

zioc · 2024-06-25T07:10:58Z

What happened:

On a degraded cluster, 2 of the 3 control-plane nodes were unhealthy. One CP machine was deleted, but the control plane controller did not re-create it. We observed the following logs:

"Scaling up control plane" "Desired"=3 "Existing"=2 [...]

Right after that, however, the control plane was not scaled up because of the following check:

"Waiting for control plane to pass preflight checks" [...] "failures"="machine management-cluster-control-plane-dklp7 reports AgentHealthy condition is false (Error, Missing node)"
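To make the blocking behaviour concrete, here is a minimal sketch of the gate as the log above suggests it behaves: a single machine with a false AgentHealthy condition is enough to hold back the scale-up. This is not the actual CAPRKE2 code; the `Machine` type and the `preflightFailures` and `canScaleUp` functions are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a control-plane machine.
// It is NOT the Cluster API Machine type.
type Machine struct {
	Name         string
	AgentHealthy bool
	Reason       string // reason reported when AgentHealthy is false
}

// preflightFailures returns one message per machine whose AgentHealthy
// condition is false, formatted like the "failures" field in the log.
func preflightFailures(machines []Machine) []string {
	var failures []string
	for _, m := range machines {
		if !m.AgentHealthy {
			failures = append(failures, fmt.Sprintf(
				"machine %s reports AgentHealthy condition is false (Error, %s)",
				m.Name, m.Reason))
		}
	}
	return failures
}

// canScaleUp models a strict gate (an assumption about the behaviour):
// scale-up proceeds only when replicas are missing AND no machine fails
// the preflight check.
func canScaleUp(desired int, machines []Machine) (bool, []string) {
	if len(machines) >= desired {
		return false, nil // nothing to scale up
	}
	failures := preflightFailures(machines)
	return len(failures) == 0, failures
}

func main() {
	machines := []Machine{
		{Name: "management-cluster-control-plane-dklp7", AgentHealthy: false, Reason: "Missing node"},
		{Name: "healthy-machine", AgentHealthy: true}, // hypothetical name
	}
	ok, failures := canScaleUp(3, machines)
	fmt.Println("scale up allowed:", ok)
	fmt.Println("failures:", failures)
}
```

With Desired=3 and Existing=2, the strict gate in this sketch refuses to create the third machine for as long as the unhealthy machine still exists.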

Is this on purpose? Wouldn't it be legitimate to scale up the control plane anyway in such cases? Even if some node is unhealthy, wouldn't it be worth creating a new machine to match the requested number of replicas?

Here is a more complete log. It shows that the machine was generated as soon as the second unhealthy CP node was deleted (at the end):

[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:45:24.724005       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="fb967ad8-a26d-4431-bef7-5ae459a10cde"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:45:24.775706       1 rke2controlplane_controller.go:510]  "msg"="Scaling up control plane" "Desired"=3 "Existing"=2 "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="fb967ad8-a26d-4431-bef7-5ae459a10cde"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:45:24.775772       1 scale.go:225]  "msg"="Waiting for control plane to pass preflight checks" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "failures"="machine management-cluster-control-plane-dklp7 reports AgentHealthy condition is false (Error, Missing node)" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="fb967ad8-a26d-4431-bef7-5ae459a10cde"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:01.000267       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="c3a6ad0b-bfd1-4d74-99ad-62e6c7b94e5b"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:01.044557       1 rke2controlplane_controller.go:510]  "msg"="Scaling up control plane" "Desired"=3 "Existing"=2 "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="c3a6ad0b-bfd1-4d74-99ad-62e6c7b94e5b"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:01.045034       1 scale.go:225]  "msg"="Waiting for control plane to pass preflight checks" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "failures"="machine management-cluster-control-plane-dklp7 reports AgentHealthy condition is false (Error, Missing node)" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="c3a6ad0b-bfd1-4d74-99ad-62e6c7b94e5b"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:39.960985       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="b6992d19-b1aa-4ea3-91e4-382a84f512d0"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:40.009671       1 rke2controlplane_controller.go:510]  "msg"="Scaling up control plane" "Desired"=3 "Existing"=2 "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="b6992d19-b1aa-4ea3-91e4-382a84f512d0"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:40.084152       1 scale.go:402]  "msg"="Version checking..." "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "machine-version: "="1.28.8" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="b6992d19-b1aa-4ea3-91e4-382a84f512d0" "rke2-version"="v1.28.8+rke2r1"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:40.084188       1 scale.go:425]  "msg"="generating machine:" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "machine-spec-version"="1.28.8" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="b6992d19-b1aa-4ea3-91e4-382a84f512d0"
[rke2-control-plane-controller-manager-588d666c5d-sjgnn] I0624 14:46:40.156976       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"management-cluster-control-plane","namespace":"sylva-system"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="management-cluster-control-plane" "namespace"="sylva-system" "reconcileID"="2040b8a7-d273-458c-805a-cf45dc0b2e57"

Environment:

sylva 1.1.0 rke2 + capo

rke provider version: v0.2.7

dhanabal1 · 2024-07-18T23:22:29Z

Yes, I observed the same issue. Why do we need to check the health status of every master node? If we have 1 healthy master node, wouldn't it be fine to proceed with scaling up the next node? It also blocks parallel node provisioning, which adds delay when the cluster has many master nodes.
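A rough sketch of what such a relaxed policy could look like, assuming a configurable minimum number of healthy nodes. The `Machine` type, the `minHealthy` parameter, and the function names are hypothetical and do not correspond to an existing CAPRKE2 option. A real policy would probably also need to take etcd quorum into account, which this sketch ignores.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a control-plane machine.
type Machine struct {
	Name    string
	Healthy bool
}

// countHealthy returns how many machines report as healthy.
func countHealthy(machines []Machine) int {
	n := 0
	for _, m := range machines {
		if m.Healthy {
			n++
		}
	}
	return n
}

// canScaleUpRelaxed allows a scale-up when replicas are missing and at least
// minHealthy machines are healthy, instead of requiring every machine to be
// healthy. minHealthy is an illustrative knob, not a real setting.
func canScaleUpRelaxed(desired, minHealthy int, machines []Machine) bool {
	return len(machines) < desired && countHealthy(machines) >= minHealthy
}

func main() {
	machines := []Machine{
		{Name: "cp-unhealthy", Healthy: false}, // hypothetical names
		{Name: "cp-healthy", Healthy: true},
	}
	// One healthy node out of two existing, three desired.
	fmt.Println("relaxed gate allows scale up:", canScaleUpRelaxed(3, 1, machines))
}
```

The trade-off is that a threshold like this accepts joining new nodes to a partially broken control plane, which may be why the check is strict today.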

alexander-demicev · 2024-08-27T09:34:04Z

@zioc Can you check if the issue still appears on the latest version of CAPRKE2?

github-actions · 2024-11-26T07:24:46Z

This issue is stale because it has been open 90 days with no activity.

zioc added the kind/bug, needs-priority, and needs-triage labels Jun 25, 2024

github-actions bot added the lifecycle/stale label Nov 26, 2024
