CA Issues invalid scale downs due to scale-down processing delay in the MCM #342
Labels: area/auto-scaling, kind/bug
What happened:
CA requested an undesired additional scale-down of the MCM MachineDeployment while the first request to remove the machine was still being processed. This occurred due to API server timeouts.
Example log
In the log, the machine set was scaled down at 21:40:04, but the change was not processed until 21:40:38 because of client-side throttling. During this time, CA again requested deletion of the nodes, which caused another scale-down.
What you expected to happen:
The CA scale-down should be idempotent and should occur only once, regardless of timeouts or throttling. The CA should not reduce the replica count further just because the MCM is delayed or has problems executing the scale-down.
How to reproduce it (as minimally and precisely as possible):
Stall the scale-down in the MCM with an artificially extended delay so that the CA issues erroneous scale-downs in subsequent processing cycles.