Allow MachinePool autoscaler maxReplicas < #AZs #2215
Our algorithm to spread MachinePool.Spec.Autoscaling.MinReplicas and .MaxReplicas across spoke MachineAutoscalers previously assumed that it was sane to create one MachineAutoscaler per AZ and set maxReplicas to zero if that's what our computation came out with.

Not so. MachineAutoscaler.Spec.MaxReplicas -- in contrast to MachineSet.Spec.Replicas -- is [not allowed to be zero](https://github.com/openshift/cluster-autoscaler-operator/blob/67999a5e79d0200ee0a4aab3dcfbfd18e097b514/pkg/apis/autoscaling/v1beta1/machineautoscaler_types.go#L18).

The resulting behavior would manifest as a hive-controllers error similar to:

```
time="2024-02-21T16:33:56.802Z" level=error msg="unable to create machine autoscaler" controller=machinepool error="MachineAutoscaler.autoscaling.openshift.io \"efried-rg2wn-worker-test-us-east-1c\" is invalid: spec.maxReplicas: Invalid value: 0: spec.maxReplicas in body should be greater than or equal to 1" machinePool=efried/efried-worker-test reconcileID=nwpxskln
```

So we have to include a special case for this and delete such MachineAutoscalers instead.

(Further, since this error causes us to bail out of the machinepool controller's reconcile loop before updating the MachinePool status, the user doesn't have a great way to discover what went wrong. They just have to notice that MachineSets et al stop responding. We'll address this in a separate commit.)

HIVE-2415
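To make the failure mode concrete, here is a minimal sketch of how spreading pool-level bounds across AZs can produce a zero maximum. It is not the Hive implementation: the names `zoneBounds`, `spread`, and `planAutoscalers` are hypothetical, and the even split with the remainder going to the first zones is an assumption made for illustration.

```go
package main

import "fmt"

// zoneBounds holds the autoscaling bounds computed for one availability zone.
// Hypothetical type for illustration; not a Hive or OpenShift API type.
type zoneBounds struct {
	Min, Max int32
}

// spread divides total across n buckets as evenly as possible, handing the
// remainder to the lowest-indexed buckets. (Assumed distribution rule.)
func spread(total int32, n int) []int32 {
	out := make([]int32, n)
	for i := range out {
		out[i] = total / int32(n)
		if int32(i) < total%int32(n) {
			out[i]++
		}
	}
	return out
}

// planAutoscalers returns the bounds for zones that should have a
// MachineAutoscaler, plus the zones whose computed max is zero and which
// therefore must not have one (maxReplicas must be >= 1).
func planAutoscalers(minReplicas, maxReplicas int32, zones []string) (map[string]zoneBounds, []string) {
	mins := spread(minReplicas, len(zones))
	maxes := spread(maxReplicas, len(zones))
	keep := make(map[string]zoneBounds, len(zones))
	var drop []string
	for i, zone := range zones {
		if maxes[i] < 1 {
			drop = append(drop, zone)
			continue
		}
		keep[zone] = zoneBounds{Min: mins[i], Max: maxes[i]}
	}
	return keep, drop
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	keep, drop := planAutoscalers(1, 2, zones)
	for _, zone := range zones {
		if b, ok := keep[zone]; ok {
			fmt.Printf("%s: min=%d max=%d\n", zone, b.Min, b.Max)
		}
	}
	fmt.Println("no MachineAutoscaler for:", drop)
}
```

With a pool maximum of 2 spread over three zones, the third zone's share is zero, which is the value the API server rejects in the log above; the sketch reports that zone separately instead of emitting bounds for it.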
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #2215 +/- ##
=======================================
Coverage 58.31% 58.32%
=======================================
Files 182 182
Lines 25697 25697
=======================================
+ Hits 14986 14988 +2
+ Misses 9453 9452 -1
+ Partials 1258 1257 -1
/test e2e
@2uasimojo: all tests passed!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: 2uasimojo, lleshchi
/cherry-pick mce-2.5
@2uasimojo: new pull request created: #2216
MachineAutoscalers are not allowed to have MaxReplicas==0. PR openshift#2215 / bce2d47 partially fixed the scenario where a MachinePool's autoscaling maxReplicas is less than the number of AZs by causing such MAs not to be *created*. However, when reducing the MachinePool's maxReplicas to below the number of AZs, we were still trying to update *existing* MAs to have MaxReplicas==0. This commit adjusts the logic to delete them instead. HIVE-2415
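The distinction between the create path and the update path can be sketched as a single decision per AZ. This is an illustrative outline only, assuming hypothetical names (`maBounds`, `decideAction`, and the `action` constants); it is not the code from either commit.

```go
package main

import "fmt"

// maBounds is a hypothetical stand-in for the min/max fields of a
// MachineAutoscaler spec.
type maBounds struct {
	Min, Max int32
}

// action names what the controller should do for one AZ.
type action string

const (
	actionNone   action = "none"
	actionCreate action = "create"
	actionUpdate action = "update"
	actionDelete action = "delete"
)

// decideAction compares the existing autoscaler (nil if there is none) with
// the desired bounds. A desired max below 1 can never be written, so an
// existing object is deleted rather than updated, and a missing one is left
// uncreated.
func decideAction(existing *maBounds, desired maBounds) action {
	if desired.Max < 1 {
		if existing != nil {
			return actionDelete
		}
		return actionNone
	}
	if existing == nil {
		return actionCreate
	}
	if *existing != desired {
		return actionUpdate
	}
	return actionNone
}

func main() {
	// The pool's maxReplicas was reduced, so this AZ's share dropped to zero.
	existing := &maBounds{Min: 0, Max: 1}
	fmt.Println(decideAction(existing, maBounds{Min: 0, Max: 0}))
}
```

Checking the desired maximum before looking at whether the object exists keeps the zero case out of both the create and the update branches, which is the gap the first fix left open.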