
Nodegroup min/max should respect both explicit setting and ASG API return #1559

Closed
Jeffwan opened this issue Jan 7, 2019 · 9 comments
Labels
area/cluster-autoscaler area/provider/aws

Comments

@Jeffwan
Contributor

Jeffwan commented Jan 7, 2019

Cluster Autoscaler supports an explicit nodeGroup min/max setting from the user, in the form --nodes=<min>:<max>:<asg-name>, for example:

--nodes=1:10:k8s-worker-asg-1

The problem is that the user can pass an arbitrary min/max combination here. I think we should take the ASG API return into account as well when registering the ASG. But that brings another problem: once the ASG min/max is updated on the cloud provider side, the explicitly configured min/max might be wiped. Could someone help verify whether this is by design or whether we should fix it? If a fix is reasonable, I can submit a PR to address this issue.

if !m.explicitlyConfigured[asg.AwsRef] {
  // No --nodes flag for this ASG: track the limits reported by the ASG API.
  existing.minSize = asg.minSize
  existing.maxSize = asg.maxSize
} else {
  // Explicitly configured: intersect the configured range with the ASG range.
  minSize := cloudprovider.MaxOf(asg.minSize, existing.minSize)
  maxSize := cloudprovider.MinOf(asg.maxSize, existing.maxSize)

  if minSize <= maxSize {
    existing.minSize = minSize
    existing.maxSize = maxSize
  } else {
    // The two ranges don't overlap; keep the existing values and log it.
    klog.V(4).Infof("Min/Max %d/%d doesn't make sense", minSize, maxSize)
  }
}
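
To make the intent concrete, below is a minimal standalone sketch of the same idea (hypothetical names like bounds and effectiveBounds, not actual cluster-autoscaler code): intersect the explicitly configured range with the range reported by the ASG API on every refresh, and skip scale attempts with a log message when the two ranges don't overlap.

// Hypothetical sketch; not actual cluster-autoscaler code.
package main

import (
  "fmt"
  "log"
)

type bounds struct {
  min, max int
}

// effectiveBounds intersects the user-configured range with the ASG range.
// ok is false when the ranges don't overlap, in which case the caller can
// skip scale-up/scale-down attempts for this group and only log the problem.
func effectiveBounds(configured, asg bounds) (bounds, bool) {
  min := configured.min
  if asg.min > min {
    min = asg.min
  }
  max := configured.max
  if asg.max < max {
    max = asg.max
  }
  if min > max {
    return bounds{}, false
  }
  return bounds{min: min, max: max}, true
}

func main() {
  configured := bounds{min: 3, max: 4} // from --nodes=3:4:k8s-worker-asg-1
  asg := bounds{min: 3, max: 3}        // as reported by the ASG API

  if eff, ok := effectiveBounds(configured, asg); ok {
    fmt.Printf("using effective min/max %d/%d\n", eff.min, eff.max)
  } else {
    log.Printf("configured min/max %d/%d does not overlap ASG min/max %d/%d; skipping scale attempts",
      configured.min, configured.max, asg.min, asg.max)
  }
}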

Let's assume the ASG min/max setting is 3/3, with 3 desired instances.

Option 1. --nodes=3:4:k8s-worker-asg-1. CA fails to scale up because it hits the max of the ASG.

I0106 22:04:07.483292   51532 scale_up.go:262] Pod default/gpu-deployment-687f9486f-zcz9d is unschedulable
I0106 22:04:07.483454   51532 scale_up.go:304] Upcoming 0 nodes
I0106 22:04:07.483551   51532 scale_up.go:427] Best option to resize: eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4
I0106 22:04:07.483564   51532 scale_up.go:431] Estimated 1 nodes needed in eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4
I0106 22:04:07.483582   51532 scale_up.go:533] Final scale-up plan: [{eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 3->4 (max: 4)}]
I0106 22:04:07.483610   51532 scale_up.go:690] Scale-up: setting group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
I0106 22:04:07.483649   51532 auto_scaling_groups.go:203] Setting asg eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
I0106 22:04:07.483727   51532 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"f8eb720c-1241-11e9-8127-0233a42c3342", APIVersion:"v1", ResourceVersion:"48396", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
W0106 22:04:07.709530   51532 clusterstate.go:252] Disabling scale-up for node group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 until 2019-01-06 22:09:07.483285 -0800 PST m=+332.011590914
I0106 22:04:07.709545   51532 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"f8eb720c-1241-11e9-8127-0233a42c3342", APIVersion:"v1", ResourceVersion:"48396", FieldPath:""}): type: 'Warning' reason: 'FailedToScaleUpGroup' Scale-up failed for group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4: ValidationError: New SetDesiredCapacity value 4 is above max value 3 for the AutoScalingGroup.
    status code: 400, request id: 0bc126e5-1242-11e9-bdaa-31a801fb61d3
E0106 22:04:07.710168   51532 static_autoscaler.go:316] Failed to scale up: failed to increase node group size: ValidationError: New SetDesiredCapacity value 4 is above max value 3 for the AutoScalingGroup.
    status code: 400, request id: 0bc126e5-1242-11e9-bdaa-31a801fb61d3

Option 2. --nodes=2:3:k8s-worker-asg-1. CA fails to scale down because it hits the min of the ASG.

I0106 21:55:41.350524   50789 static_autoscaler.go:353] ip-192-168-89-16.us-west-2.compute.internal is unneeded since 2019-01-06 21:45:37.952451 -0800 PST m=+10.924306654 duration 10m3.171756787s
I0106 21:55:41.350633   50789 static_autoscaler.go:353] ip-192-168-63-220.us-west-2.compute.internal is unneeded since 2019-01-06 21:45:37.952451 -0800 PST m=+10.924306654 duration 10m3.171756787s
I0106 21:55:41.350653   50789 static_autoscaler.go:364] Scale down status: unneededOnly=false lastScaleUpTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658372 lastScaleDownDeleteTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658474 lastScaleDownFailTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658593 scaleDownForbidden=false isDeleteInProgress=false
I0106 21:55:41.350675   50789 static_autoscaler.go:374] Starting scale down
I0106 21:55:41.350705   50789 scale_down.go:600] ip-192-168-63-220.us-west-2.compute.internal was unneeded for 10m3.171756787s
I0106 21:55:41.350727   50789 scale_down.go:600] ip-192-168-89-16.us-west-2.compute.internal was unneeded for 10m3.171756787s
I0106 21:55:41.350776   50789 scale_down.go:819] Scale-down: removing empty node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.350896   50789 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"704067f2-123f-11e9-bbab-06ef894b7ba6", APIVersion:"v1", ResourceVersion:"47291", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.411900   50789 delete.go:64] Successfully added toBeDeletedTaint on node ip-192-168-63-220.us-west-2.compute.internal
E0106 21:55:41.830057   50789 scale_down.go:869] Problem with empty node deletion: failed to delete ip-192-168-63-220.us-west-2.compute.internal: ValidationError: Currently, desiredSize equals minSize (3). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
    status code: 400, request id: de1b03cd-1240-11e9-9b00-739c82dfa8b4
E0106 21:55:41.832861   50789 static_autoscaler.go:396] Failed to scale down: failed to delete at least one empty node: failed to delete ip-192-168-63-220.us-west-2.compute.internal: ValidationError: Currently, desiredSize equals minSize (3). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
    status code: 400, request id: de1b03cd-1240-11e9-9b00-739c82dfa8b4
I0106 21:55:41.871769   50789 delete.go:120] Releasing taint {Key:ToBeDeletedByClusterAutoscaler Value:1546840541 Effect:NoSchedule TimeAdded:<nil>} on node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.901328   50789 delete.go:139] Successfully released toBeDeletedTaint on node ip-192-168-63-220.us-west-2.compute.internal

Option 3. --nodes=10:20:k8s-worker-asg-1. min > ASG.MaxSize
Option 4. --nodes=1:2:k8s-worker-asg-1. max < ASG.MinSize
Options 3 and 4 hit the same issues on both scaling up and scaling down.

@aleksandra-malinowska
Contributor

I may not be 100% up-to-date on changes to the AWS cloud provider, but IIRC, by design, manual configuration (passed via the CA flag) overrides ASG settings. The flag can be skipped entirely by using auto-discovery mode.

@aleksandra-malinowska added area/cluster-autoscaler area/provider/aws labels Jan 7, 2019
@johanneswuerbach
Contributor

Yes, if the min and max are defined manually, those values will be used rather than the actual ASG values.

Invalid values could probably be sanitized during the reconcile loop when getting the ASGs from AWS, but I'm not sure what the expected behavior should be in those cases. Auto-correct and log, or log and crash?

What is your use case for having defined values but still wanting to rely on ASG boundaries?

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

I think the defined values should also respect the ASG boundaries. Otherwise, it doesn't make sense and makes CA behave abnormally (a nodegroup that should scale up/down cannot be adjusted).

The range managed by CA should sit inside the range the ASG manages. What's the reason to allow invalid inputs?

 [            [ managed range by CA]               ]
 asg.Min   define.Min        define.Max       asg.Max

@johanneswuerbach
Contributor

johanneswuerbach commented Jan 7, 2019

Yes, but what would you expect to happen in this case? Also, the input might be only temporarily invalid, as the CA can't influence whether an ASG is updated via the console or API.

This sounds mainly like the autoscaler could log a warning in those cases and auto-correct the values temporarily. That's more of an optimisation that avoids failed scale attempts (and the wasted API call), and it shouldn't have any other negative effects.

Do you actually see any issues caused by this?

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

@johanneswuerbach True. It could be temporarily invalid, and that's also my concern. What I expect is that CA respects both the node config and the ASG boundary in every loop. For example, if the defined min/max is invalid in this round, it doesn't have to waste resources trying to scale up/down. Otherwise, it produces unnecessary errors on both the CA and cloud provider side.

I found this issue because I carelessly set minSize larger than my current nodegroup size. Then it refused to scale down because minSize was reached.

A question: what's the best practice for using CA? Should people define min/max in CA and adjust the ASG boundary accordingly? Or give a wide boundary in the ASG and set min/max in CA based on needs?

@johanneswuerbach
Contributor

At my company we rely purely on ASG auto-discovery to avoid having multiple, potentially different, configurations for the min/max values. If we want to change the values, we generally never touch the CA but modify the ASG values directly, which is reflected in CA after the next sync (max. 60s). A rough sketch of that setup is included after this comment.

Afaik this manual specification option comes from the GCE origins of the autoscaler, as Google Managed Instance Groups (MIGs) don't have min/max values and therefore the CA values are the only way to configure boundaries.

Also, auto-discovery was only added later, while manual configuration has always been in place.
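
For reference, a rough sketch of the auto-discovery setup mentioned above, based on the cluster-autoscaler AWS documentation (the cluster name is a placeholder): the ASGs are tagged and the autoscaler is started with the auto-discovery flag instead of --nodes, so min/max come from the ASG itself and there is only one place to update them.

./cluster-autoscaler \
  --cloud-provider=aws \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>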

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

Thanks for the background. I like auto-discovery much better, and it doesn't come with these confusions.
Let me double-check the code and see what improvements we can make there. I will not wipe user-defined boundaries. At a minimum, as you said, log invalid configurations and add guidance to the documentation.

@aleksandra-malinowska
Contributor

Yeah, it was historically the only way to pass the limits. GKE doesn't use it either, but I believe it also makes it easier to implement support for a new cloud provider (which may not necessarily support any kind of scaling groups). As for the overriding use case, IIRC ASGs prevent the user from manually setting the size outside of the min/max autoscaling range. If true, users may prefer different settings for autoscaling and for the cloud provider, so they can still adjust the size manually in case of emergency (e.g. a truly unexpected spike).

@Jeffwan
Contributor Author

Jeffwan commented Apr 3, 2019

As the documentation change was merged, I feel good at this moment and will just close this issue.

@Jeffwan Jeffwan closed this as completed Apr 3, 2019