
Nodegroup min/max should respect both explicit setting and ASG API return #1559

Closed
Jeffwan opened this issue Jan 7, 2019 · 9 comments
Labels
area/cluster-autoscaler area/provider/aws

Comments

@Jeffwan
Contributor

Jeffwan commented Jan 7, 2019

Cluster Autoscaler supports an explicit nodeGroup min/max setting from the user, in the form --nodes=<min>:<max>:<asg-name>, for example:

--nodes=1:10:k8s-worker-asg-1

The problem is that the user can pass an arbitrary min/max combination here. I think we should take the ASG API return into account as well when registering the ASG. But that brings another problem: once the ASG min/max is updated on the cloud provider side, the explicitly configured min/max might be wiped. Could someone help verify whether this is by design or whether we should fix it? If a fix is reasonable, I can submit a PR to address this issue.

if !m.explicitlyConfigured[asg.AwsRef] {
  // No --nodes flag for this ASG: track the limits reported by the ASG API.
  existing.minSize = asg.minSize
  existing.maxSize = asg.maxSize
} else {
  // Explicitly configured: intersect the configured range with the ASG range.
  minSize := cloudprovider.MaxOf(asg.minSize, existing.minSize)
  maxSize := cloudprovider.MinOf(asg.maxSize, existing.maxSize)

  if minSize <= maxSize {
    existing.minSize = minSize
    existing.maxSize = maxSize
  } else {
    // The two ranges don't overlap; keep the existing values and log it.
    klog.V(4).Infof("Min/Max %d/%d doesn't make sense", minSize, maxSize)
  }
}
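
To make the intent concrete, below is a minimal standalone sketch of the same idea (hypothetical names like bounds and effectiveBounds, not actual cluster-autoscaler code): intersect the explicitly configured range with the range reported by the ASG API on every refresh, and skip scale attempts with a log message when the two ranges don't overlap.

// Hypothetical sketch; not actual cluster-autoscaler code.
package main

import (
  "fmt"
  "log"
)

type bounds struct {
  min, max int
}

// effectiveBounds intersects the user-configured range with the ASG range.
// ok is false when the ranges don't overlap, in which case the caller can
// skip scale-up/scale-down attempts for this group and only log the problem.
func effectiveBounds(configured, asg bounds) (bounds, bool) {
  min := configured.min
  if asg.min > min {
    min = asg.min
  }
  max := configured.max
  if asg.max < max {
    max = asg.max
  }
  if min > max {
    return bounds{}, false
  }
  return bounds{min: min, max: max}, true
}

func main() {
  configured := bounds{min: 3, max: 4} // from --nodes=3:4:k8s-worker-asg-1
  asg := bounds{min: 3, max: 3}        // as reported by the ASG API

  if eff, ok := effectiveBounds(configured, asg); ok {
    fmt.Printf("using effective min/max %d/%d\n", eff.min, eff.max)
  } else {
    log.Printf("configured min/max %d/%d does not overlap ASG min/max %d/%d; skipping scale attempts",
      configured.min, configured.max, asg.min, asg.max)
  }
}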

Let's assume the ASG min/max setting is 3/3, with 3 desired instances.

Option 1. --nodes=3:4:k8s-worker-asg-1. CA fails to scale up because it hits the max of the ASG.

I0106 22:04:07.483292   51532 scale_up.go:262] Pod default/gpu-deployment-687f9486f-zcz9d is unschedulable
I0106 22:04:07.483454   51532 scale_up.go:304] Upcoming 0 nodes
I0106 22:04:07.483551   51532 scale_up.go:427] Best option to resize: eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4
I0106 22:04:07.483564   51532 scale_up.go:431] Estimated 1 nodes needed in eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4
I0106 22:04:07.483582   51532 scale_up.go:533] Final scale-up plan: [{eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 3->4 (max: 4)}]
I0106 22:04:07.483610   51532 scale_up.go:690] Scale-up: setting group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
I0106 22:04:07.483649   51532 auto_scaling_groups.go:203] Setting asg eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
I0106 22:04:07.483727   51532 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"f8eb720c-1241-11e9-8127-0233a42c3342", APIVersion:"v1", ResourceVersion:"48396", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 size to 4
W0106 22:04:07.709530   51532 clusterstate.go:252] Disabling scale-up for node group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4 until 2019-01-06 22:09:07.483285 -0800 PST m=+332.011590914
I0106 22:04:07.709545   51532 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"f8eb720c-1241-11e9-8127-0233a42c3342", APIVersion:"v1", ResourceVersion:"48396", FieldPath:""}): type: 'Warning' reason: 'FailedToScaleUpGroup' Scale-up failed for group eksctl-eks-test-nodegroup-0-NodeGroup-3OK0XRHS0SA4: ValidationError: New SetDesiredCapacity value 4 is above max value 3 for the AutoScalingGroup.
    status code: 400, request id: 0bc126e5-1242-11e9-bdaa-31a801fb61d3
E0106 22:04:07.710168   51532 static_autoscaler.go:316] Failed to scale up: failed to increase node group size: ValidationError: New SetDesiredCapacity value 4 is above max value 3 for the AutoScalingGroup.
    status code: 400, request id: 0bc126e5-1242-11e9-bdaa-31a801fb61d3

Option 2. --nodes=2:3:k8s-worker-asg-1. CA fails to scale down because it hits the min of the ASG.

I0106 21:55:41.350524   50789 static_autoscaler.go:353] ip-192-168-89-16.us-west-2.compute.internal is unneeded since 2019-01-06 21:45:37.952451 -0800 PST m=+10.924306654 duration 10m3.171756787s
I0106 21:55:41.350633   50789 static_autoscaler.go:353] ip-192-168-63-220.us-west-2.compute.internal is unneeded since 2019-01-06 21:45:37.952451 -0800 PST m=+10.924306654 duration 10m3.171756787s
I0106 21:55:41.350653   50789 static_autoscaler.go:364] Scale down status: unneededOnly=false lastScaleUpTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658372 lastScaleDownDeleteTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658474 lastScaleDownFailTime=2019-01-06 21:45:27.948718 -0800 PST m=+0.920658593 scaleDownForbidden=false isDeleteInProgress=false
I0106 21:55:41.350675   50789 static_autoscaler.go:374] Starting scale down
I0106 21:55:41.350705   50789 scale_down.go:600] ip-192-168-63-220.us-west-2.compute.internal was unneeded for 10m3.171756787s
I0106 21:55:41.350727   50789 scale_down.go:600] ip-192-168-89-16.us-west-2.compute.internal was unneeded for 10m3.171756787s
I0106 21:55:41.350776   50789 scale_down.go:819] Scale-down: removing empty node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.350896   50789 factory.go:33] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"704067f2-123f-11e9-bbab-06ef894b7ba6", APIVersion:"v1", ResourceVersion:"47291", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.411900   50789 delete.go:64] Successfully added toBeDeletedTaint on node ip-192-168-63-220.us-west-2.compute.internal
E0106 21:55:41.830057   50789 scale_down.go:869] Problem with empty node deletion: failed to delete ip-192-168-63-220.us-west-2.compute.internal: ValidationError: Currently, desiredSize equals minSize (3). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
    status code: 400, request id: de1b03cd-1240-11e9-9b00-739c82dfa8b4
E0106 21:55:41.832861   50789 static_autoscaler.go:396] Failed to scale down: failed to delete at least one empty node: failed to delete ip-192-168-63-220.us-west-2.compute.internal: ValidationError: Currently, desiredSize equals minSize (3). Terminating instance without replacement will violate group's min size constraint. Either set shouldDecrementDesiredCapacity flag to false or lower group's min size.
    status code: 400, request id: de1b03cd-1240-11e9-9b00-739c82dfa8b4
I0106 21:55:41.871769   50789 delete.go:120] Releasing taint {Key:ToBeDeletedByClusterAutoscaler Value:1546840541 Effect:NoSchedule TimeAdded:<nil>} on node ip-192-168-63-220.us-west-2.compute.internal
I0106 21:55:41.901328   50789 delete.go:139] Successfully released toBeDeletedTaint on node ip-192-168-63-220.us-west-2.compute.internal

Option 3. --nodes=10:20:k8s-worker-asg-1. min > ASG.MaxSize
Option 4. --nodes=1:2:k8s-worker-asg-1. max < ASG.MinSize
Options 3 and 4 hit the same issues on both scaling up and scaling down.

@aleksandra-malinowska
Contributor

I may not be 100% up-to-date on changes to the AWS cloud provider, but IIRC, by design, manual configuration (passed via the CA flag) overrides ASG settings. The flag can be skipped entirely by using auto-discovery mode.

@aleksandra-malinowska added area/cluster-autoscaler area/provider/aws labels Jan 7, 2019
@johanneswuerbach
Contributor

Yes, if the min and max are defined manually, those values will be used rather than the actual ASG values.

Invalid values could probably be sanitized during the reconcile loop when getting the ASGs from AWS, but I'm not sure what the expected behavior should be in those cases. Auto-correct and log, or log and crash?

What is your use case for having defined values but still wanting to rely on ASG boundaries?

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

I think the defined values should also respect the ASG boundaries. Otherwise, it doesn't make sense and makes CA behave abnormally (a nodegroup that should scale up/down cannot be adjusted).

The range managed by CA should sit inside the range the ASG manages. What's the reason to allow invalid inputs?

 [            [ managed range by CA]               ]
 asg.Min   define.Min        define.Max       asg.Max

@johanneswuerbach
Contributor

johanneswuerbach commented Jan 7, 2019

Yes, but what would you expect to happen in this case? Also, the input might be only temporarily invalid, as the CA can't influence whether an ASG is updated via the console or API.

This sounds mainly like the autoscaler could log a warning in those cases and auto-correct the values temporarily. That's more of an optimisation that avoids failed scale attempts (and the wasted API call), and it shouldn't have any other negative effects.

Do you actually see any issues caused by this?

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

@johanneswuerbach True. It could be temporarily invalid, and that's also my concern. What I expect is that CA respects both the node config and the ASG boundary in every loop. For example, if the defined min/max is invalid in this round, it doesn't have to waste resources trying to scale up/down. Otherwise, it produces unnecessary errors on both the CA and cloud provider side.

I found this issue because I carelessly set minSize larger than my current nodegroup size. Then it refused to scale down because minSize was reached.

A question: what's the best practice for using CA? Should people define min/max in CA and adjust the ASG boundary accordingly? Or give a wide boundary in the ASG and set min/max in CA based on needs?

@johanneswuerbach
Contributor

At my company we rely purely on ASG auto-discovery to avoid having multiple, potentially different, configurations for the min/max values. If we want to change the values, we generally never touch the CA but modify the ASG values directly, which is reflected in CA after the next sync (max. 60s). A rough sketch of that setup is included after this comment.

Afaik this manual specification option comes from the GCE origins of the autoscaler, as Google Managed Instance Groups (MIGs) don't have min/max values and therefore the CA values are the only way to configure boundaries.

Also, auto-discovery was only added later, while manual configuration has always been in place.
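
For reference, a rough sketch of the auto-discovery setup mentioned above, based on the cluster-autoscaler AWS documentation (the cluster name is a placeholder): the ASGs are tagged and the autoscaler is started with the auto-discovery flag instead of --nodes, so min/max come from the ASG itself and there is only one place to update them.

./cluster-autoscaler \
  --cloud-provider=aws \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>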

@Jeffwan
Contributor Author

Jeffwan commented Jan 7, 2019

Thanks for the background. I like auto-discovery much better, and it doesn't come with these confusions.
Let me double-check the code and see what improvements we can make there. I will not wipe user-defined boundaries. At a minimum, as you said, log invalid configurations and add guidance to the documentation.

@aleksandra-malinowska
Contributor

Yeah, it was historically the only way to pass the limits. GKE doesn't use it either, but I believe it also makes it easier to implement support for a new cloud provider (which may not necessarily support any kind of scaling groups). As for the overriding use case, IIRC ASGs prevent the user from manually setting the size outside of the min/max autoscaling range. If true, users may prefer different settings for autoscaling and for the cloud provider, so they can still adjust the size manually in case of emergency (e.g. a truly unexpected spike).

@Jeffwan
Contributor Author

Jeffwan commented Apr 3, 2019

As the documentation change was merged, I feel good at this moment and will just close this issue.

@Jeffwan Jeffwan closed this as completed Apr 3, 2019