Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow modifying the minSize and maxSize value of a NodeGroup without changing desiredCapacity #2332

Closed
TBBle opened this issue Jun 15, 2020 · 8 comments · Fixed by #3511 or #3543
Closed
Assignees
Labels
area/autoscaler good first issue Good for newcomers kind/improvement priority/backlog Not staffed at the moment. Help wanted.

Comments

@TBBle
Copy link
Contributor

TBBle commented Jun 15, 2020

Why do you want this feature?

I use the Cluster Autoscaler to manage the number of nodes in my nodegroups. However, I want to be able to change the limits (min and max) without changing the number of currently active nodes.

This appears to be what was requested in #809 and #1893 as well. They were closed by #2022 which added --nodes-max and --nodes-min, but left --nodes as mandatory.

What feature/behavior/change do you want?

eksctl --cluster mycluster scale nodegroup --name ng-a --nodes-max 100

should change the maximum size to 100, but not affect current or minimum sizes.

As noted in #809's comments, this is equivalent to

aws autoscaling update-auto-scaling-group --auto-scaling-group-name $ASG_NAME --max-size=20

for $ASG_NAME of eksctl-<mycluster>-nodegroup-<ng-a>-NodeGroup-<HASH>.

@TBBle TBBle added the kind/feature New feature or request label Jun 15, 2020
@martina-if martina-if added the priority/backlog Not staffed at the moment. Help wanted. label Jul 3, 2020
@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jan 18, 2021
@TBBle
Copy link
Contributor Author

TBBle commented Jan 19, 2021

I'm unaware of this being resolved at this time.

@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@TBBle
Copy link
Contributor Author

TBBle commented Apr 6, 2021

As a status update, #3511 has introduced the ability to change the minSize and maxSize of a NodeGroup without changing desiredCapacity, but has the side-effect that the ASG's size is reset to the desiredCapacity of the NodeGroup (by updating CloudFormation).

The Cluster Autoscaler interaction we need for the use case of this ticket is that somehow we need the minSize/maxSize to come from eksctl -> CloudFormation -> ASG -> Cluster Autoscaler, but desiredSize needs to come from Cluster AutoScaler -> ASG -> (?) -> Cloud Formation.

This is perhaps more complex to fix than I envisaged when I opened the ticket, because we can see in #3532 that updating the NodeGroup CF stack at all will reset the ASG's desired size to the eksctl/CF desired size, losing the Cluster Autoscaler's size value.

So to make the UX work as I envisaged, eksctl would need to get the ASG's current desired size, and use that as the "desired capacity" when updating the Nodegroup CF Stack, effectively applying Cluster Autoscaler's changes back from the ASG to the Nodegroup CF Stack.

#2331 is related to this approach.


It'd be nicer if AWS had a way to say "Cloud Formation stack doesn't manage 'desiredCapacity' of the ASG it templates. That would also fix the fact that Cluster Autoscaler will trigger the Cloud Formation "drift check" because it updates the ASG.

DesiredCapacity is optional in an ASG, but I don't know what the behaviour is if you don't provide it, or if not providing it in CF and then providing it directly will be any better, or if CF will delete the value when it is reapplied.

For that idea to make sense, eksctl scale would have to apply the desired capacity value directly to the ASG, and when first creating the group, the ASG DesiredCapcity would need to be set on the ASG to MinSize, unless AWS does that automatically in the absence of DesiredCapacity.

And of course, I suspect this would not be generally applicable in eksctl, but only for NodeGroups intended to be managed by Cluster Autoscaler, so that's even more complexity. For the user-side it could be tied to the tagForAutoScaler option mentioned in #2263, making it more a "clusterAutoscalerEnabled" option for the NodeGroup that both applies appropriate tags from the node labels given in eksctl, and also makes CF non-authoritative for ASG DesiredCapacity in favour of CA or eksctl scale changes.

@aclevername
Copy link
Contributor

As a status update, #3511 has introduced the ability to change the minSize and maxSize of a NodeGroup without changing desiredCapacity, but has the side-effect that the ASG's size is reset to the desiredCapacity of the NodeGroup (by updating CloudFormation).

The Cluster Autoscaler interaction we need for the use case of this ticket is that somehow we need the minSize/maxSize to come from eksctl -> CloudFormation -> ASG -> Cluster Autoscaler, but desiredSize needs to come from Cluster AutoScaler -> ASG -> (?) -> Cloud Formation.

This is perhaps more complex to fix than I envisaged when I opened the ticket, because we can see in #3532 that updating the NodeGroup CF stack at all will reset the ASG's desired size to the eksctl/CF desired size, losing the Cluster Autoscaler's size value.

So to make the UX work as I envisaged, eksctl would need to get the ASG's current desired size, and use that as the "desired capacity" when updating the Nodegroup CF Stack, effectively applying Cluster Autoscaler's changes back from the ASG to the Nodegroup CF Stack.

#2331 is related to this approach.

Yeah this is quite frustrating. I think the best solution is for eksctl to treat the ASG as the source of truth for min/max/desired after the nodegroups created. We have a similar PR open currently to change the eksctl get nodegroup output to fetch desired from the ASG #3515

It'd be nicer if AWS had a way to say "Cloud Formation stack doesn't manage 'desiredCapacity' of the ASG it templates. That would also fix the fact that Cluster Autoscaler will trigger the Cloud Formation "drift check" because it updates the ASG.

DesiredCapacity is optional in an ASG, but I don't know what the behaviour is if you don't provide it, or if not providing it in CF and then providing it directly will be any better, or if CF will delete the value when it is reapplied.

DesiredCapacity would default to minSize if not specified https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html#cfn-as-group-desiredcapacity

@TBBle
Copy link
Contributor Author

TBBle commented Apr 8, 2021

DesiredCapacity would default to minSize if not specified https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html#cfn-as-group-desiredcapacity

Interesting. I expect that a CF update that does not include DesiredCapacity would not cause this to be reset to MinSize, i.e. that default is only used when creating the ASG from CF. The notes on the UpdateAutoScalingGroup API imply that it works this way, assuming CF is calling that API when a CF stack's ASG template is updated, which seems the sanest implementation anyway.

@aclevername
Copy link
Contributor

aclevername commented Apr 8, 2021

DesiredCapacity would default to minSize if not specified https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html#cfn-as-group-desiredcapacity

Interesting. I expect that a CF update that does not include DesiredCapacity would not cause this to be reset to MinSize, i.e. that default is only used when creating the ASG from CF. The notes on the UpdateAutoScalingGroup API imply that it works this way, assuming CF is calling that API when a CF stack's ASG template is updated, which seems the sanest implementation anyway.

it might be, although I've found cloudformation often does things you don't expect 😉

I'm going to go ahead with changing the scale command to use the ASG as the source of truth, I think its inevitable that the values in the CF stack will go out of date.

@aclevername aclevername linked a pull request Apr 8, 2021 that will close this issue
7 tasks
@aclevername
Copy link
Contributor

fix should be out in todays 0.45.0-rc.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/autoscaler good first issue Good for newcomers kind/improvement priority/backlog Not staffed at the moment. Help wanted.
Projects
None yet
5 participants