
AWS EKS cluster autoscaler only using ASGs that have at least one node #1676

Closed
leonsodhi-lf opened this issue Feb 12, 2019 · 14 comments
Labels
area/cluster-autoscaler, area/provider/aws, lifecycle/rotten

Comments

leonsodhi-lf commented Feb 12, 2019

CloudProvider: AWS EKS
Kubernetes version: 1.11.5
Cluster Autoscaler version: 1.3.6

With the autoscaler options shown below and multiple ASGs that are identical apart from their AZ, new nodes always end up being created in an ASG that already has at least one node. Looking at the logs, the autoscaler considers those ASGs to have the least memory waste:

waste.go:57] Expanding Node Group m5_large_az1 would waste 95.00% CPU, 99.09% Memory, 97.04% Blended
waste.go:57] Expanding Node Group m5_large_az2 would waste 95.00% CPU, 99.15% Memory, 97.07% Blended
waste.go:57] Expanding Node Group m5_large_az3 would waste 95.00% CPU, 99.15% Memory, 97.07% Blended
waste.go:57] Expanding Node Group m5_large_az4 would waste 95.00% CPU, 99.15% Memory, 97.07% Blended

If I manually set the desired count on the m5_large_az2 ASG to 1, I then see:

I0212 14:25:26.668623 1 waste.go:57] Expanding Node Group m5_large_az1 would waste 95.00% CPU, 99.09% Memory, 97.04% Blended
I0212 14:25:26.668656 1 waste.go:57] Expanding Node Group m5_large_az2 would waste 95.00% CPU, 99.09% Memory, 97.04% Blended
I0212 14:25:26.668668 1 waste.go:57] Expanding Node Group m5_large_az3 would waste 95.00% CPU, 99.15% Memory, 97.07% Blended
I0212 14:25:26.668678 1 waste.go:57] Expanding Node Group m5_large_az4 would waste 95.00% CPU, 99.15% Memory, 97.07% Blended

I believe this started happening after I added taints to the nodes, and tolerations and nodeSelectors to the pods.
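
(As a side note on those log lines: the blended figure appears to be roughly the average of the CPU and memory waste, e.g. (95.00 + 99.09) / 2 ≈ 97.04 and (95.00 + 99.15) / 2 ≈ 97.07, so even a 0.06 percentage-point difference in memory is enough for the least-waste expander to pick m5_large_az1 every time.)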

Autoscaler options:

- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --skip-nodes-with-system-pods=false
- --balance-similar-node-groups=true
- --expander=least-waste
@aleksandra-malinowska
Contributor

That's probably because an actual node's memory is slightly different from the memory predicted from its machine type (due to kernel reservation, which is specific to a given OS/machine combination). It follows that the least-waste expander prefers existing groups, because it believes they have less memory and therefore waste less of it. #1643 will somewhat improve the prediction by caching node templates, but it's not a complete solution: if there has been any node in a group since the last autoscaler restart, resources from that node will be used in simulations.

Some options to fix this:

  1. Use a different expander, or modify the least-waste expander to ignore negligible differences (e.g. less than 1%?); see the sketch below.
  2. Follow up on #1656 (Adding ability to override allocatable resources via ASG tags) and add overriding memory via tags, with the tags set based on actual resources (this makes configuration harder).
  3. Rely on caching in future versions or, better yet, implement caching of resources by machine type in the cloud provider module (still not a complete solution, but it would work across groups, and we only really care about the case of empty and non-empty node groups with the same machine type).

I would probably go with (1); the others are more complicated.
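
A minimal, standalone sketch of option (1), just to illustrate the idea (the Option type, its fields, and the 1% tolerance are assumptions for this example, not the actual cluster-autoscaler expander API):

```go
package main

import "fmt"

// Option is a simplified stand-in for an expansion candidate; Score is the
// blended waste fraction (0.0 = nothing wasted, 1.0 = everything wasted).
type Option struct {
    NodeGroup string
    Score     float64
}

// bestOptions returns every candidate whose waste score is within tolerance
// of the minimum, instead of only the single minimum. The caller can then
// pick among them (e.g. randomly), so a group whose real node reports
// slightly less memory no longer wins every scale-up.
func bestOptions(options []Option, tolerance float64) []Option {
    if len(options) == 0 {
        return nil
    }
    best := options[0].Score
    for _, o := range options[1:] {
        if o.Score < best {
            best = o.Score
        }
    }
    var result []Option
    for _, o := range options {
        if o.Score-best <= tolerance {
            result = append(result, o)
        }
    }
    return result
}

func main() {
    opts := []Option{
        {"m5_large_az1", 0.9704},
        {"m5_large_az2", 0.9707},
        {"m5_large_az3", 0.9707},
        {"m5_large_az4", 0.9707},
    }
    // With a 1% tolerance all four groups tie, so scale-ups could be
    // spread across them instead of always landing in az1.
    for _, o := range bestOptions(opts, 0.01) {
        fmt.Println(o.NodeGroup)
    }
}
```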

@aleksandra-malinowska aleksandra-malinowska added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider labels Feb 12, 2019
@leonsodhi-lf
Author

Thanks @aleksandra-malinowska. Focusing on (1), running another test with a pod that requests most of a node's memory seems to produce a larger difference between actual and predicted:

I0212 15:19:29.299066 1 waste.go:57] Expanding Node Group m5_large_az1 would waste 95.00% CPU, 30.25% Memory, 62.62% Blended
I0212 15:19:29.299080 1 waste.go:57] Expanding Node Group m5_large_az2 would waste 95.00% CPU, 25.63% Memory, 60.31% Blended
I0212 15:19:29.299091 1 waste.go:57] Expanding Node Group m5_large_az3 would waste 95.00% CPU, 30.25% Memory, 62.62% Blended
I0212 15:19:29.299098 1 waste.go:57] Expanding Node Group m5_large_az4 would waste 95.00% CPU, 30.25% Memory, 62.62% Blended

The rest of the suggested options sound promising, though. I'll think on it.
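
For illustration (these numbers are assumptions, not taken from the logs): if the template for an m5.large reports 8 GiB of memory while the real node reports roughly 7.5 GiB allocatable, a pod requesting about 5.6 GiB wastes (8 - 5.6) / 8 ≈ 30% of the templated node but only (7.5 - 5.6) / 7.5 ≈ 25% of the real one, which is about the gap shown above. A small pod barely moves either percentage, so in that case the gap stays at a fraction of a percentage point.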

@leonsodhi-lf
Author

@aleksandra-malinowska, in revisiting this I came up with an option (4) that, although not perfect, seems simple to implement and configure, and reliable.

The autoscaler could compare a label value on each node (e.g. cluster-autoscaler.kubernetes.io/hardware-configuration-id) and skip the resource check if the values match. A similar approach is already used for GKE on 1.14.x.
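
A rough sketch of what I have in mind (the label key and function are hypothetical; nothing like this exists in the autoscaler today):

```go
package main

import "fmt"

// hardwareConfigLabel is a hypothetical label key used only for this example.
const hardwareConfigLabel = "cluster-autoscaler.kubernetes.io/hardware-configuration-id"

// sameHardwareConfig reports whether two node label sets declare the same
// hardware configuration. When it returns true, the similarity check could
// skip the exact capacity comparison that currently trips over small
// kernel-reservation differences.
func sameHardwareConfig(a, b map[string]string) bool {
    av, aok := a[hardwareConfigLabel]
    bv, bok := b[hardwareConfigLabel]
    return aok && bok && av == bv
}

func main() {
    realNode := map[string]string{hardwareConfigLabel: "m5-large-al2"}
    templateNode := map[string]string{hardwareConfigLabel: "m5-large-al2"}
    fmt.Println(sameHardwareConfig(realNode, templateNode)) // true: skip the resource check
}
```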

Does this seem reasonable?

@aleksandra-malinowska
Contributor

Does this seem reasonable?

Seems so, @MaciekPytel WDYT?

Although I wonder if it could be made more generic than hard-coding a constant for each cloud provider. For example, why not make the key used for node group comparison configurable?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2019
@leonsodhi-lf
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2019
@leonsodhi-lf
Author

@MaciekPytel would you mind having a look at this issue? I've forked and applied the fix I proposed, so let me know if a PR would be preferable.

@JacobHenner

This issue extends beyond the gap between detected memory capacity and template-estimated memory capacity. The problems I've found so far that prevent even scale-ups from 0 include:

  1. The template-based memory capacity is greater than the kubelet-reported memory capacity, and on larger instances this difference exceeds the tolerance used when comparing memory amounts. (For example, an m5.4xlarge instance nominally has 64 GiB, but the OS presents approximately 2.5 GiB less.)
  2. The k8s API reports capacity information that isn't generated when the autoscaler simulates a templated node (e.g. hugepages). These capacity items are compared for equality to determine whether ASGs are similar, so the similarity test always fails.
  3. Conversely, the templated node reports items that are not present in the k8s API, e.g. the nvidia.com/gpu resource.
  4. Some capacity resources exist in both places but are simulated with constants that don't match reality (e.g. https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L362); there's even a comment stating that value is contrived.
  5. Some capacity items aren't simulated at all and need to be set using ASG tags, specifically ephemeral-storage. This isn't documented in the general or AWS FAQ (see the example below).
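
For reference on item 5, the ASG tag I mean looks roughly like this (the value is only an example): k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage = 100Gi.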

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2020
@JacobHenner

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 6, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
…etes#1676)

* Wait for webhooks server using probes

* Delete KueueReadyForTesting

* revert the setting of healthz

* Add a comment about the readyz probe