
scaling from 0 only scales one of the ASGs detected in spite of balance-similar-node-groups #5352

Closed
dindurthy opened this issue Dec 5, 2022 · 6 comments
Labels: area/cluster-autoscaler, kind/bug, lifecycle/rotten

Comments

dindurthy commented Dec 5, 2022

We encountered the same issue as #4516.

Which component are you using?:
Cluster Autoscaler

What version of the component are you using?:
v1.22.0

What k8s version are you using (kubectl version)?:
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.15-eks-fb459a0", GitCommit:"be82fa628e60d024275efaa239bfe53a9119c2d9", GitTreeState:"clean", BuildDate:"2022-10-24T20:33:23Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
AWS

What did you expect to happen?:
All ASGs that the cluster-autoscaler detects will be used for scale-up from 0 in a balanced way.

What happened instead?:
Even when tagging the ASGs (with tag propagation) with "k8s.io/cluster-autoscaler/node-template/label/" tags and using the flag "--balance-similar-node-groups=true", scale-up uses only one of the ASGs. Since each ASG is per AZ, this leaves us single-AZ: all ASGs except one stay unused.

Here's an image before/after we set the ASG min to 1 for all ASGs (~9:30)
[image: node counts per ASG, before/after setting min to 1]

How to reproduce it (as minimally and precisely as possible):
Create 3 ASGs, each with min/desired set to 0.
Schedule pods that must land on instances from those ASGs.
Observe the CA launching instances in only one of the ASGs.

We ran the command like so:

          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/redoxlocal-eks-cluster
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false

The labels on the node are as follows:

beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m5.xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
k8s.io/cloud-provider-aws=486e6037dbcd0d2fd504654a5eafd1b9
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-172-31-210-81
kubernetes.io/os=linux
node.kubernetes.io/instance-type=m5.xlarge
purpose=ci
topology.kubernetes.io/region=us-east-1
vpc.amazonaws.com/has-trunk-attached=false
failure-domain.beta.kubernetes.io/zone=us-east-1c # or 'us-east-1a' or 'us-east-1b' for other asg nodes
topology.kubernetes.io/zone=us-east-1c # or 'us-east-1a' or 'us-east-1b' for other asg nodes

Taints:

isolate=nonci:NoSchedule

ASG cluster-autoscaler tags:

k8s.io/cluster-autoscaler/node-template/label/purpose | ci 
k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone | us-east-1b
k8s.io/cluster-autoscaler/node-template/taint/isolate | nonci:NoSchedule

Pod schedules with:

  nodeSelector:
    purpose: ci
  tolerations:
  - key: "isolate"
    operator: "Equal"
    value: "nonci"
    effect: "NoSchedule"

Anything else we need to know?:

My first thought is that there are more tags we need to represent on the ASG, but I didn't see clear documentation on what's calculated for us (e.g. kubernetes.io/hostname) versus what we need to set explicitly. All the node labels and taints that we manage are established on the ASGs.

For an ASG scaled >0, I understand the CA picks a node and generates a template of labels and taints for the corresponding group. This is highly reliable, because the node is running and is a live representation. For an ASG scaled to zero, we attempt to set labels and taints on the ASG to help the CA infer what a node would look like if it scaled up, but many labels are set via the EKS bootstrap that we don't directly manage, so maybe the CA isn't interpolating the full list of labels. This is unfortunate, because even if we set the ASG accurately with every label on a node today, that is fragile: the list may change unpredictably as EKS changes, and it will be hard for us to detect the breakage.
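
For reference, here is a sketch of how those per-ASG node-template tags could be expressed in CloudFormation-style YAML. The node-template/label and node-template/taint key formats are the documented cluster-autoscaler tag conventions, but mirroring the zone label per ASG is our guess at what might be missing, not something we have confirmed:

  # Sketch only: per-ASG tags in CloudFormation form.
  Tags:
    - Key: k8s.io/cluster-autoscaler/node-template/label/purpose
      Value: ci
      PropagateAtLaunch: true
    # Assumption: mirror the zone label, one value per ASG.
    - Key: k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone
      Value: us-east-1c  # us-east-1a / us-east-1b on the other ASGs
      PropagateAtLaunch: true
    - Key: k8s.io/cluster-autoscaler/node-template/taint/isolate
      Value: nonci:NoSchedule
      PropagateAtLaunch: true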

My second thought is that this is specific to the least-waste expander, as we had that setting in common with issue #4516.

@dindurthy added the kind/bug label on Dec 5, 2022
@MohanRajRavi

We face the same problem. But @dindurthy, the balance-similar-node-groups default value is false; could you try passing true explicitly?
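
For what it's worth, that would just mean carrying an explicit value in the container command (a trivial sketch; this is the same flag shown in the command block above):

            - --balance-similar-node-groups=true  # explicit value instead of the bare flag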

@bpineau
Contributor

bpineau commented Mar 7, 2023

This sounds like a known issue (see attempts to address it in #1021, #2892, #3609, #3608, #3761, #4000, and probably more): balance-similar-node-groups verifies that the various candidate node groups do indeed look similar, but this test frequently fails (on AWS and Azure at least) because real-world node capacity is much lower than the raw capacity offered by the instances (for instance, due to memory reserved for the kernel or for the kubelet). Anything altering real-world node labels will also cause the balance-similar-node-groups comparisons to fail.

Recent cluster-autoscaler versions have means to work around the issue in many cases (see the example args after this list):

  • We can allow larger resource (well, memory) differences by setting --memory-difference-ratio / --max-free-difference-ratio / --max-allocatable-difference-ratio; the node.kubernetes.io/instance-type label (also considered during the comparison) might be enough to expect nodes to have the same capacity after they are scaled up, in most cases.
  • We can also pass estimates for effective memory and CPU overhead via labels attached to the ASG (or VMSS).
  • We can tell it to ignore irrelevant real-world labels with --balancing-ignore-label (or even tell it to only consider --balancing-label).
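
For example, a deployment command block exercising those flags might look like the following (a sketch only; the ratio values and the ignored label are illustrative placeholders, not recommendations):

          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --balance-similar-node-groups
            # Tolerate larger capacity/free/allocatable differences between groups:
            - --memory-difference-ratio=0.1
            - --max-free-difference-ratio=0.1
            - --max-allocatable-difference-ratio=0.1
            # Ignore a label that legitimately differs across otherwise-similar groups:
            - --balancing-ignore-label=vpc.amazonaws.com/has-trunk-attached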

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 5, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 5, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this (the triage robot's "/close not-planned" comment, quoted above):

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jan 19, 2024