Labels match but Cluster Autoscaler says "are not similar, labels do not match" when trying to balance similar node groups. #6954

Closed
nicksecurity opened this issue Jun 21, 2024 · 7 comments
Labels: area/cluster-autoscaler, kind/bug, lifecycle/rotten

Comments

@nicksecurity

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: 1.28.5

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.9-eks-036c24b

What environment is this in?:
AWS, EKS using EC2

What did you expect to happen?:
When new nodes are added, I expect them to be balanced across the 3 similar node groups, which have the same labels.

What happened instead?:
I have 3 node groups, one per AZ, but the new nodes are only added to 1 node group.

The log says the labels do not match, but I've checked them all and, apart from a couple I've excluded, they are the same.

I0621 14:19:48.338960 1 compare_nodegroups.go:157] nodes template-node-for-eks-nodegroup-4-128-1ec80b90 and template-node-for-eks-nodegroup-3-128-f0c80b8d are not similar, labels do not match
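
To pin down exactly which labels the comparator disagrees on, it can help to dump and diff the labels of one node from each group. A minimal sketch, with placeholder node names and assuming jq is installed:

$ kubectl get node <node-in-group-1> -o json | jq -S '.metadata.labels' > group-1.labels
$ kubectl get node <node-in-group-2> -o json | jq -S '.metadata.labels' > group-2.labels
$ kubectl get node <node-in-group-3> -o json | jq -S '.metadata.labels' > group-3.labels
$ diff group-1.labels group-2.labels
$ diff group-2.labels group-3.labels

Labels on the built-in ignore list (hostname, the zone labels, and a few provider-specific keys) and labels passed via --balancing-ignore-label are skipped during the comparison; any other key or value that shows up in the diff will cause the comparator to log "labels do not match".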

How to reproduce it (as minimally and precisely as possible):
I scale up the pods so that several new nodes are needed, then check which node group they were added to.
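
For example (placeholder deployment name; the zone column shows which AZ, and therefore which node group, the new nodes landed in):

$ kubectl scale deployment <some-large-deployment> --replicas=20
$ kubectl get nodes -L topology.kubernetes.io/zone --sort-by=.metadata.creationTimestamp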

Anything else we need to know?:
No

@nicksecurity added the kind/bug label on Jun 21, 2024
@adrianmoisey
Member

/area cluster-autoscaler

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 20, 2024
@jbilliau-rcd

Having the same issue when testing out cluster-autoscaler 1.29.4. I have 3 node groups, one per AZ, and spun up 20 pods; it increased only ONE node group, by 11:

cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182407       1 klogx.go:87] Pod beta-whale/beta-whale-5fb675658c-n9kls is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182410       1 klogx.go:87] Pod beta-whale/beta-whale-5fb675658c-hmfr7 is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182412       1 klogx.go:87] Pod beta-whale/beta-whale-5fb675658c-zx4gc is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182414       1 klogx.go:87] Pod beta-whale/beta-whale-5fb675658c-nqr45 is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182416       1 klogx.go:87] Pod beta-whale/beta-whale-5fb675658c-krgdk is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182423       1 klogx.go:87] Pod kube-system/overprovisioning-757f8f8fbc-9bftr is unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.182425       1 klogx.go:87] 1 other pods are also unschedulable
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.184241       1 orchestrator.go:108] Upcoming 0 nodes
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.184397       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 and template-node-for-eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302-574218763908410739 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.184413       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 and template-node-for-eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b-1772456504083316692 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.185704       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302-574218763908410739 and template-node-for-eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b-1772456504083316692 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.185720       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302-574218763908410739 and template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.186994       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b-1772456504083316692 and template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.187009       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b-1772456504083316692 and template-node-for-eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302-574218763908410739 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188273       1 priority.go:114] Successfully loaded priority configuration from configmap.
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188285       1 priority.go:163] priority expander: eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 chosen as the highest available
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188288       1 priority.go:163] priority expander: eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 chosen as the highest available
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188291       1 priority.go:163] priority expander: eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b chosen as the highest available
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188298       1 orchestrator.go:181] Best option to resize: eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188307       1 orchestrator.go:185] Estimated 11 nodes needed in eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188323       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 and template-node-for-eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302-574218763908410739 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188335       1 compare_nodegroups.go:157] nodes template-node-for-eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6-1335741201004455891 and template-node-for-eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b-1772456504083316692 are not similar, labels do not match
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188342       1 orchestrator.go:249] No similar node groups found
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188358       1 orchestrator.go:291] Final scale-up plan: [{eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 2->13 (max: 50)}]
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188385       1 executor.go:147] Scale-up: setting group eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 size to 13
cluster-autoscaler-86c94b64cd-mv5mm aws-cluster-autoscaler I1031 14:33:48.188409       1 auto_scaling_groups.go:267] Setting asg eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 size to 13

@jbilliau-rcd

Hmmm, never mind, just fixed it by switching to --balancing-label. --balancing-ignore-label seems overly complicated and is prone to breaking if labels on nodes change. @nicksecurity see this PR - #4174

cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230480       1 priority.go:163] priority expander: eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 chosen as the highest available
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230484       1 priority.go:163] priority expander: eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 chosen as the highest available
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230486       1 priority.go:163] priority expander: eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b chosen as the highest available
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230493       1 orchestrator.go:181] Best option to resize: eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230498       1 orchestrator.go:185] Estimated 8 nodes needed in eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230520       1 orchestrator.go:246] Found 2 similar node groups: [eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6]
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230548       1 orchestrator.go:281] Splitting scale-up between 3 similar node groups: {eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302, eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b, eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6}
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230567       1 orchestrator.go:291] Final scale-up plan: [{eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b 1->5 (max: 50)} {eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 2->5 (max: 50)} {eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 3->4 (max: 50)}]
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230583       1 executor.go:147] Scale-up: setting group eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b size to 5
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230606       1 auto_scaling_groups.go:267] Setting asg eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b size to 5
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.230830       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1cd02fde-5df3-4755-a3be-880101f4f685", APIVersion:"v1", ResourceVersion:"110291641", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b size to 5 instead of 1 (max: 50)
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.353381       1 executor.go:147] Scale-up: setting group eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 size to 5
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.353413       1 auto_scaling_groups.go:267] Setting asg eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 size to 5
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.353488       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1cd02fde-5df3-4755-a3be-880101f4f685", APIVersion:"v1", ResourceVersion:"110291641", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group eks-spot-jason20241030133218211400000007-cec96ed0-43eb-c9e9-8f1c-1d9adaa3c01b size set to 5 instead of 1 (max: 50)
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.367471       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1cd02fde-5df3-4755-a3be-880101f4f685", APIVersion:"v1", ResourceVersion:"110291641", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 size to 5 instead of 2 (max: 50)
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.434135       1 executor.go:147] Scale-up: setting group eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 size to 4
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.434161       1 auto_scaling_groups.go:267] Setting asg eks-spot-jason20241030133218199100000003-02c96ed0-43e2-76e6-7f01-f1c22b7851f6 size to 4
cluster-autoscaler-759cb97bf8-nrdjc aws-cluster-autoscaler I1031 14:59:06.434240       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1cd02fde-5df3-4755-a3be-880101f4f685", APIVersion:"v1", ResourceVersion:"110291641", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group eks-spot-jason20241030133218207200000005-dac96ed0-43e6-8b44-fa5d-8ffe23060302 size set to 5 instead of 2 (max: 50)
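
For reference, these are the two flags involved (label values below are placeholders; which ones you pass depends on which labels actually differ between your node groups). Note that, per the Cluster Autoscaler FAQ, setting --balancing-label replaces the built-in comparator entirely, so only the listed labels are checked:

# keep the default comparator, but ignore labels that legitimately differ between groups
--balance-similar-node-groups=true
--balancing-ignore-label=example.com/label-that-differs

# or compare only an explicit list of labels
--balance-similar-node-groups=true
--balancing-label=node.kubernetes.io/instance-type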

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 30, 2024