
[AWS][Cluster Autoscale] Cluster Autoscaler is randomly adding and deleting nodes in the node groups, results in uneven node distribution across different zones #3082

Closed
avedpathak opened this issue Apr 23, 2020 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.


@avedpathak

Although all ASGs have the same labels and the same instance size, and balance-similar-node-groups is enabled, the cluster autoscaler does not balance or evenly distribute instances across them.

We have 4 ASGs, one ASG per AZ.
There are 6 instances in ASG-A, 2 in ASG-B, 1 in ASG-C and 1 in ASG-D.
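With balance-similar-node-groups enabled, the intent is that a scale-up spreads new nodes across all node groups the autoscaler considers similar, topping up the smallest groups first. The sketch below illustrates that idea under the assumption that the four ASGs have already been judged similar; the `group` type and `balanceScaleUp` function are hypothetical and are not the cluster-autoscaler's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// group is a hypothetical, simplified view of one node group (one ASG).
type group struct {
	name    string
	current int // current number of nodes
	max     int // maximum size of the group
}

// balanceScaleUp hands out newNodes one at a time, always to the smallest
// group that still has room. It returns the nodes added per group and the
// number of nodes that could not be placed because every group is full.
func balanceScaleUp(groups []group, newNodes int) (map[string]int, int) {
	sizes := make([]group, len(groups))
	copy(sizes, groups)
	added := make(map[string]int)
	for newNodes > 0 {
		// Smallest group first; ties broken by name so the result is deterministic.
		sort.Slice(sizes, func(i, j int) bool {
			if sizes[i].current != sizes[j].current {
				return sizes[i].current < sizes[j].current
			}
			return sizes[i].name < sizes[j].name
		})
		placed := false
		for i := range sizes {
			if sizes[i].current < sizes[i].max {
				sizes[i].current++
				added[sizes[i].name]++
				newNodes--
				placed = true
				break
			}
		}
		if !placed {
			break // every group is at its maximum size
		}
	}
	return added, newNodes
}

func main() {
	groups := []group{
		{"ASG-A", 1, 6}, {"ASG-B", 1, 6}, {"ASG-C", 1, 6}, {"ASG-D", 1, 6},
	}
	added, left := balanceScaleUp(groups, 6)
	fmt.Println(added, left) // map[ASG-A:2 ASG-B:2 ASG-C:1 ASG-D:1] 0
}
```

Starting from the configured minimum of one node per ASG, six new nodes end up as 3/3/2/2 rather than piling into a single group, which is the kind of spread one would expect if the groups were really being treated as similar.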

  1. Cloud provider: AWS
  2. EKS version: 1.15.6
  3. Cluster autoscaler image: k8s.gcr.io/cluster-autoscaler:v1.14.6; chart version: cluster-autoscaler-6.2.0
  4. The EKS cluster spans multiple AZs, with one ASG per AZ. Each ASG has a minimum size of 1 and a maximum size of 6.
  5. Each ASG has two tags, k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/, which the cluster autoscaler uses to identify the ASGs/node groups to scale.
  6. We are using Launch Configurations to launch the nodes.
  7. The instance types are also the same: when the cluster autoscaler looks for similar node groups, it requires an exact match in memory capacity.
  8. All nodes have the same labels (checked with kubectl get nodes --show-labels).

Container Configuration:

containers:
  - command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --namespace=utilities
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks
    - --balance-similar-node-groups=true
    - --logtostderr=true
    - --stderrthreshold=warning
    - --v=0
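The --node-group-auto-discovery=asg:tag=... flag tells the AWS provider which ASGs to manage. As we read the AWS provider documentation, tags listed by key only are matched on presence and their values are ignored, so an ASG must carry every listed key to be picked up. A minimal sketch of that filtering, assuming key-only matching; the `asg` type, the `discover` function, the ASG names and the tag values are hypothetical.

```go
package main

import "fmt"

// asg is a hypothetical stand-in for an Auto Scaling group and its tags.
type asg struct {
	name string
	tags map[string]string
}

// discover returns the names of ASGs that carry every required tag key.
// Tag values are ignored, mirroring key-only "asg:tag=" discovery.
func discover(asgs []asg, requiredKeys []string) []string {
	var found []string
	for _, g := range asgs {
		matches := true
		for _, k := range requiredKeys {
			if _, ok := g.tags[k]; !ok {
				matches = false
				break
			}
		}
		if matches {
			found = append(found, g.name)
		}
	}
	return found
}

func main() {
	keys := []string{"k8s.io/cluster-autoscaler/enabled", "k8s.io/cluster-autoscaler/eks"}
	asgs := []asg{
		{"asg-a", map[string]string{"k8s.io/cluster-autoscaler/enabled": "true", "k8s.io/cluster-autoscaler/eks": "owned"}},
		{"asg-b", map[string]string{"k8s.io/cluster-autoscaler/enabled": "true"}},
	}
	fmt.Println(discover(asgs, keys)) // [asg-a]
}
```

An ASG missing either key (asg-b here) would never be managed by the autoscaler, so it is worth confirming that all four ASGs carry both tags.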

Config map logs:
status: |+
  Cluster-autoscaler status at 2020-04-23 09:44:25.020239986 +0000 UTC:
  Cluster-wide:
    Health:      Healthy (ready=12 unready=0 notStarted=0 longNotStarted=0 registered=13 longUnregistered=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 2020-04-23 02:33:05.172110061 +0000 UTC m=+28.835387548
    ScaleUp:     NoActivity (ready=12 registered=13)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 2020-04-23 09:33:51.210503791 +0000 UTC m=+25274.873781228
    ScaleDown:   NoCandidates (candidates=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 2020-04-23 09:43:44.950793146 +0000 UTC m=+25868.614070533

  NodeGroups:
    Name:        eks-travel-qa-subnet-1e46c654-workers-NodeGroup-PBB6IJ6ZNHKF
    Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleDown:   NoCandidates (candidates=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC

    Name:        eks-travel-qa-subnet-4d4d8c2a-workers-NodeGroup-PC6EPZSXRVAT
    Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleDown:   NoCandidates (candidates=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC

    Name:        eks-travel-qa-subnet-9e48b5c2-workers-NodeGroup-J61VTAEY3A7U
    Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleDown:   NoCandidates (candidates=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC

    Name:        eks-travel-qa-subnet-faf608d4-workers-NodeGroup-NW2O6EXRYLO
    Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
                 LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
    ScaleDown:   NoCandidates (candidates=0)
                 LastProbeTime:      2020-04-23 09:44:25.016382263 +0000 UTC m=+25908.679659720
                 LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
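The per-node-group Health lines above report registered=0, cloudProviderTarget=0, minSize=0 and maxSize=0, while item 4 states a minimum of 1 and a maximum of 6 per ASG. To compare these counters with the ASG settings more easily, a small helper can pull the key=value pairs out of a Health line. This is a hypothetical sketch (the `parseHealth` name is ours) that assumes the line format shown above; it is not part of the cluster autoscaler.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// counterRe matches key=value pairs with non-negative integer values, e.g. "ready=12".
var counterRe = regexp.MustCompile(`(\w+)=(\d+)`)

// parseHealth extracts the integer counters from a status Health line.
func parseHealth(line string) map[string]int {
	counters := make(map[string]int)
	for _, m := range counterRe.FindAllStringSubmatch(line, -1) {
		v, err := strconv.Atoi(m[2])
		if err != nil {
			continue // skip values that do not fit in an int
		}
		counters[m[1]] = v
	}
	return counters
}

func main() {
	line := "Health: Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))"
	c := parseHealth(line)
	fmt.Println(c["minSize"], c["maxSize"], c["registered"]) // 0 0 0
}
```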

Any help would be appreciated, Thanks.

@avedpathak avedpathak changed the title Cluster Autoscaler is randomly adding and deleting nodes in those node groups, results in uneven node distribution across different zones [AWS][Cluster Autoscale] Cluster Autoscaler is randomly adding and deleting nodes in those node groups, results in uneven node distribution across different zones Apr 23, 2020
@avedpathak avedpathak changed the title [AWS][Cluster Autoscale] Cluster Autoscaler is randomly adding and deleting nodes in those node groups, results in uneven node distribution across different zones [AWS][Cluster Autoscale] Cluster Autoscaler is randomly adding and deleting nodes in the node groups, results in uneven node distribution across different zones Apr 23, 2020
@r8474

r8474 commented Jul 20, 2020

We are seeing very similar behaviour in multiple clusters. Each cluster has 3 ASGs (3 AZs) with varying minimum and maximum sizes.

Current node counts per ASG in some of these clusters:
5,4,0 (min 0, max 5)
3,2,0 (min 0, max 5)
3,0,0 (min 0, max 5)
4,2,1 (min 1, max 5)

  1. Cloud provider: AWS
  2. EKS version: 1.16
  3. CA version: k8s.gcr.io/cluster-autoscaler:v1.16.4
    Chart version: cluster-autoscaler-7.2.2
  4. Each AZ has one ASG. The minimum size of each ASG is either 0 or 1, and the maximum size varies
  5. Each ASG has two tags which CA uses:
    k8s.io/cluster-autoscaler/enabled
    k8s.io/cluster-autoscaler/cluster-name
  6. We are using Launch Configurations
  7. Only one instance type is set per ASG
  8. All nodes have the same labels

Command:
./cluster-autoscaler
--cloud-provider=aws
--namespace=kube-system
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/cluster-name
--balance-similar-node-groups=true
--expander=random
--leader-elect=true
--logtostderr=true
--scale-down-enabled=true
--scale-down-unneeded-time=10m
--scale-down-unready-time=10m
--scale-down-utilization-threshold=0.3
--scan-interval=10s
--skip-nodes-with-local-storage=false
--skip-nodes-with-system-pods=false
--stderrthreshold=info
--v=2
--write-status-configmap=true
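With --scale-down-utilization-threshold=0.3, a node only becomes a scale-down candidate when its utilization is below 30% and its pods can be moved elsewhere. As we understand the cluster-autoscaler FAQ, utilization is the sum of the pods' resource requests divided by the node's allocatable capacity, taking the larger of the CPU and memory ratios. Once a node has been unneeded for --scale-down-unneeded-time (10m here) it is removed, and since, as far as we can tell, balance-similar-node-groups only influences scale-up, scale-down can leave the zones uneven. A minimal sketch of the utilization check only; the types, function names and the allocatable values are hypothetical, and the pod-rescheduling simulation is left out.

```go
package main

import "fmt"

// resources holds CPU in millicores and memory in bytes.
type resources struct {
	cpuMilli int64
	memBytes int64
}

// utilization returns the larger of the CPU and memory request ratios.
// It assumes allocatable CPU and memory are both non-zero.
func utilization(requested, allocatable resources) float64 {
	cpu := float64(requested.cpuMilli) / float64(allocatable.cpuMilli)
	mem := float64(requested.memBytes) / float64(allocatable.memBytes)
	if cpu > mem {
		return cpu
	}
	return mem
}

// belowThreshold reports whether a node counts as underutilized.
func belowThreshold(requested, allocatable resources, threshold float64) bool {
	return utilization(requested, allocatable) < threshold
}

func main() {
	const gi = int64(1) << 30
	alloc := resources{cpuMilli: 3920, memBytes: 14 * gi} // hypothetical allocatable
	req := resources{cpuMilli: 500, memBytes: 2 * gi}      // one pod: 500m CPU, 2Gi memory
	fmt.Printf("%.3f %v\n", utilization(req, alloc), belowThreshold(req, alloc, 0.3)) // 0.143 true
}
```

Here a node running a single pod with the 500m CPU / 2Gi requests used in the test further down the thread sits at about 14% utilization, so it would be eligible for removal whichever AZ it is in.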

@carlosjgp

carlosjgp commented Aug 5, 2020

Digging a little into the CA source code, I found this:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go#L85

// NodeGroups returns all node groups configured for this cloud provider.
func (aws *awsCloudProvider) NodeGroups() []cloudprovider.NodeGroup {
	asgs := aws.awsManager.getAsgs()
	ngs := make([]cloudprovider.NodeGroup, len(asgs))
	for i, asg := range asgs {
		ngs[i] = &AwsNodeGroup{
			asg:        asg,
			awsManager: aws.awsManager,
		}
	}

	return ngs
}

If I understand correctly how CA works, this means each ASG is a node group in its own right, instead of ASGs being grouped by tags, name pattern or similar...

That said, the behaviour @r8474 (a coworker) saw on k8s.gcr.io/cluster-autoscaler:v1.16.4 is a little different with k8s.gcr.io/autoscaling/cluster-autoscaler:v1.16.6, released yesterday,
which we tried because we thought this fix would help:
https://github.com/kubernetes/autoscaler/releases/tag/cluster-autoscaler-1.16.6

Nodes with small difference in available memory will now be considered similar for the purposes of balancing NodeGroup sizes. This should increase the reliability of NodeGroup balancing on some providers (#3124).
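The release note describes a relaxed similarity check: instead of requiring identical memory capacity (as noted in the original report), nodes whose memory differs slightly can count as similar. A minimal sketch of that kind of comparison; the tolerance constant, the `capacity` type and the function names are assumptions for illustration and not the values or code the autoscaler uses, and the real check also compares more than CPU and memory (labels, for example).

```go
package main

import "fmt"

// capacity is a hypothetical summary of a node's resources.
type capacity struct {
	cpuMilli int64 // CPU in millicores
	memKiB   int64 // memory in KiB
}

// maxMemDiffKiB is an assumed tolerance chosen for illustration only.
const maxMemDiffKiB = 256 * 1024

// similar requires identical CPU but tolerates a small memory difference.
func similar(a, b capacity) bool {
	if a.cpuMilli != b.cpuMilli {
		return false
	}
	diff := a.memKiB - b.memKiB
	if diff < 0 {
		diff = -diff
	}
	return diff <= maxMemDiffKiB
}

// similarExact is the stricter variant that needs an exact memory match.
func similarExact(a, b capacity) bool {
	return a == b
}

func main() {
	// Two nodes of the same instance type whose reported memory differs
	// by a few MiB (values are made up).
	a := capacity{cpuMilli: 4000, memKiB: 16106128}
	b := capacity{cpuMilli: 4000, memKiB: 16101040}
	fmt.Println(similarExact(a, b), similar(a, b)) // false true
}
```

Under the exact-match rule these two nodes would land in different "similarity sets", so their ASGs would not be balanced against each other; with a tolerance they would.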

I've also observed these log entries (<INSTANCE-IP> and <REGION> are placeholders 😅):

 node_tree.go:93] Added node "ip-<INSTANCE-IP>.<REGION>.compute.internal" in group "<REGION>:\x00:<REGION>a" to NodeTree
 node_tree.go:93] Added node "ip-<INSTANCE-IP>.<REGION>.compute.internal" in group "<REGION>:\x00:<REGION>b" to NodeTree
 node_tree.go:93] Added node "ip-<INSTANCE-IP>.<REGION>.compute.internal" in group "<REGION>:\x00:<REGION>c" to NodeTree
...
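The group names in these log lines look like the zone keys the scheduler's NodeTree builds from each node's region and zone labels, joined with ":\x00:". A rough sketch of that key construction, assuming the beta failure-domain labels used around Kubernetes 1.16; the `zoneKey` name and the example region are hypothetical, and this only illustrates the log format rather than reproducing code from the autoscaler.

```go
package main

import "fmt"

// Zone and region labels as used around Kubernetes 1.16 (beta names).
const (
	regionLabel = "failure-domain.beta.kubernetes.io/region"
	zoneLabel   = "failure-domain.beta.kubernetes.io/zone"
)

// zoneKey joins region and zone with ":\x00:", matching the group names
// printed in the NodeTree log lines. It returns "" when neither is set.
func zoneKey(labels map[string]string) string {
	region := labels[regionLabel]
	zone := labels[zoneLabel]
	if region == "" && zone == "" {
		return ""
	}
	return region + ":\x00:" + zone
}

func main() {
	key := zoneKey(map[string]string{regionLabel: "eu-west-1", zoneLabel: "eu-west-1a"})
	fmt.Printf("%q\n", key) // "eu-west-1:\x00:eu-west-1a"
}
```

Three distinct keys ending in a, b and c indicate the new nodes registered in three different zones.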

A quick test resulted in a more evenly distributed cluster across AZs:

$ kubectl create deployment --image nginx nginx

Set resource requests and limits (I'm using 2Gi of memory and 500m of CPU):

$ kubectl edit deployments.apps nginx
$ kubectl scale deployment nginx --replicas 30

Sit back and relax...

In the end, the ASGs for the chosen instance type were scaled evenly:

3,3,3 for m5a.xlarge instance types

(Please bear in mind that we have only been running this CA version for a couple of hours... I'll repeat the test a couple of times over this week and see what happens)

At the moment we have 3 different ASGs with the same configuration, one per AZ,
but maybe the proper way to do this is to use a single ASG that spans multiple AZs:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#availability_zones

However, the AWS docs say:
https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html

If you are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes Cluster Autoscaler, you should configure multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the --balance-similar-node-groups feature. Otherwise, you can create a single node group that spans multiple Availability Zones.

This refers specifically to stateful applications (StatefulSets), but do the same rules apply to other workloads?

I hope this helps someone shed some light on this issue.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 3, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


5 participants