cluster-autoscaler 1.2.2 not scaling AWS ASG to zero #1555

Closed
hobbsh opened this issue Jan 5, 2019 · 30 comments

@hobbsh

hobbsh commented Jan 5, 2019

Hi,

I have been unable to get the cluster-autoscaler to scale one of my two autoscaling groups to zero. In this scenario, the blue worker group should be scaled to zero. It's very likely I missed something, but I have been unable to track it down from the documentation and other information available online.

I have tagged both ASGs with k8s.io/cluster-autoscaler/node-template/label/eks_worker_group: [blue|green], and the nodes carry the labels eks_worker_group: [blue|green]. The EC2 instances are tagged with the same tag as well.
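For reference, each node-template label tag maps onto a node label by stripping the tag prefix. The sketch below is a hypothetical helper (not cluster-autoscaler code) that shows this mapping for tags shaped like the `describe-auto-scaling-groups` output later in this thread; the resulting template label should match the label the nodes actually carry.

```python
# Hypothetical helper, not part of cluster-autoscaler: it only illustrates
# the node-template label tag convention quoted above.
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def template_labels(asg_tags):
    """Return the node labels implied by an ASG's node-template tags.

    asg_tags is a list of {"Key": ..., "Value": ...} dicts, shaped like the
    "Tags" entries returned by `aws autoscaling describe-auto-scaling-groups`.
    """
    labels = {}
    for tag in asg_tags:
        key = tag["Key"]
        if key.startswith(LABEL_PREFIX):
            # Everything after the prefix is the label name.
            labels[key[len(LABEL_PREFIX):]] = tag["Value"]
    return labels


blue_tags = [
    {"Key": "Environment", "Value": "staging"},
    {"Key": "k8s.io/cluster-autoscaler/enabled", "Value": "true"},
    {"Key": LABEL_PREFIX + "eks_worker_group", "Value": "blue"},
]
print(template_labels(blue_tags))  # {'eks_worker_group': 'blue'}
```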

Running with the following options:

        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=default
        - --nodes=0:0:staging-k8s-worker-blue20180814211923516200000002
        - --nodes=3:7:staging-k8s-worker-green20181212202045835900000004
        - --logtostderr=true
        - --scale-down-delay-after-add=5m
        - --scale-down-delay-after-delete=5m
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=info
        - --v=4
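Each `--nodes` value has the form `<min>:<max>:<ASG name>`. The simplified parser below is hypothetical (cluster-autoscaler's own parser may validate more); it just makes explicit that the blue line gives CA a size range of 0 to 0 for that group, which, as discussed further down, limits how far CA itself may scale the group rather than ordering it to resize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroupSpec:
    min_size: int
    max_size: int
    name: str


def parse_nodes_flag(value):
    """Parse a --nodes value of the form <min>:<max>:<ASG name>.

    Simplified sketch: cluster-autoscaler's real parser may apply
    additional validation.
    """
    parts = value.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <min>:<max>:<name>, got {value!r}")
    min_size, max_size = int(parts[0]), int(parts[1])
    if min_size < 0 or max_size < min_size:
        raise ValueError(f"invalid size range {min_size}:{max_size}")
    return NodeGroupSpec(min_size, max_size, parts[2])


blue = parse_nodes_flag("0:0:staging-k8s-worker-blue20180814211923516200000002")
green = parse_nodes_flag("3:7:staging-k8s-worker-green20181212202045835900000004")
print(blue.min_size, blue.max_size)  # 0 0
```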

Here is status:

Cluster-wide:
  Health:      Healthy (ready=6 unready=0 notStarted=0 longNotStarted=0 registered=6 longUnregistered=0)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleUp:     NoActivity (ready=6 registered=6)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC

NodeGroups:
  Name:        staging-k8s-worker-blue20180814211923516200000002
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=0))
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleUp:     NoActivity (ready=3 cloudProviderTarget=0)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC

  Name:        staging-k8s-worker-green20181212202045835900000004
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=3 (minSize=3, maxSize=7))
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleUp:     NoActivity (ready=3 cloudProviderTarget=3)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-05 01:26:23.670947407 +0000 UTC
               LastTransitionTime: 2019-01-05 01:26:23.670947407 +0000 UTC

And this is what I see in the logs, no smoking gun that I can see:

I0105 01:24:33.384719       1 static_autoscaler.go:114] Starting main loop
I0105 01:24:33.384753       1 aws_manager.go:241] Refreshed ASG list, next refresh after 2019-01-05 01:25:33.384747177 +0000 UTC
I0105 01:24:33.757020       1 leaderelection.go:199] successfully renewed lease default/cluster-autoscaler
I0105 01:24:34.099184       1 utils.go:456] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0105 01:24:34.099232       1 static_autoscaler.go:263] Filtering out schedulables
I0105 01:24:34.099609       1 static_autoscaler.go:273] No schedulable pods
I0105 01:24:34.099644       1 static_autoscaler.go:280] No unschedulable pods
I0105 01:24:34.099662       1 static_autoscaler.go:322] Calculating unneeded nodes
I0105 01:24:34.395094       1 utils.go:413] Skipping ip-10-99-62-151.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395127       1 utils.go:413] Skipping ip-10-99-60-164.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395139       1 utils.go:413] Skipping ip-10-99-61-85.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395149       1 utils.go:413] Skipping ip-10-99-61-86.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395159       1 utils.go:413] Skipping ip-10-99-62-166.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395169       1 utils.go:413] Skipping ip-10-99-60-139.us-west-2.compute.internal - node group min size reached
I0105 01:24:34.395482       1 static_autoscaler.go:352] Scale down status: unneededOnly=false lastScaleUpTime=2019-01-05 01:01:11.941954422 +0000 UTC lastScaleDownDeleteTime=2019-01-05 01:01:11.941954962 +0000 UTC lastScaleDownFailTime=2019-01-05 01:01:11.941955491 +0000 UTC schedulablePodsPresent=false isDeleteInProgress=false
I0105 01:24:34.395513       1 static_autoscaler.go:355] Starting scale down
I0105 01:24:34.674506       1 scale_down.go:446] No candidates for scale down
@Jeffwan
Contributor

Jeffwan commented Jan 5, 2019

I will have a look at this issue.

@Jeffwan
Contributor

Jeffwan commented Jan 7, 2019

I cannot reproduce this issue easily. It looks like you changed the node group min/max to 0/0 in CA, but I am not sure what your ASG min/max settings are. Based on the logs and the status config map, your blue node group still has 3 nodes, but CA skips each of them because the group's size is already less than or equal to nodeGroup.MinSize().

Could you also share your ASG settings?

BTW, explicit node group settings do have some issues, and not just in v1.2.x but in all versions. I am not sure whether this is by design or a bug. I submitted issue #1559 and can make a quick fix once a maintainer confirms it; that should reduce confusion when users set min/max in CA. I also think the way you are using it might not be the most elegant: CA should manage all the node groups based on utilization, so setting a group to 0/0 doesn't make sense to me.

@d-nishi

d-nishi commented Jan 7, 2019

/sig aws

@hobbsh
Author

hobbsh commented Jan 7, 2019

@Jeffwan I also change the ASG min/max settings using the terraform-aws-eks module. It did not seem to matter whether I used CA auto-discovery or static node groups; the instances did not scale down. The blue node group shows the discrepancy because both the ASG setting and the CA setting were changed, but no nodes were scaled down. The ASG uses protect_from_scale_in on the instances to make sure that the ASG does not stomp on CA when scaling up/down.

FYI, in the output below the blue node group instances had been terminated manually.

$ aws autoscaling describe-auto-scaling-groups
        {
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-west-2:REDACTED:autoScalingGroup:d1a7759a-6666-4619-b47a-d1e89bf2c659:autoScalingGroupName/staging-k8s-worker-blue20180814211923516200000002", 
            "ServiceLinkedRoleARN": "arn:aws:iam::REDACTED:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", 
            "TargetGroupARNs": [
                REDACTED
            ], 
            "SuspendedProcesses": [], 
            "DesiredCapacity": 0, 
            "Tags": [
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-blue20180814211923516200000002", 
                    "PropagateAtLaunch": true, 
                    "Value": "staging", 
                    "Key": "Environment"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-blue20180814211923516200000002", 
                    "PropagateAtLaunch": true, 
                    "Value": "staging-k8s-worker-blue-eks_asg", 
                    "Key": "Name"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-blue20180814211923516200000002", 
                    "PropagateAtLaunch": false, 
                    "Value": "true", 
                    "Key": "k8s.io/cluster-autoscaler/enabled"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-blue20180814211923516200000002", 
                    "PropagateAtLaunch": true, 
                    "Value": "blue", 
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/eks_worker_group"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-blue20180814211923516200000002", 
                    "PropagateAtLaunch": true, 
                    "Value": "owned", 
                    "Key": "kubernetes.io/cluster/staging"
                }
            ], 
            "EnabledMetrics": [], 
            "LoadBalancerNames": [], 
            "AutoScalingGroupName": "staging-k8s-worker-blue20180814211923516200000002", 
            "DefaultCooldown": 300, 
            "MinSize": 0, 
            "Instances": [], 
            "MaxSize": 0, 
            "VPCZoneIdentifier": "REDACTED", 
            "HealthCheckGracePeriod": 300, 
            "TerminationPolicies": [
                "Default"
            ], 
            "LaunchConfigurationName": "staging-k8s-worker-blue20181227194639154700000001", 
            "CreatedTime": "2018-08-14T21:19:24.013Z", 
            "AvailabilityZones": [
                "us-west-2a", 
                "us-west-2b", 
                "us-west-2c"
            ], 
            "HealthCheckType": "EC2", 
            "NewInstancesProtectedFromScaleIn": true
        }, 
        {
            "AutoScalingGroupARN": "arn:aws:autoscaling:us-west-2:REDACTED:autoScalingGroup:a322b65b-512d-4e6a-856b-620a295de676:autoScalingGroupName/staging-k8s-worker-green20181212202045835900000004", 
            "ServiceLinkedRoleARN": "arn:aws:iam::REDACTED:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling", 
            "TargetGroupARNs": [
               REDACTED
            ], 
            "SuspendedProcesses": [], 
            "DesiredCapacity": 3, 
            "Tags": [
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-green20181212202045835900000004", 
                    "PropagateAtLaunch": true, 
                    "Value": "staging", 
                    "Key": "Environment"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-green20181212202045835900000004", 
                    "PropagateAtLaunch": true, 
                    "Value": "staging-k8s-worker-green-eks_asg", 
                    "Key": "Name"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-green20181212202045835900000004", 
                    "PropagateAtLaunch": false, 
                    "Value": "true", 
                    "Key": "k8s.io/cluster-autoscaler/enabled"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-green20181212202045835900000004", 
                    "PropagateAtLaunch": true, 
                    "Value": "green", 
                    "Key": "k8s.io/cluster-autoscaler/node-template/label/eks_worker_group"
                }, 
                {
                    "ResourceType": "auto-scaling-group", 
                    "ResourceId": "staging-k8s-worker-green20181212202045835900000004", 
                    "PropagateAtLaunch": true, 
                    "Value": "owned", 
                    "Key": "kubernetes.io/cluster/staging"
                }
            ], 
            "EnabledMetrics": [], 
            "LoadBalancerNames": [], 
            "AutoScalingGroupName": "staging-k8s-worker-green20181212202045835900000004", 
            "DefaultCooldown": 300, 
            "MinSize": 3, 
            "Instances": [
                {
                    "ProtectedFromScaleIn": true, 
                    "AvailabilityZone": "us-west-2b", 
                    "InstanceId": "i-0a8b297a3fefba8f2", 
                    "HealthStatus": "Healthy", 
                    "LifecycleState": "InService", 
                    "LaunchConfigurationName": "staging-k8s-worker-green20190104233309393100000001"
                }, 
                {
                    "ProtectedFromScaleIn": true, 
                    "AvailabilityZone": "us-west-2c", 
                    "InstanceId": "i-0c2c10abefacf6165", 
                    "HealthStatus": "Healthy", 
                    "LifecycleState": "InService", 
                    "LaunchConfigurationName": "staging-k8s-worker-green20190104233309393100000001"
                }, 
                {
                    "ProtectedFromScaleIn": true, 
                    "AvailabilityZone": "us-west-2a", 
                    "InstanceId": "i-0c60b7598af6dfaea", 
                    "HealthStatus": "Healthy", 
                    "LifecycleState": "InService", 
                    "LaunchConfigurationName": "staging-k8s-worker-green20190104233309393100000001"
                }
            ], 
            "MaxSize": 7, 
            "VPCZoneIdentifier": "REDACTED", 
            "HealthCheckGracePeriod": 300, 
            "TerminationPolicies": [
                "Default"
            ], 
            "LaunchConfigurationName": "staging-k8s-worker-green20190104233309393100000001", 
            "CreatedTime": "2018-12-12T20:20:46.603Z", 
            "AvailabilityZones": [
                "us-west-2a", 
                "us-west-2b", 
                "us-west-2c"
            ], 
            "HealthCheckType": "EC2", 
            "NewInstancesProtectedFromScaleIn": true
        },

CA should support scaling to/from 0 based on this issue: #166

@johanneswuerbach
Contributor

johanneswuerbach commented Jan 7, 2019

The cloudProviderTarget=0 for staging-k8s-worker-blue20180814211923516200000002 looks strange: it means CA thinks the ASG is already at 0 and therefore won't terminate any more instances, as that would bring the ASG to -1, which is less than the allowed minSize.

Sadly, I don't really have an idea why the target size and the ASG could be out of sync; it might be caused by the explicitly defined max being below the number of running instances.
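The guard described here boils down to simple arithmetic. The sketch below is a simplified model, not CA's actual code: a node is a scale-down candidate only while the group's target size is still above its minimum. Applied to the status above, it explains why all six nodes in the earlier logs were skipped with "node group min size reached".

```python
def can_remove_node(target_size, min_size):
    """Simplified model of the "node group min size reached" guard.

    Removing one node lowers the target size by one, so a node is only a
    candidate while the target is strictly above the minimum.
    """
    return target_size > min_size


# Blue group: cloudProviderTarget=0 and minSize=0, so its three running
# nodes are skipped even though they still exist.
print(can_remove_node(target_size=0, min_size=0))  # False
# Green group: cloudProviderTarget=3 and minSize=3, also skipped.
print(can_remove_node(target_size=3, min_size=3))  # False
```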

Did you try running with 0:3 and could you post the configmap and logs?

@hobbsh
Copy link
Author

hobbsh commented Jan 8, 2019

@johanneswuerbach I have also tried this same scenario with auto-discovery, and a maxSize of 0 is not respected there either when scaling a node group to 0. I would expect setting maxSize in the ASG (which is how I am telling CA to scale down) to terminate all instances in that ASG. Interesting point about cloudProviderTarget; that might be the disconnect here. Maybe maxSize was never intended to provide a "force terminate" function by setting it to 0?

Using your suggestion (leave maxSize alone and set minSize to 0), I've been able to get a node group to scale down to zero (using auto-discovery) by cordoning the appropriate nodes so they are detected as unneeded. I don't think this really solves the problem, because it's still one step too many for automating scale-up and scale-down. IMO, it should be enough to specify 0 for maxSize in the ASG and have CA terminate all instances in that group.

Here are the logs from this test:

I0108 01:08:22.764134       1 scale_down.go:594] Scale-down: removing empty node ip-10-99-60-76.us-west-2.compute.internal
I0108 01:08:22.764232       1 scale_down.go:594] Scale-down: removing empty node ip-10-99-61-149.us-west-2.compute.internal
I0108 01:08:22.764252       1 scale_down.go:594] Scale-down: removing empty node ip-10-99-62-55.us-west-2.compute.internal
I0108 01:08:22.782741       1 delete.go:53] Successfully added toBeDeletedTaint on node ip-10-99-60-76.us-west-2.compute.internal
I0108 01:08:22.786214       1 delete.go:53] Successfully added toBeDeletedTaint on node ip-10-99-61-149.us-west-2.compute.internal
I0108 01:08:22.787611       1 delete.go:53] Successfully added toBeDeletedTaint on node ip-10-99-62-55.us-west-2.compute.internal

The config in this case is:

        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=default
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/staging
        - --logtostderr=true
        - --scale-down-delay-after-add=5m
        - --scale-down-delay-after-delete=5m
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=info
        - --v=1
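With this auto-discovery spec, an ASG is picked up when it carries every listed tag key; both ASGs in the describe output above have both keys, so both are discovered. The sketch below is a simplified, hypothetical model of that matching: it only checks that the keys are present and ignores tag values.

```python
def required_tag_keys(spec):
    """Split an 'asg:tag=key1,key2' spec into the tag keys it requires."""
    kind, _, rest = spec.partition(":")
    if kind != "asg" or not rest.startswith("tag="):
        raise ValueError(f"unsupported auto-discovery spec: {spec!r}")
    return [key for key in rest[len("tag="):].split(",") if key]


def is_discovered(spec, asg_tags):
    """True if the ASG carries every required tag key (values are ignored)."""
    present = {tag["Key"] for tag in asg_tags}
    return all(key in present for key in required_tag_keys(spec))


SPEC = "asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/staging"
blue_tags = [
    {"Key": "k8s.io/cluster-autoscaler/enabled", "Value": "true"},
    {"Key": "kubernetes.io/cluster/staging", "Value": "owned"},
]
print(required_tag_keys(SPEC))
print(is_discovered(SPEC, blue_tags))  # True
```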

And the status configmap:

Cluster-autoscaler status at 2019-01-08 02:00:37.130872102 +0000 UTC:
Cluster-wide:
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleUp:     NoActivity (ready=3 registered=3)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 01:08:34.699980756 +0000 UTC

NodeGroups:
  Name:        staging-k8s-worker-blue20180814211923516200000002
  Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=3))
               LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
               LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 01:08:34.699980756 +0000 UTC

  Name:        staging-k8s-worker-green20181212202045835900000004
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=3 (minSize=3, maxSize=7))
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleUp:     NoActivity (ready=3 cloudProviderTarget=3)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2019-01-08 02:00:36.068491075 +0000 UTC
               LastTransitionTime: 2019-01-08 00:58:13.61347861 +0000 UTC


Events:
  Type    Reason          Age   From                Message
  ----    ------          ----  ----                -------
  Normal  ScaleDownEmpty  52m   cluster-autoscaler  Scale-down: removing empty node ip-10-99-60-76.us-west-2.compute.internal
  Normal  ScaleDownEmpty  52m   cluster-autoscaler  Scale-down: removing empty node ip-10-99-61-149.us-west-2.compute.internal
  Normal  ScaleDownEmpty  52m   cluster-autoscaler  Scale-down: removing empty node ip-10-99-62-55.us-west-2.compute.internal

@johanneswuerbach
Contributor

Yes, lowering the max size doesn't force instances to be terminated (unlike on AWS).

Why do you need to scale up and down manually? CA should scale down automatically once there is enough free capacity to move the pods to other nodes, and manual intervention shouldn't be required.

Could you explain a bit further what you are trying to do?

@Jeffwan
Contributor

Jeffwan commented Jan 8, 2019

@hobbsh
Your ASG "DesiredCapacity" is 0, but CA still has 3 registered nodes. I am not sure whether you checked the EC2 instances. I think at that point there were no instances left in the blue group because the ASG had terminated them all, but for some reason the Kubernetes node objects had not been removed yet. CA fetched the ASG settings and got a current size of 0; since 0 <= MinSize(), it logs "Skipping xxxx.us-west-2.compute.internal - node group min size reached".

Could you confirm whether, after your ASG is set to 0:0, the log is still the same a few minutes later? Try restarting CA and checking the logs.
If not, Kubernetes just took time to remove the nodes, and CA will pick up the change later and update the status config map.
If so, CA probably has a bug in some edge case.

@johanneswuerbach
Contributor

In #1555 (comment) @hobbsh mentioned that he is using protect_from_scale_in, which would explain the difference between the ASG's current size and the number of actual nodes. I don't think protect_from_scale_in is currently supported by the autoscaler, as it mainly relies on the reported size of the ASG.
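One way to spot this mismatch is to compare DesiredCapacity (the reported size, per the comment above) with the instances actually in service. The sketch assumes input shaped like one entry of the describe-auto-scaling-groups output earlier in the thread; the blue-group example uses placeholder instance IDs, modelling the state after setting 0:0 (desired capacity 0, three protected instances still running), since the output posted here was captured after those instances had been terminated manually.

```python
def size_report(asg):
    """Compare an ASG's DesiredCapacity with the instances actually in service.

    asg is one entry of the "AutoScalingGroups" list returned by
    `aws autoscaling describe-auto-scaling-groups`.
    """
    in_service = [i for i in asg["Instances"] if i["LifecycleState"] == "InService"]
    return {
        "desired": asg["DesiredCapacity"],
        "in_service": len(in_service),
        # Instances the ASG itself will not terminate when it scales in.
        "protected": [i["InstanceId"] for i in in_service if i.get("ProtectedFromScaleIn")],
    }


# Placeholder instance IDs modelling the blue group after setting 0:0.
blue_after_zero = {
    "DesiredCapacity": 0,
    "Instances": [
        {"InstanceId": f"i-placeholder-{n}", "LifecycleState": "InService", "ProtectedFromScaleIn": True}
        for n in range(3)
    ],
}
report = size_report(blue_after_zero)
if report["in_service"] > report["desired"]:
    surplus = report["in_service"] - report["desired"]
    print(f"{surplus} instance(s) above desired capacity:", report["protected"])
```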

@hobbsh
Copy link
Author

hobbsh commented Jan 8, 2019

@johanneswuerbach maxSize not forcing all instances to terminate is definitely the crux of the issue. I want to maintain blue/green worker groups where at least one of the two will always be scaled down to zero. The main reason for this is to roll out new AMIs and other worker configuration changes. It has proven to be a very painless way of updating workers except for this last hitch with CA.

To force all the nodes to terminate with the existing functionality, I would just have to cordon the old worker group, but I feel that step could be avoided if CA interpreted a maxSize of 0 to mean that all instances should be terminated, as this is how the ASG settings work.

protect_from_scale_in does indeed prevent the ASG from terminating those instances when it scales in, and I was under the impression that it was necessary for CA to operate properly. After setting 0:0 in the ASG, everything drops to 0 except the instance count, because of that setting. At that point, I would expect CA to see that there are still 3 instances running even though maxSize is 0, and terminate them all.
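The cordon step can at least be scripted against the node label from the original report. A minimal sketch, assuming kubectl is on PATH and nodes carry the eks_worker_group label; per the earlier comment, it also assumes the group's minSize has already been lowered to 0 so CA is allowed to remove those nodes. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def cordon_group_cmd(group, label_key="eks_worker_group"):
    """Build the kubectl command that cordons every node in one worker group."""
    return ["kubectl", "cordon", "--selector", f"{label_key}={group}"]


def cordon_group(group, dry_run=True):
    """Print the command (dry run) or execute it, raising on failure."""
    cmd = cordon_group_cmd(group)
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


cordon_group("blue")  # prints: kubectl cordon --selector eks_worker_group=blue
```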

I feel like there is definitely some missing functionality here or a bug. IMO CA should at the least behave the same as the ASG when setting min/max to 0.

@Jeffwan I have waited up to 25 minutes for CA to scale down after setting 0:0, and the log is still the same, even after restarting the pod.

@johanneswuerbach
Contributor

Tbh I'm not sure whether CA is the best place to add such functionality, as AWS would currently be the only cloud provider supporting it, and there might be various edge cases: what happens if there isn't enough space to fit the currently running pods, what if this isn't actually the behaviour the user intended, etc.

But maybe @aleksandra-malinowska could give a 👍 / 👎 on that.

@aleksandra-malinowska
Contributor

Limits constrain only autoscaler, not the user. E.g. if there's a spike way outside of expected operation range, user can manually resize the group without giving autoscaler permission to do the same, or having to disable it. Autoscaler enforcing size to its own minimum and maximum limits would only make such intervention more difficult.

As for the use-case of utilizing autoscaler to drain the node group that is being manually removed, it doesn't sound like it's related to autoscaling at all. On the other hand, it already does some house-keeping tasks (removing unregistered nodes etc.), so perhaps it would be a reasonable feature as well. E.g. an annotation/taint essentially saying "this node is about to be removed, find space for all its pods and drain". @MaciekPytel WDYT?

> IMO CA should at the least behave the same as the ASG when setting min/max to 0.

Why? Please note that Cluster Autoscaler supports diverse environments and not all of them make the same assumptions.

@hobbsh
Author

hobbsh commented Jan 8, 2019

CA looking for a specific tag on the ASG to scale to zero would be ideal for this use-case, otherwise telling CA what taint to observe would probably be more complicated than just cordoning a node group.

I understand that force termination may not fall under the umbrella of CA's purpose and if that is truly the case then this doc sure adds a lot of confusion to the mix. When CA integrates in an AWS environment, is it not intended to replace the functionality that the ASG provides in terms of handling scaling events? If so, then IMO it should support setting maxSize to zero terminating all instances in a node group.

I also think this practice (blue/green ASGs) is relatively common with EKS.

@aleksandra-malinowska
Copy link
Contributor

> CA looking for a specific tag on the ASG to scale to zero would be ideal for this use-case, otherwise telling CA what taint to observe would probably be more complicated than just cordoning a node group.

This logic would have to be in CA core. Triggering it in an environment-independent way sounds more realistic, although cloud provider code could probably artificially set this on nodes as well.

> I understand that force termination may not fall under the umbrella of CA's purpose and if that is truly the case then this doc sure adds a lot of confusion to the mix. When CA integrates in an AWS environment, is it not intended to replace the functionality that the ASG provides in terms of handling scaling events?

No, it's intended to autoscale Kubernetes clusters in a scheduling-aware way (which is what traditional metric-based autoscalers are missing). Using more than one autoscaler to scale the same resource usually ends badly, especially if they have completely different logic: one removes nodes, the other adds them back, the first one removes them again, and so on. This is why we recommend that all other autoscalers be disabled.

The linked section refers to a case when node group size is already at 0. Then there are no existing nodes to use as a template, and CA has no idea what a new node will look like (and therefore, whether it makes sense to add it at all). To avoid a situation where a pod without a toleration causes a scale-up in an empty node group with a taint, this taint needs to be in the template. This way it's included in the simulation and CA won't add an unnecessary node.
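To illustrate the template point: on AWS, taints for the template can be declared with ASG tags of the form k8s.io/cluster-autoscaler/node-template/taint/<key>: <value>:<effect>. The sketch below is a simplified model, not CA's scheduler simulation, of why that matters: a pod without a matching toleration does not fit the template, so it should not trigger a scale-up of an empty group. The dedicated=gpu taint is a hypothetical example, and the toleration check covers only the Equal and Exists operators.

```python
TAINT_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"


def template_taints(asg_tags):
    """Turn node-template taint tags ("<value>:<effect>") into taint dicts."""
    taints = []
    for tag in asg_tags:
        if tag["Key"].startswith(TAINT_PREFIX):
            value, _, effect = tag["Value"].rpartition(":")
            taints.append({"key": tag["Key"][len(TAINT_PREFIX):], "value": value, "effect": effect})
    return taints


def tolerates(tolerations, taint):
    """Simplified Kubernetes toleration matching (Equal/Exists only)."""
    for tol in tolerations:
        # An empty effect tolerates every effect.
        if tol.get("effect", "") not in ("", taint["effect"]):
            continue
        key = tol.get("key", "")
        if tol.get("operator", "Equal") == "Exists":
            # Exists with an empty key tolerates every taint.
            if key in ("", taint["key"]):
                return True
        elif key == taint["key"] and tol.get("value", "") == taint["value"]:
            return True
    return False


def fits_template(pod_tolerations, asg_tags):
    """True if the pod tolerates every taint declared for the template."""
    return all(tolerates(pod_tolerations, t) for t in template_taints(asg_tags))


gpu_tags = [{"Key": TAINT_PREFIX + "dedicated", "Value": "gpu:NoSchedule"}]
plain_pod = []
gpu_pod = [{"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}]
print(fits_template(plain_pod, gpu_tags))  # False
print(fits_template(gpu_pod, gpu_tags))  # True
```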

@hobbsh
Copy link
Author

hobbsh commented Jan 8, 2019

@aleksandra-malinowska Thanks for the clarification. Maybe the phrasing in the doc should be changed, because "Scaling a node group to 0" and "it is possible to scale a node group to 0" are pretty confusing. On the subject of draining/housekeeping: I thought CA already did graceful evictions when a node was being terminated?

It would be great if something in the cloud provider code (or something generic enough for core) could accommodate this, if making a maxSize of zero terminate a node group is out of the question.

@aleksandra-malinowska
Copy link
Contributor

"Scaling to 0" here means "removing the last node from the node group because it's empty/underutilized". It's trivial, the only problem is scaling back from 0 - it needs to be implemented so it's not one way only.

@aleksandra-malinowska
Copy link
Contributor

As for improving it, PRs are always welcome. Note that this is part of AWS cloud provider documentation and not really supported by any of the regular maintainers right now.

@hobbsh
Copy link
Author

hobbsh commented Jan 8, 2019

I'd be glad to take a stab at it, if only to understand the internals better; however, I have no Go experience, so I may not be the best person to get a PR out anytime soon.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 8, 2019
@dhawal55

We got bitten by this too (CA will not scale up an ASG from 0). I feel this needs to be documented clearly. We use auto-discovery and had a pool that got scaled down to zero but never scales back up. Is there any workaround for this? I would like a solution other than running an idle node all the time (minSize: 1).

@aackerman

@hobbsh I'm looking into doing the same thing you're trying to do: blue/green ASGs to deploy new VM image changes.

Did you come up with a solution for this using the cluster-autoscaler out of the box behavior or did you have to build custom behavior to drain the ASG when you wanted it to scale to zero for reasons other than utilization?

@hobbsh
Copy link
Author

hobbsh commented Jun 10, 2019

@aackerman I started down the path of using Zalando's kube node drainer systemd unit, which basically just runs kubectl drain before the node shuts down, but I was not able to spend the time to get it fully working. The caveat with this approach is that the order of operations is wrong. I think an operator that watches for a specific label/taint on nodes and drains them is the best solution: cluster-autoscaler will then reap the unused workers (in fact, all of them in an entire group if you set minSize to 0), so this approach follows the correct order of things. I don't think that would be very hard to implement, but I haven't had much time to think about it.
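A minimal sketch of the operator idea, assuming a hypothetical taint key (example.com/drain-requested) as the trigger and shelling out to kubectl instead of using the Eviction API a real controller would likely use. It shows a single reconcile pass only; watching for changes, retries, and extra drain flags (for example for pods with local storage) are left out. Once a drained node is empty, it becomes a scale-down candidate for cluster-autoscaler, provided the group's minSize allows it.

```python
import shlex

# Hypothetical taint key: nothing in Kubernetes or cluster-autoscaler defines it.
DRAIN_TAINT_KEY = "example.com/drain-requested"


def nodes_to_drain(nodes):
    """Names of nodes carrying the drain-request taint.

    nodes: dicts shaped like the items from `kubectl get nodes -o json`.
    """
    return [
        node["metadata"]["name"]
        for node in nodes
        if any(t.get("key") == DRAIN_TAINT_KEY for t in node.get("spec", {}).get("taints") or [])
    ]


def reconcile_once(nodes, run=lambda cmd: print(shlex.join(cmd))):
    """One pass of the loop: drain every flagged node.

    kubectl drain cordons the node first and then evicts its pods.
    """
    for name in nodes_to_drain(nodes):
        run(["kubectl", "drain", name, "--ignore-daemonsets"])


example_nodes = [
    {"metadata": {"name": "node-blue-1"},
     "spec": {"taints": [{"key": DRAIN_TAINT_KEY, "effect": "NoSchedule"}]}},
    {"metadata": {"name": "node-green-1"}, "spec": {}},
]
reconcile_once(example_nodes)  # prints: kubectl drain node-blue-1 --ignore-daemonsets
```

Passing a different `run` callable (for example one that calls `subprocess.run(cmd, check=True)`) turns the dry run into a real drain.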

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rverma-nikiai

/reopen

@k8s-ci-robot
Contributor

@rverma-nikiai: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rverma-nikiai

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 1, 2019
@Jeffwan
Copy link
Contributor

Jeffwan commented Aug 1, 2019

@rverma-nikiai Do you still see this issue in version > 1.2.2?

@rverma-nikiai

rverma-nikiai commented Aug 2, 2019 via email


10 participants