[EKS] [request]: Managed Nodes scale to 0 #724
This feature would be great for me. I'm looking to run GitLab workers on my EKS cluster to run ML training workloads. Typically, these jobs only run for a couple of hours a day (on big instances), so being able to scale down would make things much more cost effective for us. Any ideas when this feature might land? |
@mathewpower you might want to use a vanilla autoscaling group instead of EKS managed. Pretty much this issue makes EKS managed nodes a nonstarter for any ML projects due to one node in each group always being on |
There are tasks now; perhaps that's the solution for this. |
@jcampbell05 can you elaborate? What tasks are you referring to? |
I guess that node taints will have to be managed like node labels already are in order for the necessary node template to be set: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#scaling-a-node-group-to-0. |
Hey @yann-soubeyrand that is correct. Looking for some feedback on that, would you want all labels and taints to automatically propagate to the ASG in the required format for scale to 0, or have selective control over which ones propagate? |
@mikestef9 If AWS has enough information to propagate the labels/taints to the ASG, then I think it'd be preferable to have it "just work" as much as possible. There will still be scenarios where manual intervention will be needed by the consumer I think such as setting region/AZ labels for single AZ nodegroups so that cluster-autoscaler can make intelligent decisions if a specific AZ is needed, however we should probably try to minimize that work as much as possible. |
@mikestef9 in my understanding, all the labels and taints should be propagated to the ASG in the required format. A feature which could be useful, though, is to be able to disable cluster autoscaler for specific node groups (that is, not setting the `k8s.io/cluster-autoscaler/enabled` tag). @dcherman isn't the AZ case already managed by cluster autoscaler without specifying label templates? |
@yann-soubeyrand I think you're right! Just read through the cluster-autoscaler code, and it looks like it discovers what AZs the ASG creates nodes in from the ASG itself; I always thought it had discovered those from the nodes initially created by the ASG. In that case, we can disregard my earlier comment. |
I would like to be able to forcibly scale a managed node group to 0 via the CLI, by setting something like desired or maximum number of nodes to 0. Ignoring things like pod disruption budgets, etc. I would like this in order for developers to have their own clusters which get scaled to 0 outside of working hours. I would like to use a simple cron to force clusters to size 0 at night, then give them 1 node in the morning and let cluster-autoscaler scale them back up. |
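One way to implement the nightly forced scale-down described above is with scheduled scaling actions on the node group's underlying ASG. This is only a sketch, matching the Terraform used elsewhere in this thread: the ASG name and schedule are hypothetical, and the managed node group service may reconcile values changed behind its back, so test this carefully before relying on it.

```hcl
# Hypothetical ASG name; in practice, look it up from the node group outputs.
# Scale the ASG to zero every night at 20:00 UTC.
resource "aws_autoscaling_schedule" "scale_down" {
  scheduled_action_name  = "scale-to-zero-nightly"
  autoscaling_group_name = "eks-my-mng-asg"
  recurrence             = "0 20 * * *"
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
}

# Bring one node back on weekday mornings and let cluster-autoscaler
# scale further up as needed.
resource "aws_autoscaling_schedule" "scale_up" {
  scheduled_action_name  = "scale-up-morning"
  autoscaling_group_name = "eks-my-mng-asg"
  recurrence             = "0 6 * * 1-5"
  min_size               = 0
  max_size               = 10
  desired_capacity       = 1
}
```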
Hi All |
@sibendu it's not supported with managed node groups yet (this is the object of this issue) but you can achieve it with non managed node groups (following the documentation you linked). |
Would be great to have this, we make use of cluster autoscaling in order to demand GPU nodes on GKE and scale down when there are no requests. Having one node idle is definitely not cost effective for us if we want to use managed nodes on EKS |
Putting use cases aside (although I have many), autoscaling groups already support min, max & desired size being 0. A node group is ultimately just an autoscaling group (and therefore already supports size 0). You can go into the AWS web console, find the ASG created for a node group and set the size to 0 and it's fine, therefore it doesn't make sense that node groups are not supporting a zero size. As a loyal AWS customer it's frustrating to see things like this - there appears to be no good technical reason for preventing a size of zero, but forcing customers to have at least 1 instance makes AWS more £££. Hmmm... was the decision to prevent a zero size about making it better for the customer or is Jeff a bit short of cash? |
@antonosmond there are good technical reasons why you cannot scale from 0 with the current configuration: for the autoscaler to be able to scale from 0, one has to put tags on the ASG indicating the labels and taints the nodes will have. These tags are missing as of now. This is the purpose of this issue. |
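For reference, the tags in question follow the cluster-autoscaler node-template convention. A hedged sketch (the ASG name, label, and taint values below are hypothetical placeholders, not from any comment in this thread):

```hcl
# Tell cluster-autoscaler what labels nodes in this (scaled-to-0) group
# would carry, so it can match pending pods against the empty group.
resource "aws_autoscaling_group_tag" "label" {
  autoscaling_group_name = "eks-my-mng-asg" # hypothetical ASG name

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/label/my-label"
    value               = "foo"
    propagate_at_launch = false
  }
}

# Same idea for taints, in "value:Effect" form.
resource "aws_autoscaling_group_tag" "taint" {
  autoscaling_group_name = "eks-my-mng-asg"

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/taint/my-taint"
    value               = "bar:NoSchedule"
    propagate_at_launch = false
  }
}
```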
@yann-soubeyrand The cluster autoscaler is just one use case, but this issue shouldn't relate specifically to the cluster autoscaler. The issue should be that you can't set a size of zero; regardless of use case or whether or not you run the cluster autoscaler, you should be able to set a size of zero, as this is supported in autoscaling groups. In addition to the use cases above, other use cases for 0 size include:
|
@antonosmond if you're not using cluster autoscaler, you're scaling the ASG manually, right? What prevents you from setting a min and desired count to 0? It seems to work as intended. |
@yann-soubeyrand I got to this issue from here. |
I think we all agree with this ;-) |
Hey guys, is there any update on this one? thanks! |
If anyone is interested I can drop a pattern which works with the latest community module and has comments? |
@stevehipwell thanks for the suggestion about declaring as locals. Another good way to do it. I haven't followed the transition to 18 closely, it's been on my to-do list to catch up on. I followed the flurry of activity around the auth configmap changes and at that point decided it was going to be more effort than it was worth at the time. I will experiment with the local node group definitions, that will likely work for us, just venting some slight annoyance that there isn't a clear documented well supported path for scale to zero. (It's possible, and I currently do it, just clunky). Maybe refactoring into local definitions will clean things up. |
Yes please. |
This is the example from above with a bit more context added; it turns out that the module changes don't help, as they only provide the names and not the ID needed to look up the required tags.

locals {
cluster_name = "my-cluster"
# Define MNGs here so we can reference them later
mngs = {
my-mng = {
name = "my-mng"
labels = {
"my-label" = "foo"
}
taints = [{
key = "my-taint"
value = "bar"
effect = "NO_SCHEDULE"
}]
}
}
  # We need to look up the K8s taint effect from the AWS API value
taint_effects = {
"NO_SCHEDULE" = "NoSchedule"
"NO_EXECUTE" = "NoExecute"
"PREFER_NO_SCHEDULE" = "PreferNoSchedule"
}
# Calculate the tags required by CA based on the MNG inputs
mng_ca_tags = { for mng_key, mng_value in local.mngs : mng_key => merge({
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
},
{ for label_key, label_value in mng_value.labels : "k8s.io/cluster-autoscaler/node-template/label/${label_key}" => label_value },
{ for taint in mng_value.taints : "k8s.io/cluster-autoscaler/node-template/taint/${taint.key}" => "${taint.value}:${local.taint_effects[taint.effect]}" }
)
}
# Use the module
module "eks" {
...
eks_managed_node_groups = local.mngs
...
}
resource "aws_autoscaling_group_tag" "mng_ca" {
# Create a tuple in a map for each ASG tag combo
for_each = merge([for mng_key, mng_tags in local.mng_ca_tags : { for tag_key, tag_value in mng_tags : "${mng_key}-${substr(tag_key, 25, -1)}" => { mng = mng_key, key = tag_key, value = tag_value }}]...)
# Lookup the ASG name for the MNG, erroring if there is more than one
autoscaling_group_name = one(module.eks.eks_managed_node_groups[each.value.mng].node_group_autoscaling_group_names)
tag {
key = each.value.key
value = each.value.value
propagate_at_launch = false
}
} |
hi all, am I missing something, or is there a reason not to set the minimum capacity of a managed node group to 0? |
You can absolutely set the minimum to zero. If you want cluster-autoscaler to automatically scale the node group up from zero when a pod is unschedulable on existing nodes but could be scheduled on a new node created in that node group, you need to tag the ASG that the node group creates with the `k8s.io/cluster-autoscaler/node-template/...` label and taint tags. That's easy to automate using IaC solutions like Terraform, or with ad-hoc scripts as described above. |
Hi, here is a snippet I use with Terragrunt and the `eks-managed-node-group` module. It allows setting implicit labels to be able to scale to and from 0 with restricted labels such as:
And it also adds restricted labels to the ASG as tags (labels that are forbidden via the EKS API, for example). I think there is room for improvement; let me know what you think. It also allows mixing taints and labels defined in the defaults and per node group:

locals {
mngs = var.eks_managed_node_groups
mng_defaults = var.eks_managed_node_group_defaults
cluster_name = var.cluster_name
taint_effects = {
NO_SCHEDULE = "NoSchedule"
NO_EXECUTE = "NoExecute"
PREFER_NO_SCHEDULE = "PreferNoSchedule"
}
mng_ca_tags_defaults = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
}
mng_ca_tags_taints_defaults = try(local.mng_defaults.taints, []) != [] ? {
for taint in local.mng_defaults.taints : "k8s.io/cluster-autoscaler/node-template/taint/${taint.key}" => "${taint.value}:${local.taint_effects[taint.effect]}"
} : {}
mng_ca_tags_labels_defaults = try(local.mng_defaults.labels, {}) != {} ? {
for label_key, label_value in local.mng_defaults.labels : "k8s.io/cluster-autoscaler/node-template/label/${label_key}" => label_value
} : {}
mng_ca_tags_taints = { for mng_key, mng_value in local.mngs : mng_key => merge(
{ for taint in mng_value.taints : "k8s.io/cluster-autoscaler/node-template/taint/${taint.key}" => "${taint.value}:${local.taint_effects[taint.effect]}" }
) if try(mng_value.taints, []) != []
}
mng_ca_tags_labels = { for mng_key, mng_value in local.mngs : mng_key => merge(
{ for label_key, label_value in mng_value.labels : "k8s.io/cluster-autoscaler/node-template/label/${label_key}" => label_value },
) if try(mng_value.labels, {}) != {}
}
mng_ca_tags_restricted_labels = { for mng_key, mng_value in local.mngs : mng_key => merge(
{ for label_key, label_value in mng_value.restricted_labels : "k8s.io/cluster-autoscaler/node-template/label/${label_key}" => label_value },
) if try(mng_value.restricted_labels, {}) != {}
}
mng_ca_tags_implicit = { for mng_key, mng_value in local.mngs : mng_key => merge(
length(try(mng_value.instance_types, local.mng_defaults.instance_types)) == 1 ? { "k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type" = one(try(mng_value.instance_types, local.mng_defaults.instance_types)) } : {},
length(data.aws_autoscaling_group.node_groups[mng_key].availability_zones) == 1 ? { "k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone" = one(data.aws_autoscaling_group.node_groups[mng_key].availability_zones) } : {},
length(data.aws_autoscaling_group.node_groups[mng_key].availability_zones) == 1 ? { "k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone" = one(data.aws_autoscaling_group.node_groups[mng_key].availability_zones) } : {},
)
}
mng_ca_tags = { for mng_key, mng_value in local.mngs : mng_key => merge(
local.mng_ca_tags_defaults,
local.mng_ca_tags_taints_defaults,
local.mng_ca_tags_labels_defaults,
try(local.mng_ca_tags_taints[mng_key], {}),
try(local.mng_ca_tags_labels[mng_key], {}),
try(local.mng_ca_tags_restricted_labels[mng_key], {}),
local.mng_ca_tags_implicit[mng_key],
) }
mng_asg_custom_tags = { for mng_key, mng_value in local.mngs : mng_key => merge(var.tags) }
}
data "aws_autoscaling_group" "node_groups" {
for_each = module.eks_managed_node_group
  name = each.value.node_group_resources[0].autoscaling_groups[0].name
}
resource "aws_autoscaling_group_tag" "mng_ca" {
# Create a tuple in a map for each ASG tag combo
for_each = merge([for mng_key, mng_tags in local.mng_ca_tags : { for tag_key, tag_value in mng_tags : "${mng_key}-${substr(tag_key, 25, -1)}" => { mng = mng_key, key = tag_key, value = tag_value } }]...)
# Lookup the ASG name for the MNG, erroring if there is more than one
autoscaling_group_name = one(module.eks_managed_node_group[each.value.mng].node_group_autoscaling_group_names)
tag {
key = each.value.key
value = each.value.value
propagate_at_launch = false
}
}
resource "aws_autoscaling_group_tag" "mng_asg_tags" {
# Create a tuple in a map for each ASG tag combo
for_each = merge([for mng_key, mng_tags in local.mng_asg_custom_tags : { for tag_key, tag_value in mng_tags : "${mng_key}-${tag_key}" => { mng = mng_key, key = tag_key, value = tag_value } }]...)
# Lookup the ASG name for the MNG, erroring if there is more than one
autoscaling_group_name = one(module.eks_managed_node_group[each.value.mng].node_group_autoscaling_group_names)
tag {
key = each.value.key
value = each.value.value
propagate_at_launch = true
}
}

This produces the following output, for example, in the console:

{
"c5-xlarge-pub-a" = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/node-template/label/network" = "public"
"k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type" = "c5.xlarge"
"k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone" = "eu-west-1a"
"k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone" = "eu-west-1a"
"k8s.io/cluster-autoscaler/node-template/taint/dedicated" = "true:NoSchedule"
}
"c5-xlarge-pub-b" = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/node-template/label/network" = "public"
"k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type" = "c5.xlarge"
"k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone" = "eu-west-1b"
"k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone" = "eu-west-1b"
"k8s.io/cluster-autoscaler/node-template/taint/dedicated" = "true:NoSchedule"
}
"c5-xlarge-pub-c" = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/node-template/label/network" = "public"
"k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type" = "c5.xlarge"
"k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone" = "eu-west-1c"
"k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone" = "eu-west-1c"
"k8s.io/cluster-autoscaler/node-template/taint/dedicated" = "true:NoSchedule"
} |
Still not sure if you have to do it this way. My definition:
Values for size
It is important to set desired_size to 1: the MNG will be created with 1 node, and after some time cluster autoscaler will scale it down to 0. Now, if you create e.g. a Deployment with a nodeSelector targeting a specific MNG that is scaled to 0, the Deployment triggers a scale-up and a new node is created with all the labels specified in Terraform. |
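A minimal sketch of such a node group definition (the names, role, and sizes here are hypothetical illustrations, not the commenter's actual config):

```hcl
# Managed node group that can scale down to zero once the ASG carries
# the cluster-autoscaler node-template tags.
resource "aws_eks_node_group" "ml" {
  cluster_name    = "my-cluster"
  node_group_name = "ml-workers"
  node_role_arn   = aws_iam_role.node.arn # assumed to exist elsewhere
  subnet_ids      = var.subnet_ids

  scaling_config {
    min_size     = 0
    max_size     = 5
    desired_size = 1 # start with 1; cluster-autoscaler scales down to 0 later
  }

  # These labels are what the node-template ASG tags must mirror.
  labels = {
    workload = "ml"
  }
}
```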
After some time I've returned to this topic with what I believe is the final fix, at least for our use case.
Hope it helps to someone. |
@TomasHradecky what you're describing is the expected and documented Cluster Autoscaler behaviour, which is easy to configure for unmanaged node groups but slightly trickier for AWS managed node groups. This issue has covered a number of ways to "hack" the ASG behind the MNG, but the real issue here is waiting for AWS to support this natively through the MNG API; that is still as far away as it was when the issue was opened and when it was moved to "coming soon"! |
+1 cost is major concern! |
Scaling Managed Node Groups from 0 to 1 and back down to 0 works well on EKS. We have just tested it out. The hack is to tag the Auto Scaling group used by the Managed Node Group with a specific tag, mirroring the K8s label added to your Node Group. I just had to add the following block of code to my Terraform module for EKS Node Groups, and after this the Cluster Autoscaler is able to scale up from 0 to 1 and scale back down to 0 as well. This is also mentioned in the Cluster Autoscaler document here - https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-scale-a-node-group-to-0 Our K8s label on Managed Node Group is -
I tested it on our EKS Jupyterhub environment and you can see the scale-up worked well as per logs -
Hope it helps others as well, who're facing the same challenges as us. PS: An example to achieve this is also mentioned here - https://github.com/terraform-aws-modules/terraform-aws-eks/blob/312e4a4d59cb10a762a4045e9944f3f837126933/examples/eks_managed_node_group/main.tf#L673-L712 |
How does it impact cold start? Imagine the cluster has 3 AZs and all are allowed to scale to 0; wouldn't it impact the first request that comes in with 0 nodes in ALL AZs? |
Of course, it takes time (30 sec ~ 3 min depending on node OS, probe settings etc) for a node to start, join the cluster and get Ready. If you are responding to requests that need fast responses, don't scale all your node groups to zero. |
How to scale all AZs to zero except one? |
If you're following best practices, you already have one node group per AZ, so you could set min size for each AZ separately. But it's probably better to handle that higher up in the stack — even if you have min size set to 0 for the node group for all AZs, cluster-autoscaler won't scale a node group down to zero if there are pods scheduled on the node(s) that can't be moved elsewhere. |
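The per-AZ layout described above can be sketched in Terraform roughly as follows (assuming native min-size-0 support; the variable names and the subnet lookup are hypothetical):

```hcl
# Hypothetical list of AZs; one node group is created per AZ.
variable "azs" {
  type    = list(string)
  default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

resource "aws_eks_node_group" "per_az" {
  for_each = toset(var.azs)

  cluster_name    = "my-cluster"
  node_group_name = "workers-${each.value}"
  node_role_arn   = aws_iam_role.node.arn            # assumed to exist
  subnet_ids      = [var.subnet_ids_by_az[each.value]] # hypothetical map

  scaling_config {
    # Keep at least one warm node only in the first AZ; the rest can
    # scale all the way to zero.
    min_size     = each.value == var.azs[0] ? 1 : 0
    max_size     = 5
    desired_size = each.value == var.azs[0] ? 1 : 0
  }
}
```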
@khteh - Regarding your question, I would rather suggest designing your workloads to be submitted to different Node Groups; not all Node Groups have to be at 0. In our use case, we have multiple Node Groups, e.g. a Services NG: we run DE services like Airflow, JupyterHub, MLflow etc. on it. We can't have this NG at Min 0, as the services run 24x7; these are mostly On-Demand EC2 instances. I would rather design in a way which fits specific use cases and not keep all NGs at 0. |
Would karpenter help here at all? I'd imagine no, but wanted to check. |
I don't think so, as that's not the original purpose of Karpenter. The solution to tag the ASG of the Managed NG was provided by AWS itself when we checked about it with them. This is something already done in the Terraform module for EKS provided here - https://github.com/terraform-aws-modules/terraform-aws-eks/blob/312e4a4d59cb10a762a4045e9944f3f837126933/examples/eks_managed_node_group/main.tf#L673-L712 AFAIK, for now, this is the only solution to support Min 0 for Managed NGs in EKS. Unless someone has tried a better solution :). @yuvipanda - I have tried it with Z2JH (which you help maintain) and it works like a charm. We just have to wait for the autoscaler to kick in, so if we have a timeout on JHub we need a couple of retries to spawn the server, but it works well and helps save a lot of cost. |
I think Karpenter is certainly a viable way to handle the use-case of spinning up nodes just-in-time for e.g. batch workloads. Karpenter will simply spin up a node directly when it sees pending pods, no ASGs involved. The controller itself may run on e.g. a statically scaled ASG, or for a completely ASG-less cluster, it should be possible to run it on Fargate. For us at least, Karpenter massively simplified running batch workloads. |
Yeah, good candidate and can run on Fargate nodes 👍 |
Support for scaling managed node groups to 0 will be available starting with Kubernetes 1.24 - kubernetes/autoscaler#4491 (comment) Once EKS releases support for Kubernetes 1.24, you should be able to configure this functionality |
🚀🚀🚀 Launch Announcement 🚀🚀🚀 Read more about this and other new features available on EKS and Kubernetes 1.24 in the launch blog. |
Currently, managed node groups have a required minimum of 1 node in a node group. This request is to update that behavior to support node groups of size 0, to unlock batch and ML use cases.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-scale-a-node-group-to-0