
Worker nodes have status of Ready,SchedulingDisabled #3713

Closed
sudo-justinwilson opened this issue Nov 23, 2020 · 4 comments

sudo-justinwilson commented Nov 23, 2020

What happened:
Nodes are marked as deletion candidates by the cluster-autoscaler and the deletion-candidate taints are later released, but the nodes keep a SchedulingDisabled status even after they are no longer considered for deletion. This prevents pods from being scheduled onto those nodes and wastes operational costs.

NAME                                           STATUS                     ROLES    AGE   VERSION              INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-2-34-6.ap-southeast-2.compute.internal   Ready,SchedulingDisabled   <none>   25m   v1.14.9-eks-cc7316   10.2.34.6     <none>        Amazon Linux 2   4.14.198-152.320.amzn2.x86_64   docker://19.3.6
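
A quick way to list every node currently stuck in this state is to filter on spec.unschedulable, which is what the SchedulingDisabled column reflects (a minimal sketch, assuming kubectl access to the cluster):

# List all nodes that are currently marked unschedulable (SchedulingDisabled)
kubectl get nodes --field-selector spec.unschedulable=true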

Here are the cluster-autoscaler logs for the node, where we can see it being considered for deletion and then removed from consideration:

I1124 00:45:56.966552       1 node_tree.go:86] Added node "ip-10-2-34-6.ap-southeast-2.compute.internal" in group "ap-southeast-2:\x00:ap-southeast-2c" to NodeTree
I1124 00:45:59.581289       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:45:59.582031       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 0s
I1124 00:45:59.582134       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 0s
I1124 00:45:59.596212       1 delete.go:102] Successfully added DeletionCandidateTaint on node ip-10-2-34-6.ap-southeast-2.compute.internal
I1124 00:46:09.621506       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:46:09.622086       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 10.040258663s
I1124 00:46:09.622164       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 10.040258663s
I1124 00:46:19.637702       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:46:19.638261       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 20.056288029s
I1124 00:46:19.638410       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 20.056288029s
I1124 00:46:29.654060       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.795337
I1124 00:46:29.654065       1 scale_down.go:466] Node ip-10-2-34-6.ap-southeast-2.compute.internal is not suitable for removal - cpu utilization too big (0.795337)
I1124 00:46:29.654674       1 delete.go:192] Releasing taint {Key:DeletionCandidateOfClusterAutoscaler Value:1606178759 Effect:PreferNoSchedule TimeAdded:<nil>} on node ip-10-2-34-6.ap-southeast-2.compute.internal

The affected nodes do not have the following taint (my suspicion is that CA added the taint, then removed it):

    taints:
    - effect: PreferNoSchedule
      key: DeletionCandidateOfClusterAutoscaler
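
To double-check which taints are actually present on an affected node, the taints can be dumped directly (a minimal sketch; the node name below is just the affected node from the output above):

# Print the taints currently set on the node
kubectl get node ip-10-2-34-6.ap-southeast-2.compute.internal -o jsonpath='{.spec.taints}'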

We can see that some nodes keep the SchedulingDisabled status for up to 6 hours:

[Graph: nodes with the node.kubernetes.io/unschedulable taint remaining set for up to 6 hours]

Prometheus query:

kube_node_spec_taint{key="node.kubernetes.io/unschedulable", key!="DeletionCandidateOfClusterAutoscaler"}

What you expected to happen:
I expect the nodes to either return to a Ready status or be deleted.

How to reproduce it (as minimally and precisely as possible):
On a Kubernetes 1.17 EKS cluster, launch a worker node using the amazon-eks-node-1.14-v20201007 AMI, with cluster-autoscaler running and the following user-data:

#!/bin/bash
set -o xtrace
export AWS_DEFAULT_REGION="$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | grep -oP '\"region\"[[:space:]]*:[[:space:]]*\"\K[^\"]+')"
iid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
ilc=$(aws ec2 describe-instances --instance-ids "$iid" --query 'Reservations[0].Instances[0].InstanceLifecycle' --output text)
if [ "$ilc" == "spot" ]; then
  /etc/eks/bootstrap.sh --kubelet-extra-args '--node-labels=lifecycle=Spot --cluster-dns=169.254.20.10 --register-with-taints=spotInstance=true:PreferNoSchedule' --apiserver-endpoint '${aws_eks_cluster.eks.endpoint}' --b64-cluster-ca '${aws_eks_cluster.eks.certificate_authority[0].data}' 'eks-${var.environment}'
else
  /etc/eks/bootstrap.sh --kubelet-extra-args '--node-labels=lifecycle=OnDemand --cluster-dns=169.254.20.10' --apiserver-endpoint '${aws_eks_cluster.eks.endpoint}' --b64-cluster-ca '${aws_eks_cluster.eks.certificate_authority[0].data}' 'eks-${var.environment}'
fi
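
For reference, the Ready,SchedulingDisabled status corresponds to spec.unschedulable being set to true, which also causes the node.kubernetes.io/unschedulable:NoSchedule taint to be added. That state can be produced on any node, independently of the autoscaler, with a plain cordon (a minimal sketch using the node name from above):

# Cordon the node: sets spec.unschedulable=true and adds the unschedulable taint
kubectl cordon ip-10-2-34-6.ap-southeast-2.compute.internal

# Watch the node status change to Ready,SchedulingDisabled
kubectl get nodes -w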

Anything else we need to know?:

  1. The nodes are underutilised:
NAME                                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-2-34-6.ap-southeast-2.compute.internal     105m         5%     754Mi           10%

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1535m (79%)   2600m (134%)
  memory                      1337Mi (18%)  2688Mi (37%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
  2. Here are the node taints:
Taints:             node.kubernetes.io/unschedulable:NoSchedule
                    spotInstance=true:PreferNoSchedule
Unschedulable:      true
  3. Here are the events:
Events:
  Type    Reason                   Age                From        Message
  ----    ------                   ----               ----        -------
  Normal  Starting                 40m                kubelet     Starting kubelet.
  Normal  NodeHasSufficientMemory  40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  40m                kubelet     Updated Node Allocatable limit across pods
  Normal  Starting                 39m                kube-proxy  Starting kube-proxy.
  Normal  NodeNotSchedulable       39m                kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeNotSchedulable
  Normal  NodeReady                39m                kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeReady
  4. I am using a deployment with 2 replicas that have the following cluster-autoscaler image:
k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.4
  5. Contents of the cluster-autoscaler-status ConfigMap (retrieved as shown below):
  status: |+
    Cluster-autoscaler status at 2020-11-24 01:28:12.471105109 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=9 unready=0 notStarted=0 longNotStarted=0 registered=9 longUnregistered=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-23 01:21:08.047368651 +0000 UTC m=+698.457754608
      ScaleUp:     NoActivity (ready=9 registered=9)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885

    NodeGroups:
      Name:        tf-asg-20200525114556707500000001
      Health:      Healthy (ready=9 unready=0 notStarted=0 longNotStarted=0 registered=9 longUnregistered=0 cloudProviderTarget=9 (minSize=2, maxSize=12))
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-23 01:21:08.047368651 +0000 UTC m=+698.457754608
      ScaleUp:     NoActivity (ready=9 cloudProviderTarget=9)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
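
The status above can be dumped at any time from the ConfigMap the cluster-autoscaler maintains (a sketch assuming the default kube-system namespace; adjust if cluster-autoscaler runs elsewhere):

# Show the cluster-autoscaler status ConfigMap
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml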

Environment:

  • AWS Region: ap-southeast-2
  • Instance Type(s): m5.large
  • EKS Platform version: eks.2
  • Kubernetes version: 1.17
  • AMI Version: ami-087315adc4086bcef
  • Kernel: Linux ip-10-2-34-223.ap-southeast-2.compute.internal 4.14.198-152.320.amzn2.x86_64 #1 SMP Wed Sep 23 23:57:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Release information:
BASE_AMI_ID="ami-01f93ce28477e7be1"
BUILD_TIME="Wed Oct  7 19:11:36 UTC 2020"
BUILD_KERNEL="4.14.198-152.320.amzn2.x86_64"
ARCH="x86_64"
sudo-justinwilson (Author) commented:

It turns out that the aws-node-termination-handler is responsible for marking the nodes unschedulable when a REBALANCE_RECOMMENDATION event occurs.
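
One way to confirm this is to check the aws-node-termination-handler logs around the time a node was cordoned (a sketch; the namespace and label selector are assumptions and depend on how NTH was installed):

# Search recent NTH logs for cordon / rebalance-recommendation activity
kubectl -n kube-system logs -l app.kubernetes.io/name=aws-node-termination-handler --tail=200 | grep -i -e cordon -e rebalance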

adriannieto-attechnest commented:

@sudo-justinwilson how did you fix the issue? We are experiencing the same issue here.


ghost commented Mar 14, 2022

Try patching the node as a workaround, for example:

kubectl patch node 10.xx.xx.xxx -p '{"spec":{"unschedulable":false}}'
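
Equivalently, kubectl uncordon clears the same field:

# Uncordon the node so pods can be scheduled onto it again
kubectl uncordon 10.xx.xx.xxx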

sudo-justinwilson (Author) commented:

aws-node-termination-handler

Are you using aws-node-termination-handler? If so, check out the previous comment.
