
Worker nodes have status of Ready,SchedulingDisabled #3713

Closed
sudo-justinwilson opened this issue Nov 23, 2020 · 4 comments

sudo-justinwilson commented Nov 23, 2020

What happened:
Nodes are marked as deletion candidates by the cluster-autoscaler and the deletion-candidate taints are later released, but the nodes keep a SchedulingDisabled status even after they are no longer considered for deletion. This prevents pods from being scheduled onto those nodes and wastes operational costs.

NAME                                           STATUS                     ROLES    AGE   VERSION              INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-2-34-6.ap-southeast-2.compute.internal   Ready,SchedulingDisabled   <none>   25m   v1.14.9-eks-cc7316   10.2.34.6     <none>        Amazon Linux 2   4.14.198-152.320.amzn2.x86_64   docker://19.3.6
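
A quick way to list every node currently stuck in this state is to filter on spec.unschedulable, which is what the SchedulingDisabled column reflects (a minimal sketch, assuming kubectl access to the cluster):

# List all nodes that are currently marked unschedulable (SchedulingDisabled)
kubectl get nodes --field-selector spec.unschedulable=true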

Here are the cluster-autoscaler logs for the node, where we can see it being considered for deletion and then removed from consideration:

I1124 00:45:56.966552       1 node_tree.go:86] Added node "ip-10-2-34-6.ap-southeast-2.compute.internal" in group "ap-southeast-2:\x00:ap-southeast-2c" to NodeTree
I1124 00:45:59.581289       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:45:59.582031       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 0s
I1124 00:45:59.582134       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 0s
I1124 00:45:59.596212       1 delete.go:102] Successfully added DeletionCandidateTaint on node ip-10-2-34-6.ap-southeast-2.compute.internal
I1124 00:46:09.621506       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:46:09.622086       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 10.040258663s
I1124 00:46:09.622164       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 10.040258663s
I1124 00:46:19.637702       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.095855
I1124 00:46:19.638261       1 static_autoscaler.go:428] ip-10-2-34-6.ap-southeast-2.compute.internal is unneeded since 2020-11-24 00:45:59.579457067 +0000 UTC m=+84989.989842946 duration 20.056288029s
I1124 00:46:19.638410       1 scale_down.go:716] ip-10-2-34-6.ap-southeast-2.compute.internal was unneeded for 20.056288029s
I1124 00:46:29.654060       1 scale_down.go:462] Node ip-10-2-34-6.ap-southeast-2.compute.internal - cpu utilization 0.795337
I1124 00:46:29.654065       1 scale_down.go:466] Node ip-10-2-34-6.ap-southeast-2.compute.internal is not suitable for removal - cpu utilization too big (0.795337)
I1124 00:46:29.654674       1 delete.go:192] Releasing taint {Key:DeletionCandidateOfClusterAutoscaler Value:1606178759 Effect:PreferNoSchedule TimeAdded:<nil>} on node ip-10-2-34-6.ap-southeast-2.compute.internal

The affected nodes do not have the following taint (my suspicion is that CA added the taint, then removed it):

    taints:
    - effect: PreferNoSchedule
      key: DeletionCandidateOfClusterAutoscaler
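
To double-check which taints are actually present on an affected node, the taints can be dumped directly (a minimal sketch; the node name below is just the affected node from the output above):

# Print the taints currently set on the node
kubectl get node ip-10-2-34-6.ap-southeast-2.compute.internal -o jsonpath='{.spec.taints}'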

We can see that some nodes keep the SchedulingDisabled status for up to 6 hours:

[Graph: nodes with the node.kubernetes.io/unschedulable taint remaining set for up to 6 hours]

Prometheus query:

kube_node_spec_taint{key="node.kubernetes.io/unschedulable", key!="DeletionCandidateOfClusterAutoscaler"}

What you expected to happen:
I expect the nodes to either return to a Ready status or be deleted.

How to reproduce it (as minimally and precisely as possible):
On a Kubernetes 1.17 EKS cluster, launch a worker node using the amazon-eks-node-1.14-v20201007 AMI, with cluster-autoscaler running and the following user-data:

#!/bin/bash
set -o xtrace
export AWS_DEFAULT_REGION="$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | grep -oP '\"region\"[[:space:]]*:[[:space:]]*\"\K[^\"]+')"
iid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
ilc=$(aws ec2 describe-instances --instance-ids "$iid" --query 'Reservations[0].Instances[0].InstanceLifecycle' --output text)
if [ "$ilc" == "spot" ]; then
  /etc/eks/bootstrap.sh --kubelet-extra-args '--node-labels=lifecycle=Spot --cluster-dns=169.254.20.10 --register-with-taints=spotInstance=true:PreferNoSchedule' --apiserver-endpoint '${aws_eks_cluster.eks.endpoint}' --b64-cluster-ca '${aws_eks_cluster.eks.certificate_authority[0].data}' 'eks-${var.environment}'
else
  /etc/eks/bootstrap.sh --kubelet-extra-args '--node-labels=lifecycle=OnDemand --cluster-dns=169.254.20.10' --apiserver-endpoint '${aws_eks_cluster.eks.endpoint}' --b64-cluster-ca '${aws_eks_cluster.eks.certificate_authority[0].data}' 'eks-${var.environment}'
fi
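
For reference, the Ready,SchedulingDisabled status corresponds to spec.unschedulable being set to true, which also causes the node.kubernetes.io/unschedulable:NoSchedule taint to be added. That state can be produced on any node, independently of the autoscaler, with a plain cordon (a minimal sketch using the node name from above):

# Cordon the node: sets spec.unschedulable=true and adds the unschedulable taint
kubectl cordon ip-10-2-34-6.ap-southeast-2.compute.internal

# Watch the node status change to Ready,SchedulingDisabled
kubectl get nodes -w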

Anything else we need to know?:

  1. The nodes are underutilised:
NAME                                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-2-34-6.ap-southeast-2.compute.internal     105m         5%     754Mi           10%

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1535m (79%)   2600m (134%)
  memory                      1337Mi (18%)  2688Mi (37%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
  2. Here are the node taints:
Taints:             node.kubernetes.io/unschedulable:NoSchedule
                    spotInstance=true:PreferNoSchedule
Unschedulable:      true
  3. Here are the events:
Events:
  Type    Reason                   Age                From        Message
  ----    ------                   ----               ----        -------
  Normal  Starting                 40m                kubelet     Starting kubelet.
  Normal  NodeHasSufficientMemory  40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     40m (x2 over 40m)  kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  40m                kubelet     Updated Node Allocatable limit across pods
  Normal  Starting                 39m                kube-proxy  Starting kube-proxy.
  Normal  NodeNotSchedulable       39m                kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeNotSchedulable
  Normal  NodeReady                39m                kubelet     Node ip-10-2-34-6.ap-southeast-2.compute.internal status is now: NodeReady
  4. I am using a deployment with 2 replicas that have the following cluster-autoscaler image:
k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.4
  5. Contents of the cluster-autoscaler-status ConfigMap (retrieved as shown below):
  status: |+
    Cluster-autoscaler status at 2020-11-24 01:28:12.471105109 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=9 unready=0 notStarted=0 longNotStarted=0 registered=9 longUnregistered=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-23 01:21:08.047368651 +0000 UTC m=+698.457754608
      ScaleUp:     NoActivity (ready=9 registered=9)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885

    NodeGroups:
      Name:        tf-asg-20200525114556707500000001
      Health:      Healthy (ready=9 unready=0 notStarted=0 longNotStarted=0 registered=9 longUnregistered=0 cloudProviderTarget=9 (minSize=2, maxSize=12))
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-23 01:21:08.047368651 +0000 UTC m=+698.457754608
      ScaleUp:     NoActivity (ready=9 cloudProviderTarget=9)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-11-24 01:28:12.467836437 +0000 UTC m=+87522.878222347
                   LastTransitionTime: 2020-11-24 00:54:11.86357796 +0000 UTC m=+85482.273963885
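
The status above can be dumped at any time from the ConfigMap the cluster-autoscaler maintains (a sketch assuming the default kube-system namespace; adjust if cluster-autoscaler runs elsewhere):

# Show the cluster-autoscaler status ConfigMap
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml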

Environment:

  • AWS Region: ap-southeast-2
  • Instance Type(s): m5.large
  • EKS Platform version: eks.2
  • Kubernetes version: 1.17
  • AMI Version: ami-087315adc4086bcef
  • Kernel: Linux ip-10-2-34-223.ap-southeast-2.compute.internal 4.14.198-152.320.amzn2.x86_64 #1 SMP Wed Sep 23 23:57:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Release information:
BASE_AMI_ID="ami-01f93ce28477e7be1"
BUILD_TIME="Wed Oct  7 19:11:36 UTC 2020"
BUILD_KERNEL="4.14.198-152.320.amzn2.x86_64"
ARCH="x86_64"
sudo-justinwilson (Author) commented:

It turns out that the aws-node-termination-handler is responsible for marking the nodes unschedulable when a REBALANCE_RECOMMENDATION event occurs.
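
One way to confirm this is to check the aws-node-termination-handler logs around the time a node was cordoned (a sketch; the namespace and label selector are assumptions and depend on how NTH was installed):

# Search recent NTH logs for cordon / rebalance-recommendation activity
kubectl -n kube-system logs -l app.kubernetes.io/name=aws-node-termination-handler --tail=200 | grep -i -e cordon -e rebalance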

adriannieto-attechnest commented:

@sudo-justinwilson how did you fix the issue? We are experiencing the same issue here.


ghost commented Mar 14, 2022

Try patching the node as a workaround, for example:

kubectl patch node 10.xx.xx.xxx -p '{"spec":{"unschedulable":false}}'
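
Equivalently, kubectl uncordon clears the same field:

# Uncordon the node so pods can be scheduled onto it again
kubectl uncordon 10.xx.xx.xxx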

sudo-justinwilson (Author) commented:

aws-node-termination-handler

Are you using aws-node-termination-handler? If so, check out the previous comment.
