
Adding CriticalAddonsOnly taint doesn't allow cluster to start #487

Closed

ChristopherEdwards opened this issue Oct 14, 2020 · 9 comments


ChristopherEdwards commented Oct 14, 2020

When installing a new cluster on a clean Amazon Linux 2 instance, setting:

node-taint:
  - "CriticalAddonsOnly=true:NoExecute"

in config.yaml results in a cluster node that never becomes ready.

I'm installing via the tarball installer and can provide details if required.

config.yaml:

token: ${rancher-token}
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
tls-san:
  - "${control-plane-dns}"
@ChristopherEdwards (Author)

get pods output:

NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE
kube-system   etcd-ip-xxx-xxx-xxx-xxx.ec2.internal                      1/1     Running   0          4m31s
kube-system   helm-install-rke2-canal-z4mcr                         0/1     Pending   0          5m18s
kube-system   helm-install-rke2-coredns-qhldn                       0/1     Pending   0          5m18s
kube-system   helm-install-rke2-ingress-nginx-2w45l                 0/1     Pending   0          5m18s
kube-system   helm-install-rke2-kube-proxy-767j4                    0/1     Pending   0          5m18s
kube-system   helm-install-rke2-metrics-server-htmp4                0/1     Pending   0          5m18s
kube-system   kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal            1/1     Running   0          4m6s
kube-system   kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal   1/1     Running   0          4m10s
kube-system   kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal            1/1     Running   0          4m29s

All pending helm-install-rke2 pods are failing with:

  Warning  FailedScheduling  8m24s  default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  8m24s  default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  6m46s  default-scheduler  0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.

get nodes output:

NAME                          STATUS     ROLES         AGE     VERSION
ip-xxx-xxx-xxx-xxx.ec2.internal   NotReady   <none>        7m36s   v1.18.9+rke2r1
ip-xxx-xxx-xxx-xxx.ec2.internal   NotReady   etcd,master   9m30s   v1.18.9+rke2r1

@joshrwolf (Contributor)

Can confirm this issue. It appears that while the system charts themselves carry tolerations for the taint, the helm-install jobs that deploy them do not, resulting in what @ChristopherEdwards is showing above.
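
For context, a pod can schedule onto (and remain on) a node carrying this taint only if it declares a matching toleration. A minimal sketch of the toleration the helm-install job pods were missing (illustrative, not the actual chart contents):

# Toleration needed in the job's pod spec to get past CriticalAddonsOnly=true:NoExecute.
# "operator: Exists" with no effect specified matches any value and any effect for this key.
tolerations:
  - key: CriticalAddonsOnly
    operator: Exists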


innobead commented Dec 9, 2020

@c3y1huang please help with this issue: the pod created by the job does not have the correct tolerations, which leaves the helm installation pending.

Please review the code below (helm-controller) to see how to make CriticalAddonsOnly workable in this case.

https://github.com/k3s-io/k3s/blob/15d03c5930e37cb7aad00c65486902fb66dc744a/vendor/github.com/k3s-io/helm-controller/pkg/helm/controller.go#L209-L209
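
In practice the fix would be for the controller to carry the toleration through to the Job's pod template, so the generated manifest ends up shaped roughly like this (a hedged sketch, not the controller's actual output; names are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: helm-install-rke2-canal
  namespace: kube-system
spec:
  template:
    spec:
      tolerations:
        - key: CriticalAddonsOnly   # tolerate the taint so the install job can run
          operator: Exists
      containers:
        - name: helm
          image: rancher/klipper-helm   # illustrative; the actual image is set by the controller
      restartPolicy: OnFailure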

cc @jenting @cclhsu


davidnuzik commented Dec 16, 2020

Assigning to you @brandond as per our discussion in last sprint (yesterday, 12/15). You will take point on this issue. Review the PR that @c3y1huang proposed (thank you for the PR) and ensure it looks good / provide feedback. We should work towards getting this into a January release if possible. I've set the 1.19.6 milestone for mid-January.


brandond commented Jan 8, 2021

This should be fixed in the above-linked PR:

[root@centos01 ~]# kubectl describe pod -n kube-system   helm-install-rke2-canal-vqgfh
Name:         helm-install-rke2-canal-vqgfh
Namespace:    kube-system
Priority:     0
Node:         centos01.lan.khaus/10.0.1.137
Start Time:   Fri, 08 Jan 2021 13:41:20 -0800
Labels:       controller-uid=191e6541-893b-4ce2-a786-7f954a1e71e5
              helmcharts.helm.cattle.io/chart=rke2-canal
              job-name=helm-install-rke2-canal
Annotations:  helmcharts.helm.cattle.io/configHash: SHA256=D54E55CAEE0F4088268827DA9824C7A101199892C9F0D3CE6E991DA685802A46
              kubernetes.io/psp: global-unrestricted-psp
Status:       Succeeded
...
Tolerations:     CriticalAddonsOnly op=Exists
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/not-ready:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s


ShylajaDevadiga commented Jan 12, 2021

Using master build rke2 version v1.19.5-dev+5c837fe7
On a single-node cluster, helm-install-rke2-ingress-nginx and helm-install-rke2-metrics-server failed as expected, since those charts do not tolerate the taint.
The rke2-coredns failure is fixed in rancher/rke2-charts#40.

$ kubectl describe pod -n kube-system helm-install-rke2-canal-nbb8p  |grep -i critical
Tolerations:     CriticalAddonsOnly op=Exists
$ kubectl describe pod -n kube-system helm-install-rke2-coredns-cgttf   |grep -i critical
Tolerations:     CriticalAddonsOnly op=Exists

 $ kubectl describe pod -n kube-system helm-install-rke2-ingress-nginx-m2qdl |grep -i critical
  Warning  FailedScheduling  2m7s  default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  2m7s  default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  
 $ kubectl describe pod -n kube-system helm-install-rke2-metrics-server-96tzr |grep -i critical
  Warning  FailedScheduling  23m   default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  23m   default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  
  $ kubectl describe pod -n kube-system rke2-coredns-rke2-coredns-bbf9475cb-n6hnw |grep -i critical
Priority Class Name:  system-cluster-critical
                      scheduler.alpha.kubernetes.io/critical-pod: 
                      scheduler.alpha.kubernetes.io/tolerations: [{"key":"CriticalAddonsOnly", "operator":"Exists"}]
  Warning  FailedScheduling  19m   default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  19m   default-scheduler  0/1 nodes are available: 1 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate.
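
Note: the scheduler.alpha.kubernetes.io/tolerations annotation shown on the coredns pod above is a long-deprecated alpha mechanism that current schedulers ignore; a toleration only takes effect when declared in the pod spec, which is presumably what the chart fix in rancher/rke2-charts#40 moves to. The spec-level form, for reference:

# Tolerations belong in the pod spec; the alpha annotation form is ignored.
spec:
  tolerations:
    - key: CriticalAddonsOnly
      operator: Exists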

On a two-node cluster (rke2 version v1.19.5-dev+5c837fe7) with one control-plane node and one worker node:

$ cat /etc/rancher/rke2/config.yaml
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"

$ kubectl get nodes
NAME               STATUS   ROLES         AGE     VERSION
ip-172-31-26-237   Ready    <none>        5m31s   v1.19.5-dev+5c837fe7
ip-172-31-31-92    Ready    etcd,master   96m     v1.19.5-dev+5c837fe7

$ kubectl get pods -A
NAMESPACE     NAME                                                 READY   STATUS      RESTARTS   AGE
kube-system   etcd-ip-172-31-31-92                                 1/1     Running     0          95m
kube-system   helm-install-rke2-canal-msrnz                        0/1     Completed   0          96m
kube-system   helm-install-rke2-coredns-l76v2                      0/1     Completed   0          96m
kube-system   helm-install-rke2-ingress-nginx-79wj2                0/1     Completed   0          96m
kube-system   helm-install-rke2-kube-proxy-42jwn                   0/1     Completed   0          96m
kube-system   helm-install-rke2-metrics-server-xjbls               0/1     Completed   0          96m
kube-system   kube-apiserver-ip-172-31-31-92                       1/1     Running     0          95m
kube-system   kube-controller-manager-ip-172-31-31-92              1/1     Running     0          96m
kube-system   kube-proxy-4t94k                                     1/1     Running     0          96m
kube-system   kube-proxy-t69bp                                     1/1     Running     0          5m39s
kube-system   kube-scheduler-ip-172-31-31-92                       1/1     Running     0          96m
kube-system   rke2-canal-4jb2p                                     2/2     Running     0          96m
kube-system   rke2-canal-ls59f                                     2/2     Running     0          5m39s
kube-system   rke2-coredns-rke2-coredns-bbf9475cb-ht7r9            1/1     Running     0          96m
kube-system   rke2-ingress-nginx-controller-54946dd48f-r5jsg       1/1     Running     0          4m45s
kube-system   rke2-ingress-nginx-default-backend-5795954f8-rkl8h   1/1     Running     0          4m45s
kube-system   rke2-metrics-server-5f9b5757dc-zr67r                 1/1     Running     0          4m45s

$ kubectl describe pod -n kube-system   helm-install-rke2-coredns-l76v2 |tail -6
Tolerations:     CriticalAddonsOnly op=Exists
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/not-ready:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

@ShylajaDevadiga (Contributor)

Validated on rke2 version v1.19.7-rc1+rke2r1; rke2-coredns is in the Running state on a single-node cluster.

ubuntu@ip-172-31-6-18:~$ kubectl get nodes
NAME             STATUS   ROLES         AGE     VERSION
ip-172-31-6-18   Ready    etcd,master   8m30s   v1.19.7-rc1+rke2r1
ubuntu@ip-172-31-6-18:~$ kubectl get pods -A
NAMESPACE     NAME                                         READY   STATUS      RESTARTS   AGE
kube-system   etcd-ip-172-31-6-18                          1/1     Running     0          7m36s
kube-system   helm-install-rke2-canal-5p4jc                0/1     Completed   0          8m24s
kube-system   helm-install-rke2-coredns-d6z4w              0/1     Completed   0          8m24s
kube-system   helm-install-rke2-ingress-nginx-s4n49        0/1     Pending     0          8m24s
kube-system   helm-install-rke2-kube-proxy-bvtds           0/1     Completed   0          8m24s
kube-system   helm-install-rke2-metrics-server-x9p4c       0/1     Pending     0          8m24s
kube-system   kube-apiserver-ip-172-31-6-18                1/1     Running     0          7m40s
kube-system   kube-controller-manager-ip-172-31-6-18       1/1     Running     0          7m36s
kube-system   kube-proxy-t9stq                             1/1     Running     0          8m10s
kube-system   kube-scheduler-ip-172-31-6-18                1/1     Running     0          7m28s
kube-system   rke2-canal-2mm8j                             2/2     Running     0          8m10s
kube-system   rke2-coredns-rke2-coredns-6cd96645d6-h8fvf   1/1     Running     0          8m8s
ubuntu@ip-172-31-6-18:~$ 

@gawsoftpl

This issue still occurs during cluster creation with 3x control-plane + etcd nodes. If you want to create such a cluster, you have to taint the nodes after creation (see k3s-io/k3s#6383).
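
For reference, tainting a node after creation produces the same entry on the node object that the node-taint config option sets at registration (a sketch of the resulting node spec, for illustration only):

# Resulting taint on the node object (illustrative)
spec:
  taints:
    - key: CriticalAddonsOnly
      value: "true"
      effect: NoExecute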


brandond commented Jun 15, 2024

@gawsoftpl you're on the wrong repo. This issue is regarding rke2. If you have a problem with k3s, please open an issue over there.

rancher locked this issue as resolved and limited conversation to collaborators on Jun 15, 2024.