Tunnelfront and coredns fails to schedule when all nodes are tainted #1401

Closed
folkol opened this issue Jan 21, 2020 · 4 comments
Labels
nodepools · resolution/answer-provided · system-pods

Comments


folkol commented Jan 21, 2020

What happened:

Tunnelfront and coredns pods failed to be scheduled when all nodes were tainted.

What you expected to happen:

To be able to let the kube-system deployments tolerate a tainted node.

How to reproduce it (as minimally and precisely as possible):

Set up a cluster e.g.:

$ az aks create \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $CLUSTER_NAME \
    --node-count 1 \
    --nodepool-name nodepool1 \
    --vm-set-type VirtualMachineScaleSets
  • Taint the only node: kubectl taint nodes -l agentpool=nodepool1 zone=nodepool1:NoSchedule
  • Delete the tunnelfront pod: kubectl delete pod -n kube-system tunnelfront-xyz-123
  • The new pod will be stuck in Pending because it fails to be scheduled, with the message: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. (One way to inspect this is sketched below.)
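
As a minimal sketch of inspecting the failure (the pod name tunnelfront-abc-456 is hypothetical; substitute whatever name the ReplicaSet generates), list the pending kube-system pods and read the scheduler's event message:

$ kubectl get pods -n kube-system --field-selector=status.phase=Pending
$ kubectl describe pod -n kube-system tunnelfront-abc-456   # Events section shows the taint rejection
$ kubectl get nodes -o jsonpath='{.items[*].spec.taints}'   # confirm the taint is set on the node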

Anything else we need to know?:

There is a similar issue: #363

Environment:

  • Server version: v1.14.8
  • Client version: v1.16.2
  • Location: francecentral
@yamansama

Hi, any news on this?

@ghost ghost added the action-required label Jul 22, 2020

ghost commented Jul 27, 2020

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 label Jul 27, 2020

ghost commented Aug 6, 2020

Issue needing attention of @Azure/aks-leads

palma21 commented Aug 6, 2020

This is by design, to allow you to move your system components that are not DaemonSets to a specific pool.

For the record, they already have affinity to pools of mode System.
https://docs.microsoft.com/en-us/azure/aks/use-system-pools

If you want to taint a pool so that only system components run on it, use the taint CriticalAddonsOnly, which is already tolerated:

 tolerations:
 - effect: NoSchedule
   key: node-role.kubernetes.io/master
 - key: CriticalAddonsOnly
   operator: Exists
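
For example, a minimal sketch of carving out a dedicated system pool with that taint (the pool name systempool, the node count, and this particular use of the --node-taints flag are illustrative assumptions, not from the thread):

$ az aks nodepool add \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $CLUSTER_NAME \
    --name systempool \
    --node-count 1 \
    --mode System \
    --node-taints CriticalAddonsOnly=true:NoSchedule

System pods such as tunnelfront and coredns tolerate that taint, while regular workloads are kept off the pool.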

@palma21 palma21 closed this as completed Aug 6, 2020
@palma21 palma21 added the resolution/answer-provided label and removed the Needs Attention 👋 and action-required labels Aug 6, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Sep 6, 2020