Design Question: Drain Node vs Add Taints #123

Closed

leoskyrocker opened this issue Mar 18, 2020 · 2 comments
leoskyrocker commented Mar 18, 2020

Currently most termination handlers, including this one, drain the node when a termination notice is received.

TaintBasedEviction has been enabled by default since 1.13 (and TaintNodesByCondition since 1.12). Has anyone thought about using taints instead, along the following lines?

On receiving a termination notice, taint the affected node with "termination-notice:NoSchedule" and "termination-notice:NoExecute".

Any pods that do not have a matching toleration will then be evicted automatically.
This gives the decision (and flexibility) back to the cluster admin as to whether a pod can tolerate the termination or not.

This would give users more flexibility in specifying which combinations of pods should or should not be evicted, according to their needs.
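
As a minimal client-go sketch (not NTH code) of what this could look like: the taint key "termination-notice" comes from the proposal above, while the function name and everything else is illustrative.

package example

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// taintNodeOnTerminationNotice adds NoSchedule and NoExecute taints to the node
// so that pods without a matching toleration are evicted by the taint manager.
func taintNodeOnTerminationNotice(clientset kubernetes.Interface, nodeName string) error {
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    node.Spec.Taints = append(node.Spec.Taints,
        corev1.Taint{Key: "termination-notice", Effect: corev1.TaintEffectNoSchedule},
        corev1.Taint{Key: "termination-notice", Effect: corev1.TaintEffectNoExecute},
    )
    _, err = clientset.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
    return err
}

Pods that should keep running until the actual termination would then carry a matching toleration (key: termination-notice, operator: Exists, effect: NoExecute), optionally with tolerationSeconds to bound how long they linger.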

jaypipes (Contributor) commented

NTH uses the standard cordon and drain mechanism from the kubectl command of the same name:

func (n Node) cordonNode() error {
    node, err := n.fetchKubernetesNode()
    if err != nil {
        return err
    }
    err = drain.RunCordonOrUncordon(n.drainHelper, node, true)
    if err != nil {
        return err
    }
    return nil
}

which ends up setting the Node.Spec.Unschedulable field to true:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubectl/pkg/drain/cordon.go#L83
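
For comparison, here is a hedged sketch of that cordon-and-drain flow driven through the kubectl drain package directly; the helper settings below are illustrative and are not NTH's actual n.drainHelper configuration.

package example

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/kubectl/pkg/drain"
)

func cordonAndDrain(clientset kubernetes.Interface, nodeName string) error {
    helper := &drain.Helper{
        Ctx:                 context.TODO(), // field present in recent kubectl versions
        Client:              clientset,
        Force:               true, // illustrative: also delete pods without a controller
        IgnoreAllDaemonSets: true,
        GracePeriodSeconds:  -1, // -1 means use each pod's own grace period
        Timeout:             2 * time.Minute,
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
    }
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    // Cordon: sets node.Spec.Unschedulable = true, as described above.
    if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
        return err
    }
    // Drain: evicts (or deletes) the remaining pods on the node, respecting PDBs.
    return drain.RunNodeDrain(helper, nodeName)
}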

Once NTH issues the cordon call (which sets Node.Spec.Unschedulable to true), it then labels the node with a drain event:

https://github.com/aws/aws-node-termination-handler/blob/master/pkg/node/node.go#L135-L142

that it uses to ensure correct operation after a reboot of the worker node (if the worker is not a Spot instance, of course).
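
For illustration only, a node label like that could be applied with a strategic merge patch via client-go; the key and value here are placeholders, not NTH's actual label (see the linked node.go for that).

package example

import (
    "context"
    "encoding/json"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
)

// labelNode records state on the Node object so it survives a reboot of the worker.
func labelNode(clientset kubernetes.Interface, nodeName, key, value string) error {
    patch, err := json.Marshal(map[string]interface{}{
        "metadata": map[string]interface{}{
            "labels": map[string]string{key: value},
        },
    })
    if err != nil {
        return err
    }
    _, err = clientset.CoreV1().Nodes().Patch(
        context.TODO(), nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
    return err
}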

At the end of the day, the cluster admin can still use PodDisruptionBudgets to express how much disruption a Pod's workload can tolerate during the drain. But a termination or reboot notice is not something a Pod can prevent from happening, which is why taints and tolerations aren't really appropriate here, IMHO.
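
For completeness, a minimal PodDisruptionBudget created with client-go; the names and numbers are illustrative (and clusters older than 1.21 would use policy/v1beta1 instead of policy/v1). The drain's eviction calls will refuse to violate it.

package example

import (
    "context"

    policyv1 "k8s.io/api/policy/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/client-go/kubernetes"
)

func createPDB(clientset kubernetes.Interface, namespace string) error {
    minAvailable := intstr.FromInt(2)
    pdb := &policyv1.PodDisruptionBudget{
        ObjectMeta: metav1.ObjectMeta{Name: "my-app-pdb", Namespace: namespace},
        Spec: policyv1.PodDisruptionBudgetSpec{
            // Keep at least 2 replicas of the labeled app running during voluntary disruptions.
            MinAvailable: &minAvailable,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": "my-app"},
            },
        },
    }
    _, err := clientset.PolicyV1().PodDisruptionBudgets(namespace).Create(
        context.TODO(), pdb, metav1.CreateOptions{})
    return err
}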

leoskyrocker (Author) commented Mar 19, 2020

I see. I still see some advantages to the approach I described, though:

  1. Cluster admins can decide whether a pod should tolerate the termination specifically. Tolerating the generic unschedulable taint instead could cause unexpected behavior, since that taint has multiple meanings.
  2. There seem to be use cases where users want pods to run until the very last minute, even though they are aware of the upcoming termination.
  3. Currently the drain skips all DaemonSets, but users may not want particular DaemonSets to be skipped.

Anyway, I agree with you overall, and most of these use cases should be resolvable via PDBs, so I'm closing this until someone comes up with a concrete use case for it.
