Design Question: Drain Node vs Add Taints #123

Closed

leoskyrocker opened this issue Mar 18, 2020 · 2 comments
leoskyrocker commented Mar 18, 2020

Currently most termination handlers, including this one, drain the node when a termination notice is received.

TaintBasedEviction has been enabled by default since 1.13 (and TaintNodesByCondition since 1.12). Has anyone thought about using taints instead, along the following lines?

On receiving a termination notice, taint the affected node with "termination-notice:NoSchedule" and "termination-notice:NoExecute".

Any pods that do not have a matching toleration will then be evicted automatically.
This gives the decision (and flexibility) back to the cluster admin as to whether a pod can tolerate the termination or not.

This would give users more flexibility in specifying which combinations of pods should or should not be evicted, according to their needs.
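
As a minimal client-go sketch (not NTH code) of what this could look like: the taint key "termination-notice" comes from the proposal above, while the function name and everything else is illustrative.

package example

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// taintNodeOnTerminationNotice adds NoSchedule and NoExecute taints to the node
// so that pods without a matching toleration are evicted by the taint manager.
func taintNodeOnTerminationNotice(clientset kubernetes.Interface, nodeName string) error {
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    node.Spec.Taints = append(node.Spec.Taints,
        corev1.Taint{Key: "termination-notice", Effect: corev1.TaintEffectNoSchedule},
        corev1.Taint{Key: "termination-notice", Effect: corev1.TaintEffectNoExecute},
    )
    _, err = clientset.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
    return err
}

Pods that should keep running until the actual termination would then carry a matching toleration (key: termination-notice, operator: Exists, effect: NoExecute), optionally with tolerationSeconds to bound how long they linger.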

jaypipes (Contributor) commented

NTH uses the standard cordon and drain mechanism from the kubectl command of the same name:

func (n Node) cordonNode() error {
    node, err := n.fetchKubernetesNode()
    if err != nil {
        return err
    }
    err = drain.RunCordonOrUncordon(n.drainHelper, node, true)
    if err != nil {
        return err
    }
    return nil
}

which ends up setting the Node.Spec.Unschedulable field to true:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubectl/pkg/drain/cordon.go#L83
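
For comparison, here is a hedged sketch of that cordon-and-drain flow driven through the kubectl drain package directly; the helper settings below are illustrative and are not NTH's actual n.drainHelper configuration.

package example

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/kubectl/pkg/drain"
)

func cordonAndDrain(clientset kubernetes.Interface, nodeName string) error {
    helper := &drain.Helper{
        Ctx:                 context.TODO(), // field present in recent kubectl versions
        Client:              clientset,
        Force:               true, // illustrative: also delete pods without a controller
        IgnoreAllDaemonSets: true,
        GracePeriodSeconds:  -1, // -1 means use each pod's own grace period
        Timeout:             2 * time.Minute,
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
    }
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    // Cordon: sets node.Spec.Unschedulable = true, as described above.
    if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
        return err
    }
    // Drain: evicts (or deletes) the remaining pods on the node, respecting PDBs.
    return drain.RunNodeDrain(helper, nodeName)
}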

Once NTH issues the cordon call (which sets Node.Spec.Unschedulable to true), it then labels the node with a drain event:

https://github.com/aws/aws-node-termination-handler/blob/master/pkg/node/node.go#L135-L142

that it uses to ensure correct operation after a reboot of the worker node (if the worker is not a Spot instance, of course).
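
For illustration only, a node label like that could be applied with a strategic merge patch via client-go; the key and value here are placeholders, not NTH's actual label (see the linked node.go for that).

package example

import (
    "context"
    "encoding/json"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
)

// labelNode records state on the Node object so it survives a reboot of the worker.
func labelNode(clientset kubernetes.Interface, nodeName, key, value string) error {
    patch, err := json.Marshal(map[string]interface{}{
        "metadata": map[string]interface{}{
            "labels": map[string]string{key: value},
        },
    })
    if err != nil {
        return err
    }
    _, err = clientset.CoreV1().Nodes().Patch(
        context.TODO(), nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
    return err
}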

At the end of the day, the cluster admin can still use PodDisruptionBudgets to express how much disruption a Pod's workload can tolerate during the drain. But a termination or reboot notice is not something a Pod can prevent from happening, which is why taints and tolerations aren't really appropriate here, IMHO.
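
For completeness, a minimal PodDisruptionBudget created with client-go; the names and numbers are illustrative (and clusters older than 1.21 would use policy/v1beta1 instead of policy/v1). The drain's eviction calls will refuse to violate it.

package example

import (
    "context"

    policyv1 "k8s.io/api/policy/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/client-go/kubernetes"
)

func createPDB(clientset kubernetes.Interface, namespace string) error {
    minAvailable := intstr.FromInt(2)
    pdb := &policyv1.PodDisruptionBudget{
        ObjectMeta: metav1.ObjectMeta{Name: "my-app-pdb", Namespace: namespace},
        Spec: policyv1.PodDisruptionBudgetSpec{
            // Keep at least 2 replicas of the labeled app running during voluntary disruptions.
            MinAvailable: &minAvailable,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": "my-app"},
            },
        },
    }
    _, err := clientset.PolicyV1().PodDisruptionBudgets(namespace).Create(
        context.TODO(), pdb, metav1.CreateOptions{})
    return err
}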

leoskyrocker (Author) commented Mar 19, 2020

I see. I still see some advantages to the approach I described, though:

  1. Cluster admins can decide whether a pod should tolerate the termination specifically. Tolerating the generic unschedulable taint instead could cause unexpected behavior, since that taint has multiple meanings.
  2. There seem to be use cases where users want pods to run until the very last minute, even though they are aware of the upcoming termination.
  3. Currently the drain skips all DaemonSets, but users may not want particular DaemonSets to be skipped.

Anyway, I agree with you overall, and most of these use cases should be resolvable via PDBs, so I'm closing this until someone comes up with a concrete use case for it.
