Similar to how cluster-autoscaler uses taints when marking nodes for scale-down, aws-node-termination-handler should taint nodes that will be terminated, in addition to everything it does currently. This issue is slightly similar to #123, but the proposal here is to add to the current behavior.

The reason for this is to make it possible to detect programmatically that a spot node becoming unschedulable in Kubernetes is due to termination rather than to cluster issues. For example, when using prometheus-operator, the default KubeNodeUnreachable alert often misfires for nodes being scaled down by the cluster-autoscaler. To fix that, we can drop nodes that carry the cluster-autoscaler scale-down taint. If aws-node-termination-handler tainted nodes, its taint could be incorporated into the alert as well.

I'm happy to PR this if there are no objections.
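For context, the kube-prometheus KubeNodeUnreachable rule is defined along these lines (reproduced from memory, so treat the exact label set as an assumption), and the misfire can be suppressed by excluding nodes that already carry a known scale-down or termination taint. The aws-node-termination-handler/spot-itn key used below is hypothetical, not an existing key:

```yaml
# Default rule: fires whenever a node carries the unreachable taint,
# including nodes that are simply being scaled down or terminated.
- alert: KubeNodeUnreachable
  expr: |
    kube_node_spec_taint{job="kube-state-metrics",key="node.kubernetes.io/unreachable",effect="NoSchedule"} == 1
  labels:
    severity: warning

# Variant: ignore nodes that are already tainted for scale-down/termination.
- alert: KubeNodeUnreachable
  expr: |
    (kube_node_spec_taint{job="kube-state-metrics",key="node.kubernetes.io/unreachable",effect="NoSchedule"} == 1)
    unless ignoring(key, value)
    kube_node_spec_taint{job="kube-state-metrics",key=~"ToBeDeletedByClusterAutoscaler|aws-node-termination-handler/spot-itn"} == 1
  labels:
    severity: warning
```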
That sounds like a good addition to the project. A PR is certainly welcome!
We already have a node-labeling function defined, so it shouldn't be too hard to add a taint function in the node pkg.
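As a rough sketch (not the project's actual code), a taint helper in the node pkg could mirror the labeling path: build the taint, append it if the node doesn't already carry it, and only then push the update through the clientset. The helper below operates on a plain struct mirroring corev1.Taint so the merge logic is visible on its own; wiring it to client-go's node update call is left out.

```go
package main

import "fmt"

// Taint mirrors the fields of k8s.io/api/core/v1.Taint that matter here.
type Taint struct {
	Key    string
	Value  string
	Effect string // e.g. "NoSchedule"
}

// addTaint returns the taint list with t appended, unless a taint with the
// same key and effect is already present (in which case it is a no-op).
// The bool reports whether the list changed, so callers can skip the
// node update entirely when nothing was added.
func addTaint(taints []Taint, t Taint) ([]Taint, bool) {
	for _, existing := range taints {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			return taints, false
		}
	}
	return append(taints, t), true
}

func main() {
	var taints []Taint
	// First application adds the taint...
	taints, changed := addTaint(taints, Taint{Key: "aws-node-termination-handler/spot-itn", Effect: "NoSchedule"})
	fmt.Println(len(taints), changed) // 1 true
	// ...a repeat application is a no-op, so the update call can be skipped.
	taints, changed = addTaint(taints, Taint{Key: "aws-node-termination-handler/spot-itn", Effect: "NoSchedule"})
	fmt.Println(len(taints), changed) // 1 false
}
```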
Also, should the taint only be applied to spot interruption termination notices (ITNs), or should it also apply to EC2 scheduled maintenance events? Maybe it would make sense to have an aws-node-termination-handler/spot-itn taint and an aws-node-termination-handler/scheduled-maintenance taint?
If we include tainting at this point in the code, we'll have the information needed to differentiate between a maintenance event and a spot ITN. Alternatively, the taint could be added in each drain event's PreDrainTask hook.
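One way to keep the two cases distinct at that point in the code is to derive the taint key from the event kind, so a spot ITN and a scheduled-maintenance event produce different taints. The kind names and key strings below are the ones floated in this thread, sketched as illustration rather than a settled API:

```go
package main

import "fmt"

// Event kinds as the handler distinguishes them (names illustrative).
const (
	KindSpotITN              = "SPOT_ITN"
	KindScheduledMaintenance = "SCHEDULED_MAINTENANCE"
)

// taintKeyForEvent maps an interruption-event kind to the taint key that
// would be applied before draining. Unknown kinds get no taint.
func taintKeyForEvent(kind string) (string, bool) {
	switch kind {
	case KindSpotITN:
		return "aws-node-termination-handler/spot-itn", true
	case KindScheduledMaintenance:
		return "aws-node-termination-handler/scheduled-maintenance", true
	default:
		return "", false
	}
}

func main() {
	key, ok := taintKeyForEvent(KindSpotITN)
	fmt.Println(key, ok) // aws-node-termination-handler/spot-itn true
}
```

Keeping the mapping in one function means both the drain path and a PreDrainTask hook would apply consistent keys, and monitoring rules can match on the shared `aws-node-termination-handler/` prefix.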