Support node start-up taint #1581
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
if driverOptions.startupTaintRemoval {
	err := cloud.RemoveNodeTaint(cloud.DefaultKubernetesAPIClient, AgentNotReadyNodeTaintKey)
	if err != nil {
		klog.InfoS("Node agent-not-ready taint error", "error", err)
What if someone enables `startupTaintRemoval` even when running on other COs? This will cause the node service to fail to run.
Will add a check and handle the error as a log message, moving this part after the nodeService is created.
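For reference, a minimal sketch of the non-fatal handling described in this reply, where a removal failure is logged and the node service keeps running. The `removeNodeTaint` stand-in and the in-cluster flag are hypothetical illustrations, not the driver's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// removeNodeTaint is a hypothetical stand-in for the real Kubernetes API
// call; it simulates the failure seen when the driver runs on another CO.
func removeNodeTaint(inCluster bool, taintKey string) error {
	if !inCluster {
		return errors.New("no in-cluster Kubernetes API available")
	}
	return nil
}

// tryRemoveStartupTaint logs a removal failure and continues instead of
// failing the node service, as suggested in the review.
func tryRemoveStartupTaint(inCluster bool, taintKey string) bool {
	if err := removeNodeTaint(inCluster, taintKey); err != nil {
		log.Printf("startup taint %q not removed: %v (continuing)", taintKey, err)
		return false
	}
	fmt.Printf("startup taint %q removed\n", taintKey)
	return true
}

func main() {
	// On another CO (no Kubernetes API): removal fails, service keeps going.
	tryRemoveStartupTaint(false, "node.ebs.csi.aws.com/agent-not-ready")
	// On Kubernetes: removal succeeds.
	tryRemoveStartupTaint(true, "node.ebs.csi.aws.com/agent-not-ready")
}
```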
@@ -0,0 +1,77 @@
package cloud |
`cloud` is for AWS-specific stuff. This should be part of `driver`.
I put it in `cloud` as it makes the API request here; moving it to the `driver` pkg.
}

// RemoveNodeTaint() patches the node, removing the taint that matches NodeTaintKey
func RemoveNodeTaint(k8sAPIClient KubernetesAPIClient, NodeTaintKey string) error {
np: nodeTaintKey
}

if !hasTaint {
	return fmt.Errorf("could not find node taint, key: %v, node: %v", NodeTaintKey, nodeName)
Do we want to return an error here? This will cause the driver to fail to run, even if someone explicitly didn't put a taint on the node.
Maybe we do not remove the taint at service creation, and handle the error message as a warning or log it to an event. Working on it.
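A sketch of the taint-filtering step under that suggestion, where a missing taint is reported as a boolean rather than an error. The local `Taint` type stands in for `corev1.Taint` so the sketch has no client-go dependency; names are illustrative, not the PR's code:

```go
package main

import "fmt"

// Taint mirrors the corev1.Taint fields that matter here (hypothetical
// local type for the sketch).
type Taint struct {
	Key    string
	Effect string
}

// filterTaints returns the taint list without entries matching key, plus
// whether anything was removed. A missing taint is not an error, so the
// caller can log a warning instead of failing the driver.
func filterTaints(taints []Taint, key string) ([]Taint, bool) {
	kept := make([]Taint, 0, len(taints))
	removed := false
	for _, t := range taints {
		if t.Key == key {
			removed = true
			continue
		}
		kept = append(kept, t)
	}
	return kept, removed
}

func main() {
	taints := []Taint{
		{Key: "node.ebs.csi.aws.com/agent-not-ready", Effect: "NoExecute"},
		{Key: "dedicated", Effect: "NoSchedule"},
	}
	kept, removed := filterTaints(taints, "node.ebs.csi.aws.com/agent-not-ready")
	fmt.Println(len(kept), removed) // 1 true
}
```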
@@ -41,6 +41,19 @@ kubectl create secret generic aws-secret \
### Configure driver toleration settings
By default, the driver controller tolerates taint `CriticalAddonsOnly` and has `tolerationSeconds` configured as `300`; and the driver node tolerates all taints. If you don't want to deploy the driver node on all nodes, please set Helm `Value.node.tolerateAllTaints` to false before deployment. Add policies to `Value.node.tolerations` to configure customized toleration for nodes.

### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.
np: node->nodes, frequesntly->frequently
@@ -58,6 +58,9 @@ spec:
{{- with .Values.node.volumeAttachLimit }}
- --volume-attach-limit={{ . }}
{{- end }}
{{- if .Values.node.startUpTaint }}
- --start-up-taint=true
Thoughts on `start-up-taint` -> `remove-driver-node-taints`?
How about `remove-not-ready-taints`?
@@ -273,6 +273,7 @@ node:
enableWindows: false
# The "maximum number of attachable volumes" per node
volumeAttachLimit:
startUpTaint: false
np: removeDriverNodeTaints
@@ -41,6 +41,19 @@ kubectl create secret generic aws-secret \
### Configure driver toleration settings
By default, the driver controller tolerates taint `CriticalAddonsOnly` and has `tolerationSeconds` configured as `300`; and the driver node tolerates all taints. If you don't want to deploy the driver node on all nodes, please set Helm `Value.node.tolerateAllTaints` to false before deployment. Add policies to `Value.node.tolerations` to configure customized toleration for nodes.

### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.
Suggested wording:
When new nodes frequently join a cluster, there may be cases where workload pods are scheduled to a new node before the csi-node starts up and becomes ready on that node. This can create a race condition between the workload pod and the csi-node pod, which can result in the workload pod failing to mount the PVC initially.
### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.

To help overcome this situation, CSI Driver node can manipulate Kubernetes’s taints on a given node to help preventing pods from starting before CSI Driver's node pod runs on this node.
Suggested wording:
To mitigate this issue, the CSI Driver can adjust the taints in Kubernetes to prevent pods from starting until the CSI Driver's node pod runs on that node.
To help overcome this situation, CSI Driver node can manipulate Kubernetes’s taints on a given node to help preventing pods from starting before CSI Driver's node pod runs on this node.

To configure start-up taint, the cluster administrator can places a taint with key `node.ebs.csi.aws.com/agent-not-ready` on a given uninitialized node(or node group). This prevents pods that don’t have a matching toleration from either being scheduled or altogether running on the node until the taint is removed. If use Helm to install CSI Driver, set `.Value.node.startUpTaint` to `true`. Once the CSI Driver pod runs on the node, initializes and ready, it will removes the aforementioned taint. After that, workload pods will start being scheduled and running on the node, with CSI Driver full functional on that node.
Suggested wording:
The cluster administrator can place a taint on an uninitialized node (or node group) with the key `node.ebs.csi.aws.com/agent-not-ready`. This taint prevents pods that do not have a corresponding toleration from being scheduled or running on the node until the taint is removed. If the CSI Driver is installed using Helm, .Value.node.removeDriverNodeTaints can be set to true. Once the CSI Driver pod is running on the node, it will remove the aforementioned taint. Afterward, workload pods can be scheduled and run on the node along with the CSI Driver.
}

func (o *NodeOptions) AddFlags(fs *flag.FlagSet) {
	fs.Int64Var(&o.VolumeAttachLimit, "volume-attach-limit", -1, "Value for the maximum number of volumes attachable per node. If specified, the limit applies to all nodes. If not specified, the value is approximated from the instance type.")
	fs.BoolVar(&o.StartupTaintRemoval, "start-up-taint", false, "To enable the node service remove node-ready taint after startup (default to false).")
Adding this knob provides primarily two benefits:
- Saving an API call to the K8s API if it is disabled.
- Sets intent, which allows us to log a warning if the taint is missing.
Removing the knob allows for the feature to work automatically with no user configuration needed.
Should we move forward with adding this knob? What is the best user experience here?
}

// RemoveNodeTaint() patches the node, removing the taint that matches NodeTaintKey
func RemoveNodeTaint(k8sAPIClient KubernetesAPIClient, NodeTaintKey string) error {
Can we use a helper library here instead so that we don't have to construct the JSON patch manually, etc?
K8s taints package implements utilities for working with taints: https://pkg.go.dev/k8s.io/kubernetes/pkg/util/taints.
@gtxu: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Implements a feature to remove a taint on driver startup to alleviate potential race conditions. Supersedes kubernetes-sigs#1581, all credit for the design and initial implementation to @gtxu. Co-authored-by: Gengtao Xu <[email protected]> Signed-off-by: Connor Catlett <[email protected]>
/close Superseded by #1588
@ConnorJC3: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a bug fix or adding new feature?
What is this PR about? / Why do we need it?
This PR adds support for start-up taint removal for csi-node daemonset pods. This feature allows cluster admins to set taints on nodes, blocking workload pods from being scheduled before the driver starts up and becomes ready. Automatically removing the taint marks the node as ready for all CSI functionality, and workload pods start being scheduled to the node. This feature can be configured in the driver options and in Helm values.
closes issue #1232
What testing is done?
Manually tested taint removal and logs.