Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support node start-up taint #1581

Closed

Conversation

gtxu
Copy link
Contributor

@gtxu gtxu commented Apr 20, 2023

Is this a bug fix or adding new feature?

  • Feature

What is this PR about? / Why do we need it?

This PR added support for start-up taint removal for csi-node daemonset pods. This feature allows cluster admin to set taints on nodes, blocking workload pod to be scheduled before the driver start up and be ready. By automatically remove the taints marks the node ready for any CSI functionalities and workload start to be scheduled to the node. This feature can be configured in driver options and in Helm values.

closes issue #1232
What testing is done?
Manual Tested taint removal and logs

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gtxu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from rdpsin and torredil April 20, 2023 15:14
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 20, 2023
@gtxu gtxu force-pushed the node-taint-removal-on-startup branch from 698be78 to 846eb58 Compare April 20, 2023 15:28
@gtxu gtxu marked this pull request as draft April 20, 2023 15:29
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2023
@gtxu gtxu requested a review from ConnorJC3 April 20, 2023 15:29
@gtxu gtxu force-pushed the node-taint-removal-on-startup branch from 846eb58 to 131f9e3 Compare April 20, 2023 15:37
@gtxu gtxu marked this pull request as ready for review April 20, 2023 17:39
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2023
@gtxu gtxu marked this pull request as draft April 20, 2023 17:52
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2023
@gtxu gtxu force-pushed the node-taint-removal-on-startup branch from 131f9e3 to 21278e5 Compare April 20, 2023 18:28
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 20, 2023
@gtxu gtxu marked this pull request as ready for review April 20, 2023 18:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2023
@gtxu
Copy link
Contributor Author

gtxu commented Apr 20, 2023

/retest

@gtxu gtxu force-pushed the node-taint-removal-on-startup branch from 21278e5 to c52478e Compare April 21, 2023 16:22
if driverOptions.startupTaintRemoval {
err := cloud.RemoveNodeTaint(cloud.DefaultKubernetesAPIClient, AgentNotReadyNodeTaintKey)
if err != nil {
klog.InfoS("Node agent-not-ready taint error", "error", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if someone enables startupTaintRemoval even when running on other COs? This will cause the node service to fail to run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a check and handle the error as log, moving this part after nodeService creating.

@@ -0,0 +1,77 @@
package cloud
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloud is for AWS specific stuff. This should be part of driver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in cloud as it make API request here, moving to driver pkg.

}

// RemoveNodeTaint() patched the node, removes the taint that match NodeTaintKey
func RemoveNodeTaint(k8sAPIClient KubernetesAPIClient, NodeTaintKey string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: nodeTaintKey

}

if !hasTaint {
return fmt.Errorf("could not find node taint, key: %v, node: %v", NodeTaintKey, nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to return an error here? This will cause the driver to fail to run, even if someone explicitly didn't put a taint on the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we do not remove the taint at service creating, and handle the error message as warning or log to event. working on it

@@ -41,6 +41,19 @@ kubectl create secret generic aws-secret \
### Configure driver toleration settings
By default, the driver controller tolerates taint `CriticalAddonsOnly` and has `tolerationSeconds` configured as `300`; and the driver node tolerates all taints. If you don't want to deploy the driver node on all nodes, please set Helm `Value.node.tolerateAllTaints` to false before deployment. Add policies to `Value.node.tolerations` to configure customized toleration for nodes.

### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: node->nodes, frequesntly->frequently

@@ -58,6 +58,9 @@ spec:
{{- with .Values.node.volumeAttachLimit }}
- --volume-attach-limit={{ . }}
{{- end }}
{{- if .Values.node.startUpTaint }}
- --start-up-taint=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on start-up-taint -> remove-driver-node-taints ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about remove-not-ready-taints?

@@ -273,6 +273,7 @@ node:
enableWindows: false
# The "maximum number of attachable volumes" per node
volumeAttachLimit:
startUpTaint: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: removeDriverNodeTaints

@@ -41,6 +41,19 @@ kubectl create secret generic aws-secret \
### Configure driver toleration settings
By default, the driver controller tolerates taint `CriticalAddonsOnly` and has `tolerationSeconds` configured as `300`; and the driver node tolerates all taints. If you don't want to deploy the driver node on all nodes, please set Helm `Value.node.tolerateAllTaints` to false before deployment. Add policies to `Value.node.tolerations` to configure customized toleration for nodes.

### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested wording:

When new nodes frequently join a cluster, there may be cases where workload pods are scheduled to a new node before the csi-node starts up and becomes ready on that node. This can create a race condition between the workload pod and the csi-node pod, which can result in the workload pod failing to mount the PVC initially.

### Configure node taint and driver start-up taint
In some cases when new node frequesntly join the cluster, workload pods can be scheduled to a new node ahead of the csi-node start-up and ready on that node. This race condition between workload pod and csi-node pod will cause the workload pod fails to mount PVC at first place.

To help overcome this situation, CSI Driver node can manipulate Kubernetes’s taints on a given node to help preventing pods from starting before CSI Driver's node pod runs on this node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested wording:

To mitigate this issue, the CSI Driver can adjust the taints in Kubernetes to prevent pods from starting until the CSI Driver's node pod runs on that node.


To help overcome this situation, CSI Driver node can manipulate Kubernetes’s taints on a given node to help preventing pods from starting before CSI Driver's node pod runs on this node.

To configure start-up taint, the cluster administrator can places a taint with key `node.ebs.csi.aws.com/agent-not-ready` on a given uninitialized node(or node group). This prevents pods that don’t have a matching toleration from either being scheduled or altogether running on the node until the taint is removed. If use Helm to install CSI Driver, set `.Value.node.startUpTaint` to `true`. Once the CSI Driver pod runs on the node, initializes and ready, it will removes the aforementioned taint. After that, workload pods will start being scheduled and running on the node, with CSI Driver full functional on that node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested wording:

The cluster administrator can place a taint on an uninitialized node (or node group) with the key `node.ebs.csi.aws.com/agent-not-ready`. This taint prevents pods that do not have a corresponding toleration from being scheduled or running on the node until the taint is removed. If the CSI Driver is installed using Helm, .Value.node.removeDriverNodeTaints can be set to true. Once the CSI Driver pod is running on the node, it will remove the aforementioned taint. Afterward, workload pods can be scheduled and run on the node along with the CSI Driver.

}

func (o *NodeOptions) AddFlags(fs *flag.FlagSet) {
fs.Int64Var(&o.VolumeAttachLimit, "volume-attach-limit", -1, "Value for the maximum number of volumes attachable per node. If specified, the limit applies to all nodes. If not specified, the value is approximated from the instance type.")
fs.BoolVar(&o.StartupTaintRemoval, "start-up-taint", false, "To enable the node service remove node-ready taint after startup (default to false).")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this knob provides primarily two benefits:

  1. Saving an API call to the K8s API if it is disabled.
  2. Sets intent, which allows us to log a warning if the taint is missing.

Removing the knob allows for the feature to work automatically with no user configuration needed.

Should we move forward with adding this knob? What is the best user experience here?

}

// RemoveNodeTaint() patched the node, removes the taint that match NodeTaintKey
func RemoveNodeTaint(k8sAPIClient KubernetesAPIClient, NodeTaintKey string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a helper library here instead so that we don't have to construct the JSON patch manually, etc?
K8s taints package implements utilities for working with taints: https://pkg.go.dev/k8s.io/kubernetes/pkg/util/taints.

@gtxu gtxu force-pushed the node-taint-removal-on-startup branch from 249bb8e to c8f4867 Compare April 24, 2023 23:53
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Apr 25, 2023

@gtxu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-aws-ebs-csi-driver-external-test c8f4867 link true /test pull-aws-ebs-csi-driver-external-test
pull-aws-ebs-csi-driver-external-test-eks c8f4867 link true /test pull-aws-ebs-csi-driver-external-test-eks

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ConnorJC3 added a commit to ConnorJC3/aws-ebs-csi-driver that referenced this pull request May 3, 2023
Implements a feature to remove a taint on driver startup to alleviate
potential race conditions. Supercedes kubernetes-sigs#1581, all credit for the design
and initial implementation to @gtxu.

Co-authored-by: Gengtao Xu <[email protected]>
Signed-off-by: Connor Catlett <[email protected]>
@ConnorJC3
Copy link
Contributor

/close

Superceded by #1588
Thanks for the initial work!

@k8s-ci-robot
Copy link
Contributor

@ConnorJC3: Closed this PR.

In response to this:

/close

Superceded by #1588
Thanks for the initial work!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants