Update sidecar timeout values #1824

torredil · 2023-11-01T17:45:45Z

What is this PR about? / Why do we need it?

This PR raises the timeout of the attacher sidecar, which is responsible for invoking ControllerPublishVolume / ControllerUnpublishVolume, from its default of 15s. The current 15s default is too short, and as a result the following error is observed frequently:

E1101 14:21:54.345145       1 driver.go:124] "GRPC error" err="rpc error: code = Internal desc = Could not detach volume \"vol-07ec886465a316b9b\" from node \"i-0e137107f5012fb1f\": context deadline exceeded"

In the vast majority of cases, the volume is successfully detached just seconds after the attacher times out.

closes #1671

For context:

--timeout <duration>: Timeout of all calls to the CSI driver. It should be set to a value that accommodates the majority of ControllerPublish and ControllerUnpublish calls. See [CSI error and timeout handling](https://github.com/kubernetes-csi/external-attacher#csi-error-and-timeout-handling) for details. 15 seconds is used by default.

What testing is done?

Manual testing
CI

deploy/kubernetes/base/controller.yaml

AndrewSirenko

/lgtm
Left non-blocking comment.

pkg/cloud/cloud.go

AndrewSirenko · 2023-11-01T19:24:39Z

/retest
/lgtm

wmesard

When an operation does take a long time, how will the customer experience change as a result of this PR? Fewer annoying error messages in the log, sure. But what else? Will it reduce the number of AWS API calls? Increase it? Will it reduce the length of time that it takes for K8s to notice that the volume attachment state has changed? Increase it?

charts/aws-ebs-csi-driver/templates/controller.yaml

ConnorJC3 · 2023-11-02T06:11:40Z

How will this behave if the user is already passing --timeout via additionalArgs? Do we need a guard against that?

AndrewSirenko · 2023-11-10T01:35:42Z

/lgtm
as a two-way door

ConnorJC3

Missing snapshotter?

charts/aws-ebs-csi-driver/templates/controller.yaml

Signed-off-by: Eddie Torres <[email protected]>

torredil · 2023-11-10T15:36:55Z

Missing snapshotter?

The snapshotter sidecar already uses a timeout value of 60s by default.

ConnorJC3 · 2023-11-10T15:39:38Z

/lgtm

torredil · 2023-11-10T16:51:13Z

The primary goal of adjusting the timeout values in this PR is to prevent premature context cancellations for CSI operations. With longer timeout values, Kubernetes (more specifically, the sidecars) will wait longer (60s) before cancelling an operation and retrying it. This does not inherently change how quickly, for example, the volume attachment state changes; it simply gives the corresponding EC2 API call more time to complete before the sidecar retries.

In terms of operational performance, the increased timeout values would delay the retries of operations that are genuinely stuck, which would likely indicate a bug or inefficiency in the driver code. In this regard, increasing the timeout values improves the resiliency of our code, because such bugs can currently hide underneath the shorter timeouts. In all other cases, however, operational performance improves: the state is updated as soon as the call completes, with no need to wait for a retry.

/approve

k8s-ci-robot · 2023-11-10T16:51:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

Needs approval from an approver in each of these files:

~~OWNERS~~ [torredil]

Update the operator to use the same sidecar arguments (timeouts, QPS, worker threads) as upstream. See kubernetes-sigs/aws-ebs-csi-driver#1824 and kubernetes-sigs/aws-ebs-csi-driver#1824.

k8s-ci-robot added the cncf-cla: yes label Nov 1, 2023

k8s-ci-robot requested review from AndrewSirenko and ConnorJC3 November 1, 2023 17:45

k8s-ci-robot added the size/XS label Nov 1, 2023

torredil force-pushed the update-timeout branch from 0399d2a to 93f80d0 November 1, 2023 17:48

torredil changed the title ~~Update attacher/provisioner RPC call timeout~~ Update csi-attacher timeout value Nov 1, 2023

ConnorJC3 force-pushed the master branch 2 times, most recently from 24a8e7b to bddbe0b November 1, 2023 18:08

AndrewSirenko reviewed Nov 1, 2023

deploy/kubernetes/base/controller.yaml (resolved)

AndrewSirenko approved these changes Nov 1, 2023

k8s-ci-robot assigned AndrewSirenko Nov 1, 2023

k8s-ci-robot added the lgtm label Nov 1, 2023

ConnorJC3 reviewed Nov 1, 2023

pkg/cloud/cloud.go (outdated, resolved)

torredil force-pushed the update-timeout branch from 93f80d0 to 0360695 November 1, 2023 18:44

k8s-ci-robot removed the lgtm label Nov 1, 2023

k8s-ci-robot added the lgtm label Nov 1, 2023

wmesard reviewed Nov 1, 2023

rdpsin reviewed Nov 1, 2023

charts/aws-ebs-csi-driver/templates/controller.yaml (resolved)

torredil force-pushed the update-timeout branch from 0360695 to 741a3f4 November 9, 2023 22:49

k8s-ci-robot added the size/S label and removed the lgtm and size/XS labels Nov 9, 2023

torredil changed the title ~~Update csi-attacher timeout value~~ Update sidecar timeout values Nov 9, 2023

k8s-ci-robot added the lgtm label Nov 10, 2023

ConnorJC3 reviewed Nov 10, 2023

charts/aws-ebs-csi-driver/templates/controller.yaml (outdated, resolved)

Update sidecar timeout values

5ed15f7

Signed-off-by: Eddie Torres <[email protected]>

torredil force-pushed the update-timeout branch from 741a3f4 to 5ed15f7 November 10, 2023 15:35

k8s-ci-robot removed the lgtm label Nov 10, 2023

k8s-ci-robot assigned ConnorJC3 Nov 10, 2023

k8s-ci-robot added the lgtm label Nov 10, 2023

k8s-ci-robot added the approved label Nov 10, 2023

k8s-ci-robot merged commit 479f6e8 into kubernetes-sigs:master Nov 10, 2023

jsafrane mentioned this pull request Nov 21, 2023

STOR-1400: Sync sidecar arguments with upstream openshift/csi-operator#72

Merged

ElijahQuinones mentioned this pull request Oct 29, 2024

Update external-snapshotter to have same timeout as other sidecars #2200

Closed
