Make NodeStageVolume idempotent #163
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Hi @sreis. Thanks for your PR. I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Pull Request Test Coverage Report for Build 287
💛 - Coveralls
Pull Request Test Coverage Report for Build 316
💛 - Coveralls
/ok-to-test
/retest
While reviewing this I wondered whether this is the best approach for our case. In particular, my comment about idempotency made me wonder whether a lock for each key wouldn't work better here. For example, for every target path there would be a lock that both the stage and unstage calls rely on.
Have you also considered other options?
Looking at other drivers, GCP [1] uses a mutex that locks the entire node server; although this sounds too coarse, I'd love to know how badly it scales exactly (maybe it's manageable). DigitalOcean [2], on the other hand, doesn't do any locking.
[1] https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go
[2] https://github.com/digitalocean/csi-digitalocean/blob/master/driver/node.go
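For illustration, the per-key locking idea could look something like the sketch below: one mutex per target path, created on demand. This is a hypothetical sketch (the type and names are not from any of the drivers above), and it deliberately never evicts entries, which a real implementation would need to handle.

```go
package driver

import "sync"

// keyedMutex serializes operations that share a key (e.g. a staging target
// path) while letting operations on different keys run in parallel.
type keyedMutex struct {
	mu    sync.Mutex             // guards the locks map
	locks map[string]*sync.Mutex // one mutex per key, created lazily
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the mutex for key is held and returns its unlock func.
func (k *keyedMutex) lock(key string) (unlock func()) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	k.mu.Unlock()

	l.Lock()
	return l.Unlock
}
```

Both `NodeStageVolume` and `NodeUnstageVolume` would then call `defer d.stageLocks.lock(target)()` at the top, so a stage and an unstage for the same target path could never run concurrently.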
@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
		return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
	}

	if ok := d.inFlight.Insert(req); !ok {
Although this prevents two identical requests from being executed at the same time, I don't think it's helpful if `NodeUnstageVolume` is called instead (with this `target` directory), right?
I think the keys should work for mutually-exclusive calls as well. Perhaps `target` would work better?
In this scenario, I believe we do want the `NodeUnstageVolume` request to fail and keep retrying.
For example: `NodeStageVolume` is called when a new pod is being created; the user then deletes that pod and PVC before it is ready and while the volume is being staged, which triggers `NodeUnstageVolume`. If `target` is used instead of `req` and no error is returned, the controller won't retry and the volume will never actually get unstaged.
Why are we concerned about `NodeUnstageVolume` here? I thought the CO should guarantee that unstage is called after stage for the same target path, so these two operations should never happen concurrently.
But this also makes me wonder how we should handle the idempotency of `NodeUnstageVolume` itself. We currently do NOT check whether the target path exists on disk before doing the unmount. This will cause an issue if the same target path is unmounted twice, since the second call will fail because it is already unmounted.
Created a separate issue to track unstage idempotency: #175
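For what it's worth, the idempotent-unstage fix tracked in #175 boils down to checking the mount before unmounting. A rough sketch, assuming the Kubernetes mount utility's IsLikelyNotMountPoint; the helper name and error handling here are illustrative, not the driver's actual code:

```go
import (
	"os"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func (d *Driver) unstageIfMounted(target string) error {
	notMnt, err := d.mounter.IsLikelyNotMountPoint(target)
	if os.IsNotExist(err) || (err == nil && notMnt) {
		// The target is gone or already unmounted, so a previous unstage
		// succeeded; report success instead of failing the retry.
		return nil
	}
	if err != nil {
		return status.Errorf(codes.Internal, "failed to check mount point %q: %v", target, err)
	}
	return d.mounter.Unmount(target)
}
```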
pkg/driver/node.go
Outdated
@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
		return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
	}

	if ok := d.inFlight.Insert(req); !ok {
		msg := fmt.Sprintf("request to stage volume=%+v is already in process: formatting volume", volumeID)
		return nil, status.Error(codes.Internal, msg)
I don't think we want to return an error (especially an internal one) if a second request is issued while the first one is still running.
I think it should return the same result as the first one, otherwise it wouldn't be idempotent.
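For context, the CSI spec reserves the gRPC code ABORTED for exactly this situation (there is already an operation pending for the specified volume), with the expectation that the CO retries until the first call finishes. A sketch of that variant; the error message wording is illustrative:

```go
if ok := d.inFlight.Insert(req); !ok {
	// ABORTED signals "operation pending for this volume"; the CO should
	// back off and retry rather than treat this as a hard failure.
	return nil, status.Errorf(codes.Aborted, "an operation for volume %q is already in progress", volumeID)
}
// Clear the entry when this call finishes so retries can proceed.
defer d.inFlight.Delete(req)
```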
pkg/driver/node.go
Outdated
	notMnt = true
} else {
-	msg := fmt.Sprintf("could not determine if %q is valid mount point: %v", target, err)
+	msg := fmt.Sprintf("failed to check if target %+v exists: %+v", target, err)
Can you use `%v` for the error? I don't think it's useful to return the field names. Also, I think `%q` looks better for strings as well.
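For a quick illustration of the difference between the verbs:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	target := "/mnt/staging"
	err := errors.New("permission denied")

	fmt.Printf("%+v\n", target) // /mnt/staging (for a plain string, %+v prints the same as %v)
	fmt.Printf("%q\n", target)  // "/mnt/staging" (quoted, so empty or odd paths stand out)
	fmt.Printf("%v\n", err)     // permission denied (just the error message, no field names)
}
```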
From GCE PD's implementation, the global lock will be acquired by any node service operation, including Publish/Unpublish and Stage/Unstage. This means that when Stage takes longer than the timeout (CSI defaults to 15s) to format the disk, all the other node service operations will be blocked. @bertinatto do you think the node service should be blocked in this case?
It takes ~6 minutes to format a 500GB volume, which seems like a long time to block other operations on unrelated volumes.
Force-pushed from e202049 to 2d13232
/retest
The integration test was added recently and a fix is in progress.
			Parameters: map[string]string{"foo": "bar"},
		},
	},
	{
This test case's success depends on its order in the list; I suspect it won't pass if it runs as the first case. Can we make each test independent of the others? That way it will be much easier to debug when a test fails because of other test cases.
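One common way to get that independence is to give every case its own fixtures inside t.Run instead of sharing driver state across the table. A generic sketch; newFakeDriver and the request shape are placeholders, not this repo's helpers:

```go
package driver

import (
	"context"
	"testing"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

func TestCreateVolume(t *testing.T) {
	testCases := []struct {
		name string
		req  *csi.CreateVolumeRequest
	}{
		{name: "with parameters", req: &csi.CreateVolumeRequest{
			Name:       "vol-test",
			Parameters: map[string]string{"foo": "bar"},
		}},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// A fresh driver (and fresh fakes) per case means no case can
			// depend on state left behind by an earlier one.
			d := newFakeDriver(t)
			if _, err := d.CreateVolume(context.Background(), tc.req); err != nil {
				t.Fatalf("CreateVolume failed: %v", err)
			}
		})
	}
}
```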
Force-pushed from e88746b to d10f882
@leakingtapan @bertinatto I've addressed your comments, PTAL.
/retest
Force-pushed from d10f882 to 2e7fca7
Let's add unit tests to cover this after we have some unit tests in the node service, ref: #142
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: leakingtapan, sreis. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
…pansion-idempotent Bug 1810470: Make EBS controller expansion idempotent
Is this a bug fix or adding new feature?
Bug fix.
If NodeStageVolume takes longer than the default CSI timeout to format the volume, the CO can issue another request. This can lead to multiple in-flight format commands against the volume and unexpected behavior.
What is this PR about? / Why do we need it?
The CSI spec requires most operations to be idempotent.
Currently, some of the requests rely on the AWS API to achieve this. An example is CreateVolume: if the AWS API request latency increases, the CO might re-issue the same request, which will fail. This leads the CO to send more CreateVolume requests and opens the possibility of creating multiple disks for the same request. (NOTE: this bug, multiple disks being created, is fixed in another PR, but the underlying issue is still not fixed.)
Other requests might take more time to complete. An example is NodeStageVolume: if the volume was just created, it needs to be formatted for the mount to succeed. If formatting takes too much time, the CO can send another NodeStageVolume request, which will fail because the previous one is still in progress.
This PR introduces a new structure to handle in-flight requests. It relies on the fact that the Go protobuf implementation provides the Stringer interface for all structures it generates; we can use that string to compare requests and keep a set of the ones currently being processed.
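Conceptually, the structure is a string-keyed set guarded by a mutex. A minimal sketch of the idea (a simplification for illustration, not a verbatim copy of this PR's internal package):

```go
package internal

import "sync"

// Idempotent is satisfied by generated protobuf requests, whose String()
// method produces a stable textual encoding of all fields.
type Idempotent interface {
	String() string
}

// InFlight tracks requests that are currently being processed.
type InFlight struct {
	mux      sync.Mutex
	inFlight map[string]bool
}

func NewInFlight() *InFlight {
	return &InFlight{inFlight: make(map[string]bool)}
}

// Insert returns false if an identical request is already in flight;
// otherwise it records the request and returns true.
func (db *InFlight) Insert(entry Idempotent) bool {
	db.mux.Lock()
	defer db.mux.Unlock()

	key := entry.String()
	if _, ok := db.inFlight[key]; ok {
		return false
	}
	db.inFlight[key] = true
	return true
}

// Delete removes the request once processing finishes, allowing retries.
func (db *InFlight) Delete(entry Idempotent) {
	db.mux.Lock()
	defer db.mux.Unlock()
	delete(db.inFlight, entry.String())
}
```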
Regarding NodeStageVolume idempotency, the spec says:
We check if the volume is already mounted at staging_target_path using mount.GetDeviceNameFromMount.
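Sketched out, that check sits near the top of NodeStageVolume. This assumes the Kubernetes mount utility's GetDeviceNameFromMount, which returns the device mounted at a path plus its reference count; exact signatures may differ between mount library versions:

```go
// If something is already mounted at the staging path, a previous
// NodeStageVolume call succeeded, so return OK without re-staging.
device, refCount, err := mount.GetDeviceNameFromMount(d.mounter, req.GetStagingTargetPath())
if err != nil {
	return nil, status.Errorf(codes.Internal, "failed to check mounts at %q: %v", req.GetStagingTargetPath(), err)
}
if refCount > 0 {
	klog.V(4).Infof("volume %q already staged at %q (device %q)", volumeID, req.GetStagingTargetPath(), device)
	return &csi.NodeStageVolumeResponse{}, nil
}
```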
What testing is done?
Added unit tests for the new internal data structure that manages in-flight requests.