
Make NodeStageVolume idempotent #163

Merged 1 commit into kubernetes-sigs:master from sreis/idempotent on Jan 8, 2019

Conversation

@sreis (Contributor) commented Dec 27, 2018

Is this a bug fix or adding new feature?

Bug fix.

If NodeStageVolume takes longer than the default CSI timeout to format the volume, the CO can issue another request. This can lead to multiple in-flight format commands against the volume and unexpected behavior.

What is this PR about? / Why do we need it?

The CSI spec requires most operations to be idempotent.

Currently, some requests rely on the AWS API to achieve this. An example is CreateVolume: if AWS API request latency increases, the CO might reissue the same request, which will fail. This leads the CO to send more CreateVolume requests and can create multiple disks for the same request. (NOTE: the multiple-disks bug is fixed in another PR, but the underlying issue is not.)

Other requests might take longer to complete. An example is NodeStageVolume: if the volume was just created, it needs to be formatted for the mount to succeed. If formatting takes too long, the CO can send another NodeStageVolume request, which will fail because the previous one is still in progress.

This PR introduces a new internal structure to handle in-flight requests. It relies on the fact that the Go protobuf library implements the Stringer interface for every struct it generates. We can use that string to compare requests and keep a set of the ones that are being processed.
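A minimal sketch of what such an in-flight tracker could look like (hypothetical names; the PR's actual code may differ). Any proto-generated request satisfies fmt.Stringer, so its String() output can serve directly as a map key:

```go
package main

import (
	"fmt"
	"sync"
)

// InFlight tracks requests that are currently being processed.
// Keys are the String() form of the proto-generated request structs.
type InFlight struct {
	mux      sync.Mutex
	inFlight map[string]bool
}

func NewInFlight() *InFlight {
	return &InFlight{inFlight: make(map[string]bool)}
}

// Insert returns false if an identical request is already in flight.
func (db *InFlight) Insert(req fmt.Stringer) bool {
	db.mux.Lock()
	defer db.mux.Unlock()
	key := req.String()
	if db.inFlight[key] {
		return false
	}
	db.inFlight[key] = true
	return true
}

// Delete removes a completed request so that later retries are accepted again.
func (db *InFlight) Delete(req fmt.Stringer) {
	db.mux.Lock()
	defer db.mux.Unlock()
	delete(db.inFlight, req.String())
}

// fakeReq stands in for a proto-generated request in this sketch.
type fakeReq string

func (f fakeReq) String() string { return string(f) }

func main() {
	db := NewInFlight()
	fmt.Println(db.Insert(fakeReq("stage vol-1"))) // true: first request accepted
	fmt.Println(db.Insert(fakeReq("stage vol-1"))) // false: duplicate rejected
	db.Delete(fakeReq("stage vol-1"))
	fmt.Println(db.Insert(fakeReq("stage vol-1"))) // true: accepted again after completion
}
```

The handler would call Insert at the start of NodeStageVolume and defer Delete, so a retry that arrives while formatting is still running is rejected instead of starting a second format.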

Regarding NodeStageVolume idempotency, the spec says:

This operation MUST be idempotent. If the volume corresponding to the volume_id is
already staged to the staging_target_path, and is identical to the specified
volume_capability, the Plugin MUST reply 0 OK.

We check if the volume is already mounted at staging_target_path using mount.GetDeviceNameFromMount.
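As a hedged illustration of that check (getDeviceNameFromMount below is a simplified stand-in for mount.GetDeviceNameFromMount, using a plain map in place of the real mount table):

```go
package main

import "fmt"

// getDeviceNameFromMount is a simplified stand-in for
// mount.GetDeviceNameFromMount: it reports which device, if any,
// is mounted at mountPath.
func getDeviceNameFromMount(mountTable map[string]string, mountPath string) (string, bool) {
	dev, ok := mountTable[mountPath]
	return dev, ok
}

// alreadyStaged implements the idempotency check: if a device is already
// mounted at the staging target path, NodeStageVolume can reply 0 OK
// without formatting or mounting again.
func alreadyStaged(mountTable map[string]string, stagingTargetPath string) bool {
	_, mounted := getDeviceNameFromMount(mountTable, stagingTargetPath)
	return mounted
}

func main() {
	mounts := map[string]string{
		"/var/lib/kubelet/plugins/ebs/staging/vol-1": "/dev/xvdba",
	}
	fmt.Println(alreadyStaged(mounts, "/var/lib/kubelet/plugins/ebs/staging/vol-1")) // true
	fmt.Println(alreadyStaged(mounts, "/var/lib/kubelet/plugins/ebs/staging/vol-2")) // false
}
```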

What testing is done?

Added unit tests for the new internal data structure that manages in-flight requests.

@k8s-ci-robot (Contributor)

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Dec 27, 2018
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 27, 2018
@k8s-ci-robot (Contributor)

Hi @sreis. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 27, 2018
@coveralls

Pull Request Test Coverage Report for Build 287

  • 23 of 56 (41.07%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.07%) to 50.424%

Changes missing coverage | Covered lines | Changed/Added lines | %
pkg/driver/driver.go | 0 | 1 | 0.0%
pkg/driver/node.go | 0 | 32 | 0.0%

Files with coverage reduction | New missed lines | %
pkg/driver/node.go | 1 | 0.0%
pkg/driver/driver.go | 1 | 0.0%

Totals: change from base Build 284: +0.07%; covered lines: 595; relevant lines: 1180

💛 - Coveralls

@coveralls commented Dec 27, 2018

Pull Request Test Coverage Report for Build 316

  • 25 of 52 (48.08%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.4%) to 50.765%

Changes missing coverage | Covered lines | Changed/Added lines | %
pkg/driver/driver.go | 0 | 1 | 0.0%
pkg/driver/node.go | 0 | 26 | 0.0%

Files with coverage reduction | New missed lines | %
pkg/driver/node.go | 1 | 0.0%
pkg/driver/driver.go | 1 | 0.0%

Totals: change from base Build 312: +0.4%; covered lines: 597; relevant lines: 1176

💛 - Coveralls

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 27, 2018
@leakingtapan (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 27, 2018
@sreis (Contributor, Author) commented Dec 27, 2018

/retest

@bertinatto (Member) left a comment

While reviewing this I wondered whether this is the best approach here. In particular, my comment about idempotency made me wonder whether a lock per key would work better in our case. For example, for every target path there would be a lock that both stage and unstage calls rely on.

Have you considered other options?

Looking at other drivers, GCP [1] uses a mutex that locks the entire node server; although that sounds too coarse, I'd love to know how badly it actually scales (maybe it's manageable). DigitalOcean [2], on the other hand, doesn't do any locking.

[1] https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go
[2] https://github.com/digitalocean/csi-digitalocean/blob/master/driver/node.go
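A sketch of the per-key alternative floated here (hypothetical, not what this PR implements): one mutex per target path, shared by stage and unstage, so unrelated volumes don't block each other:

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLock hands out one mutex per target path.
type keyedLock struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedLock() *keyedLock {
	return &keyedLock{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex for target, creating it on first use.
func (k *keyedLock) lockFor(target string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	l, ok := k.locks[target]
	if !ok {
		l = &sync.Mutex{}
		k.locks[target] = l
	}
	return l
}

func main() {
	kl := newKeyedLock()
	l := kl.lockFor("/staging/vol-1")
	l.Lock()
	// Stage or unstage work for /staging/vol-1 would happen here;
	// operations on /staging/vol-2 acquire a different mutex and are not blocked.
	l.Unlock()
	fmt.Println(kl.lockFor("/staging/vol-1") == l) // true: same mutex returned for the same target
}
```

Note this serializes mutually exclusive calls (stage vs. unstage on one path) rather than rejecting duplicates, which is the trade-off discussed in the thread below.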

@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
}

if ok := d.inFlight.Insert(req); !ok {
Member

Although this prevents two identical requests from being executed at the same time, I don't think it's helpful if NodeUnstageVolume is called instead (with this target directory), right?

I think the keys should work for mutually-exclusive calls as well. Perhaps target would work better?

Contributor

In this scenario, I believe we do want the NodeUnstageVolume request to fail and keep retrying.

For example: NodeStageVolume is called when a new pod is being created; the user then deletes that pod and PVC before it is ready and the volume is staged, which triggers NodeUnstageVolume. If target were used instead of req and no error were returned, the controller wouldn't retry and the volume would never actually get unstaged.

Contributor

Why are we concerned about NodeUnstageVolume here? I thought the CO should guarantee that unstage is called after stage for the same target path, so these two operations should never happen concurrently.

But this also makes me wonder how we should handle the idempotency of NodeUnstageVolume itself. We currently do NOT check whether the target path exists on disk before unmounting. This will cause an issue if the same target path is unmounted twice: the second call will fail since it is already unmounted.

Contributor

Created a separate issue to track unstage idempotency: #175

@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
}

if ok := d.inFlight.Insert(req); !ok {
msg := fmt.Sprintf("request to stage volume=%+v is already in process: formatting volume", volumeID)
return nil, status.Error(codes.Internal, msg)
Member

I don't think we want to return an error (especially an internal one) if a second request is issued while the first one is still running.

I think it should return the same result as the first one, otherwise it wouldn't be idempotent.

notMnt = true
} else {
-	msg := fmt.Sprintf("could not determine if %q is valid mount point: %v", target, err)
+	msg := fmt.Sprintf("failed to check if target %+v exists: %+v", target, err)
Member

Can you use %v for the error? I don't think it's useful to return the field names. Also, I think %q looks better for strings as well.

@leakingtapan (Contributor) commented Jan 3, 2019

Looking at other drivers, GCP [1] is using a mutex that locks the entire node server

From GCE PD's implementation, the global lock is acquired by every node service operation, including Publish/Unpublish and Stage/Unstage. This means that when Stage takes longer than the timeout (the CSI default is 15s) to format the disk, all other node service operations are blocked except NodeGetCapabilities and NodeGetInfo.

@bertinatto do you think node service should be blocked in this case?

@dkoshkin (Contributor) commented Jan 3, 2019

It takes ~6 minutes to format a 500GB volume, which seems like a long time to block operations on unrelated volumes.

@dkoshkin (Contributor) commented Jan 3, 2019

/retest

@leakingtapan (Contributor)

The integration test was added recently and a fix is in progress.

Parameters: map[string]string{"foo": "bar"},
},
},
{
Contributor

This test case's success depends on its order in the list. I suspect it won't pass if it runs as the first case in the list. Can we make each test independent of the others? That way it will be much easier to debug when a test fails because of another test case.
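One way to make the cases independent (a sketch using a hypothetical simplified tracker, not the PR's actual test file): each case declares the state it needs and builds a fresh fixture, so running any case alone gives the same result:

```go
package main

import "fmt"

// tracker is a simplified stand-in for the PR's in-flight structure.
type tracker struct{ keys map[string]bool }

func newTracker() *tracker { return &tracker{keys: make(map[string]bool)} }

// insert returns false if the key is already present.
func (t *tracker) insert(k string) bool {
	if t.keys[k] {
		return false
	}
	t.keys[k] = true
	return true
}

func main() {
	cases := []struct {
		name     string
		setup    []string // keys inserted before the call under test
		key      string
		expectOK bool
	}{
		{"fresh key accepted", nil, "vol-1", true},
		{"duplicate key rejected", []string{"vol-1"}, "vol-1", false},
	}
	for _, c := range cases {
		tr := newTracker() // fresh fixture per case: order no longer matters
		for _, k := range c.setup {
			tr.insert(k)
		}
		if got := tr.insert(c.key); got != c.expectOK {
			fmt.Printf("case %q: got %v, want %v\n", c.name, got, c.expectOK)
			return
		}
	}
	fmt.Println("all cases passed")
}
```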


@dkoshkin dkoshkin force-pushed the sreis/idempotent branch 3 times, most recently from e88746b to d10f882 on January 4, 2019 15:44
@dkoshkin (Contributor) commented Jan 4, 2019

@leakingtapan @bertinatto I've addressed your comments, PTAL.

@leakingtapan (Contributor)

/retest

@leakingtapan (Contributor)

Let's add a unit test to cover this after we have some unit tests in the node service. Ref: #142

@leakingtapan (Contributor)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2019
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leakingtapan, sreis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2019
@k8s-ci-robot k8s-ci-robot merged commit 48556f7 into kubernetes-sigs:master Jan 8, 2019
@dkoshkin dkoshkin deleted the sreis/idempotent branch January 17, 2019 20:26
huffmanca pushed a commit to huffmanca/aws-ebs-csi-driver that referenced this pull request Sep 3, 2020
…pansion-idempotent

Bug 1810470: Make EBS controller expansion idempotent