Make NodeStageVolume idempotent #163
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Hi @sreis. Thanks for your PR. I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Pull Request Test Coverage Report for Build 287
💛 - Coveralls
Pull Request Test Coverage Report for Build 316
💛 - Coveralls
/ok-to-test
/retest
While reviewing this I wondered whether this is the best approach for our case. In particular, my comment about idempotency made me wonder whether a lock for each key wouldn't work better here. For example, for every target path there would be a lock that both the stage and unstage calls rely on.
Have you also considered other options?
Looking at other drivers, GCP [1] uses a mutex that locks the entire node server; although this sounds too coarse, I'd love to know how badly it scales exactly (maybe it's manageable). DigitalOcean [2], on the other hand, doesn't do any locking.
[1] https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go
[2] https://github.com/digitalocean/csi-digitalocean/blob/master/driver/node.go
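For illustration, the per-key locking idea could look something like the sketch below: one mutex per target path, created on demand. This is a hypothetical sketch (the type and names are not from any of the drivers above), and it deliberately never evicts entries, which a real implementation would need to handle.

```go
package driver

import "sync"

// keyedMutex serializes operations that share a key (e.g. a staging target
// path) while letting operations on different keys run in parallel.
type keyedMutex struct {
	mu    sync.Mutex             // guards the locks map
	locks map[string]*sync.Mutex // one mutex per key, created lazily
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the mutex for key is held and returns its unlock func.
func (k *keyedMutex) lock(key string) (unlock func()) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	k.mu.Unlock()

	l.Lock()
	return l.Unlock
}
```

Both `NodeStageVolume` and `NodeUnstageVolume` would then call `defer d.stageLocks.lock(target)()` at the top, so a stage and an unstage for the same target path could never run concurrently.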
@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
		return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
	}

	if ok := d.inFlight.Insert(req); !ok {
Although this prevents two identical requests from being executed at the same time, I don't think it's helpful if `NodeUnstageVolume` is called instead (with this `target` directory), right?
I think the keys should work for mutually-exclusive calls as well. Perhaps `target` would work better?
In this scenario, I believe we do want the `NodeUnstageVolume` request to fail and keep retrying.
For example: `NodeStageVolume` is called when a new pod is being created; the user then deletes that pod and PVC before it is ready and while the volume is being staged, which triggers `NodeUnstageVolume`. If `target` is used instead of `req` and no error is returned, the controller won't retry and the volume will never actually get unstaged.
Why are we concerned about `NodeUnstageVolume` here? I thought the CO should guarantee that unstage is called after stage for the same target path, so these two operations should never happen concurrently.
But this also makes me wonder how we should handle the idempotency of `NodeUnstageVolume` itself. We currently do NOT check whether the target path exists on disk before doing the unmount. This will cause an issue if the same target path is unmounted twice, since the second call will fail because it is already unmounted.
Created a separate issue to track unstage idempotency: #175
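For what it's worth, the idempotent-unstage fix tracked in #175 boils down to checking the mount before unmounting. A rough sketch, assuming the Kubernetes mount utility's IsLikelyNotMountPoint; the helper name and error handling here are illustrative, not the driver's actual code:

```go
import (
	"os"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func (d *Driver) unstageIfMounted(target string) error {
	notMnt, err := d.mounter.IsLikelyNotMountPoint(target)
	if os.IsNotExist(err) || (err == nil && notMnt) {
		// The target is gone or already unmounted, so a previous unstage
		// succeeded; report success instead of failing the retry.
		return nil
	}
	if err != nil {
		return status.Errorf(codes.Internal, "failed to check mount point %q: %v", target, err)
	}
	return d.mounter.Unmount(target)
}
```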
pkg/driver/node.go
Outdated
@@ -71,30 +73,56 @@ func (d *Driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRe
		return nil, status.Error(codes.InvalidArgument, "Volume capability not supported")
	}

	if ok := d.inFlight.Insert(req); !ok {
		msg := fmt.Sprintf("request to stage volume=%+v is already in process: formatting volume", volumeID)
		return nil, status.Error(codes.Internal, msg)
I don't think we want to return an error (especially an internal one) if a second request is issued while the first one is still running.
I think it should return the same result as the first one, otherwise it wouldn't be idempotent.
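For context, the CSI spec reserves the gRPC code ABORTED for exactly this situation (there is already an operation pending for the specified volume), with the expectation that the CO retries until the first call finishes. A sketch of that variant; the error message wording is illustrative:

```go
if ok := d.inFlight.Insert(req); !ok {
	// ABORTED signals "operation pending for this volume"; the CO should
	// back off and retry rather than treat this as a hard failure.
	return nil, status.Errorf(codes.Aborted, "an operation for volume %q is already in progress", volumeID)
}
// Clear the entry when this call finishes so retries can proceed.
defer d.inFlight.Delete(req)
```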
pkg/driver/node.go
Outdated
	notMnt = true
} else {
-	msg := fmt.Sprintf("could not determine if %q is valid mount point: %v", target, err)
+	msg := fmt.Sprintf("failed to check if target %+v exists: %+v", target, err)
Can you use `%v` for the error? I don't think it's useful to return the field names. Also, I think `%q` looks better for strings as well.
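For a quick illustration of the difference between the verbs:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	target := "/mnt/staging"
	err := errors.New("permission denied")

	fmt.Printf("%+v\n", target) // /mnt/staging (for a plain string, %+v prints the same as %v)
	fmt.Printf("%q\n", target)  // "/mnt/staging" (quoted, so empty or odd paths stand out)
	fmt.Printf("%v\n", err)     // permission denied (just the error message, no field names)
}
```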
From GCE PD's implementation, the global lock will be acquired by any node service operation, including Publish/Unpublish and Stage/Unstage. This means that when Stage takes longer than the timeout (CSI defaults to 15s) to format the disk, all the other node service operations will be blocked. @bertinatto do you think the node service should be blocked in this case?
It takes ~6 minutes to format a 500GB volume, which seems like a long time to block other operations on unrelated volumes.
Force-pushed from e202049 to 2d13232
/retest
The integration test was added recently and a fix is in progress.
			Parameters: map[string]string{"foo": "bar"},
		},
	},
	{
This test case's success depends on its order in the list; I suspect it won't pass if it runs as the first case. Can we make each test independent of the others? That way it will be much easier to debug when a test fails because of other test cases.
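One common way to get that independence is to give every case its own fixtures inside t.Run instead of sharing driver state across the table. A generic sketch; newFakeDriver and the request shape are placeholders, not this repo's helpers:

```go
package driver

import (
	"context"
	"testing"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

func TestCreateVolume(t *testing.T) {
	testCases := []struct {
		name string
		req  *csi.CreateVolumeRequest
	}{
		{name: "with parameters", req: &csi.CreateVolumeRequest{
			Name:       "vol-test",
			Parameters: map[string]string{"foo": "bar"},
		}},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// A fresh driver (and fresh fakes) per case means no case can
			// depend on state left behind by an earlier one.
			d := newFakeDriver(t)
			if _, err := d.CreateVolume(context.Background(), tc.req); err != nil {
				t.Fatalf("CreateVolume failed: %v", err)
			}
		})
	}
}
```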
Force-pushed from e88746b to d10f882
@leakingtapan @bertinatto I've addressed your comments, PTAL.
/retest
Force-pushed from d10f882 to 2e7fca7
Let's add unit tests to cover this after we have some unit tests in the node service, ref: #142
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: leakingtapan, sreis. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
…pansion-idempotent Bug 1810470: Make EBS controller expansion idempotent
Is this a bug fix or adding new feature?
Bug fix.
If NodeStageVolume takes longer than the default CSI timeout to format the volume, the CO can issue another request. This can lead to multiple in-flight format commands against the volume and unexpected behavior.
What is this PR about? / Why do we need it?
The CSI spec requires most operations to be idempotent.
Currently, some of the requests rely on the AWS API to achieve this. An example is CreateVolume: if the AWS API request latency increases, the CO might re-issue the same request, which will fail. This leads the CO to send more CreateVolume requests and opens the possibility of creating multiple disks for the same request. (NOTE: this bug, multiple disks being created, is fixed in another PR, but the underlying issue is still not fixed.)
Other requests might take more time to complete. An example is NodeStageVolume: if the volume was just created, it needs to be formatted for the mount to succeed. If formatting takes too much time, the CO can send another NodeStageVolume request, which will fail because the previous one is still in progress.
This PR introduces a new structure to handle in-flight requests. It relies on the fact that the Go protobuf implementation provides the Stringer interface for all structures it generates; we can use that string to compare requests and keep a set of the ones currently being processed.
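Conceptually, the structure is a string-keyed set guarded by a mutex. A minimal sketch of the idea (a simplification for illustration, not a verbatim copy of this PR's internal package):

```go
package internal

import "sync"

// Idempotent is satisfied by generated protobuf requests, whose String()
// method produces a stable textual encoding of all fields.
type Idempotent interface {
	String() string
}

// InFlight tracks requests that are currently being processed.
type InFlight struct {
	mux      sync.Mutex
	inFlight map[string]bool
}

func NewInFlight() *InFlight {
	return &InFlight{inFlight: make(map[string]bool)}
}

// Insert returns false if an identical request is already in flight;
// otherwise it records the request and returns true.
func (db *InFlight) Insert(entry Idempotent) bool {
	db.mux.Lock()
	defer db.mux.Unlock()

	key := entry.String()
	if _, ok := db.inFlight[key]; ok {
		return false
	}
	db.inFlight[key] = true
	return true
}

// Delete removes the request once processing finishes, allowing retries.
func (db *InFlight) Delete(entry Idempotent) {
	db.mux.Lock()
	defer db.mux.Unlock()
	delete(db.inFlight, entry.String())
}
```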
Regarding NodeStageVolume idempotency, the spec says:
We check if the volume is already mounted at staging_target_path using mount.GetDeviceNameFromMount.
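Sketched out, that check sits near the top of NodeStageVolume. This assumes the Kubernetes mount utility's GetDeviceNameFromMount, which returns the device mounted at a path plus its reference count; exact signatures may differ between mount library versions:

```go
// If something is already mounted at the staging path, a previous
// NodeStageVolume call succeeded, so return OK without re-staging.
device, refCount, err := mount.GetDeviceNameFromMount(d.mounter, req.GetStagingTargetPath())
if err != nil {
	return nil, status.Errorf(codes.Internal, "failed to check mounts at %q: %v", req.GetStagingTargetPath(), err)
}
if refCount > 0 {
	klog.V(4).Infof("volume %q already staged at %q (device %q)", volumeID, req.GetStagingTargetPath(), device)
	return &csi.NodeStageVolumeResponse{}, nil
}
```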
What testing is done?
Added unit tests for the new internal data structure that manages in-flight requests.