Fix NodeStageVolume returning prematurely #176

dkoshkin · 2019-01-09T23:24:34Z

Is this a bug fix or adding new feature?
Bug fix

What is this PR about? / Why do we need it?
PR #163 introduced a bug where NodeStageVolume would return prematurely.
This was only evident with HDD volumes, because formatting them takes longer than the default 15 seconds. Unfortunately, when I tested the change for #163 (comment), I only ran the sample app with an SSD and did not catch this bug :(.

When I ran the remaining e2e tests, all HDD volume tests were failing to delete EBS volumes because they were still being formatted.

The log below shows NodeStageVolume: volume="vol-0504e2cf8285d7bcc" operation is already in progress after 15 seconds (the default retry period), but because the call doesn't return an error, the next operation, NodePublishVolume, is called.

I0109 21:47:06.265099       1 node.go:55] NodeStageVolume: called with args {VolumeId:vol-0504e2cf8285d7bcc PublishContext:map[devicePath:/dev/xvdbb] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount VolumeCapability:mount:<fs_type:"ext2" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1547070289563-8081-ebs.csi.aws.com fsType:ext2] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0109 21:47:06.265186       1 node.go:77] NodeStageVolume: volume="vol-0504e2cf8285d7bcc" operation is already in progress
I0109 21:47:06.267684       1 node.go:248] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0109 21:47:06.273550       1 node.go:162] NodePublishVolume: called with args {VolumeId:vol-0504e2cf8285d7bcc PublishContext:map[devicePath:/dev/xvdbb] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount TargetPath:/var/lib/kubelet/pods/08d7000f-1458-11e9-8557-2acd0797fa41/volumes/kubernetes.io~csi/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/mount VolumeCapability:mount:<fs_type:"ext2" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[fsType:ext2 storage.kubernetes.io/csiProvisionerIdentity:1547070289563-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0109 21:47:06.273600       1 node.go:208] NodePublishVolume: creating dir /var/lib/kubelet/pods/08d7000f-1458-11e9-8557-2acd0797fa41/volumes/kubernetes.io~csi/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/mount
I0109 21:47:06.273632       1 node.go:213] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount at /var/lib/kubelet/pods/08d7000f-1458-11e9-8557-2acd0797fa41/volumes/kubernetes.io~csi/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/mount
I0109 21:47:06.273666       1 mount_linux.go:146] Mounting cmd (mount) with arguments ([-t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount /var/lib/kubelet/pods/08d7000f-1458-11e9-8557-2acd0797fa41/volumes/kubernetes.io~csi/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/mount])
I0109 21:47:06.283116       1 mount_linux.go:146] Mounting cmd (mount) with arguments ([-t ext4 -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount /var/lib/kubelet/pods/08d7000f-1458-11e9-8557-2acd0797fa41/volumes/kubernetes.io~csi/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/mount])
I0109 21:48:26.498530       1 mount_linux.go:492] Disk successfully formatted (mkfs): ext2 - /dev/xvdbb /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount
I0109 21:48:26.498575       1 mount_linux.go:146] Mounting cmd (mount) with arguments ([-t ext2 -o defaults /dev/xvdbb /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-03a7e5f0-1458-11e9-8557-2acd0797fa41/globalmount])
I0109 21:48:26.527167       1 node.go:81] NodeStageVolume: volume="vol-0504e2cf8285d7bcc" operation finished
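
The two-line change in pkg/driver/node.go is not reproduced in this conversation, so the following is only a minimal sketch of the idea behind the fix, not the driver's code. It assumes an in-flight tracker keyed by volume ID. The names `inFlight`, `stageVolume`, and `ErrInFlight` are hypothetical. A real CSI driver would more likely return a gRPC status error than a plain Go error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrInFlight is a hypothetical sentinel error for this sketch.
var ErrInFlight = errors.New("operation already in progress")

// inFlight tracks volume IDs that currently have a stage operation running.
type inFlight struct {
	mu  sync.Mutex
	ids map[string]struct{}
}

func newInFlight() *inFlight {
	return &inFlight{ids: make(map[string]struct{})}
}

// tryAcquire reports false when volumeID already has an operation running.
func (f *inFlight) tryAcquire(volumeID string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, busy := f.ids[volumeID]; busy {
		return false
	}
	f.ids[volumeID] = struct{}{}
	return true
}

// release marks the operation for volumeID as finished.
func (f *inFlight) release(volumeID string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.ids, volumeID)
}

// stageVolume is a hypothetical stand-in for NodeStageVolume.
// format represents the slow format-and-mount step.
func stageVolume(f *inFlight, volumeID string, format func() error) error {
	if !f.tryAcquire(volumeID) {
		// Returning nil here would reproduce the bug: the caller would
		// treat staging as complete and move on to publishing.
		return fmt.Errorf("volume %q: %w", volumeID, ErrInFlight)
	}
	defer f.release(volumeID)
	return format()
}

func main() {
	f := newInFlight()
	started := make(chan struct{})
	finish := make(chan struct{})
	done := make(chan error, 1)

	// First call: simulates a long-running format.
	go func() {
		done <- stageVolume(f, "vol-1", func() error {
			close(started)
			<-finish
			return nil
		})
	}()
	<-started

	// Retry arrives while the first call is still formatting.
	err := stageVolume(f, "vol-1", func() error { return nil })
	fmt.Println("retry while formatting:", err)

	close(finish)
	fmt.Println("first call:", <-done)
}
```

Because the retry gets an error rather than a success, a caller that checks the result has no reason to proceed to the publish step until a later attempt finds staging finished.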

What testing is done?
Locally ran all e2e tests

k8s-ci-robot · 2019-01-09T23:24:47Z

Hi @dkoshkin. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

coveralls · 2019-01-09T23:27:56Z

Pull Request Test Coverage Report for Build 324

0 of 2 (0.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 50.931%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/driver/node.go	0	2	0.0%

Totals
Change from base Build 323:	0.0%
Covered Lines:	602
Relevant Lines:	1182

💛 - Coveralls

leakingtapan · 2019-01-10T00:34:23Z

/ok-to-test

dkoshkin · 2019-01-12T00:15:41Z

It would be great to get this merged and have a new latest image built with this change; I want to do some more manual/automated testing.

leakingtapan · 2019-01-20T20:28:28Z

Good catch.

/lgtm
/approve

k8s-ci-robot · 2019-01-20T20:28:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkoshkin, leakingtapan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [leakingtapan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Bug 1913289: Rebase to v0.8.0 for OCP 4.7

Fix NodeStageVolume returning prematurely

2d28136

k8s-ci-robot requested review from bertinatto and d-nishi January 9, 2019 23:24

k8s-ci-robot added the size/XS and cncf-cla: yes labels Jan 9, 2019

k8s-ci-robot added the needs-ok-to-test label Jan 9, 2019

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Jan 10, 2019

k8s-ci-robot assigned leakingtapan Jan 20, 2019

k8s-ci-robot added the lgtm label Jan 20, 2019

k8s-ci-robot added the approved label Jan 20, 2019

k8s-ci-robot merged commit 5668eac into kubernetes-sigs:master Jan 20, 2019

dkoshkin deleted the node-stage-volume branch January 22, 2019 12:58

jsafrane pushed a commit to jsafrane/aws-ebs-csi-driver that referenced this pull request Feb 24, 2021

Merge pull request kubernetes-sigs#176 from jsafrane/rebase-v0.8.0

6e7f163

Bug 1913289: Rebase to v0.8.0 for OCP 4.7
