Search for nvme device path even if non-nvme exists #1082

Merged
k8s-ci-robot merged 5 commits into kubernetes-sigs:master from nodestageidempotent on Oct 8, 2021

Conversation

Contributor

@wongma7 wongma7 commented Oct 7, 2021

Is this a bug fix or adding new feature? /bug

What is this PR about? / Why do we need it? Intended to fix #1076 and possibly #1027.

NodeStage is not idempotent for nvme volumes. This is most apparent on kubelet 1.20, because it is guaranteed to call NodeStage again for the same volume when it restarts. You will then see the errors in #1076, because NodeStage compares the path /dev/xvdba to the path /dev/nvme1n1, when the intention is to compare /dev/nvme1n1 with /dev/nvme1n1 and return NodeStage as a no-op success.

The fix is to ensure that the path we compare is the "canonical" one: even if we see that /dev/xvdba exists, we still search for the "canonical" nvme representation of the device, because that is what will show up in the mount table. We then compare what we find against what is in the mount table, and the two should match.
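
For illustration, here is a minimal Go sketch of that canonicalization idea, assuming a hypothetical findCanonicalDevicePath helper; the /dev/disk/by-id naming convention comes from the discussion further down in this thread, and this is not the driver's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findCanonicalDevicePath returns the device path as it will appear in the
// mount table. Even if the legacy path (e.g. /dev/xvdba) exists, on Nitro
// instances we still resolve the nvme symlink, so that a later comparison
// against the mount table compares /dev/nvme1n1 with /dev/nvme1n1.
func findCanonicalDevicePath(devicePath, volumeID string) (string, error) {
	// EBS volume IDs appear under /dev/disk/by-id with the "vol-" dash
	// removed, e.g. vol-01ce... -> nvme-Amazon_Elastic_Block_Store_vol01ce...
	nvmeName := "nvme-Amazon_Elastic_Block_Store_" + strings.Replace(volumeID, "-", "", 1)
	link := filepath.Join("/dev/disk/by-id", nvmeName)
	if resolved, err := filepath.EvalSymlinks(link); err == nil {
		return resolved, nil // canonical path, e.g. /dev/nvme1n1
	}
	// Fall back to the path from the PublishContext if no nvme symlink exists.
	if _, err := os.Stat(devicePath); err != nil {
		return "", fmt.Errorf("device %q not found: %w", devicePath, err)
	}
	return devicePath, nil
}

func main() {
	path, err := findCanonicalDevicePath("/dev/xvdba", "vol-0f71a55a194a009b2")
	fmt.Println(path, err)
}
```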

What testing is done?

e2e in CI:

I updated the test to create Nitro c5 instances so this code gets exercised.

e2e on my cluster:

built and pushed an image to my registry

$ OUTPUT_TYPE=registry OS=linux TAG=oct7-1082-1 REGISTRY=x.dkr.ecr.us-west-2.amazonaws.com make image

restarted kubelet to reproduce the issue.

$ journalctl -ru kubelet -l | less
Oct 07 20:14:52 ip-192-168-29-9.us-west-2.compute.internal kubelet[14145]: E1007 20:14:52.364113   14145 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/ebs.csi.aws.com^vol-0f71a55a194a009b2 podName: nodeName:}" failed. No retries permitted until 2021-10-07 20:16:54.364070009 +0000 UTC m=+73017.340013110 (durationBeforeRetry 2m2s). Error: "MountVolume.MountDevice failed for volume \"pvc-b622b11e-64ad-419f-a229-7372be5314d5\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-0f71a55a194a009b2\") pod \"nginx-deployment-b8fdc6d6b-c4z7p\" (UID: \"aa9febfe-f8b3-4ccb-84d8-bbef4865c98f\") : rpc error: code = Internal desc = could not format \"/dev/xvdba\" and mount it at \"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount\": mount failed: exit status 32\nMounting command: mount\nMounting arguments: -t ext4 -o defaults /dev/xvdba /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount\nOutput: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount: /dev/nvme1n1 already mounted on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount.\n"

deployed my image and the errors stopped.

$ k logs (k get po -l app=ebs-csi-node --field-selector spec.nodeName=ip-192-168-29-9.us-west-2.compute.internal -n kube-system -o jsonpath='{.items[0].metadata.name}') -n kube-system ebs-plugin
I1007 20:16:08.127217       1 driver.go:72] Driver: ebs.csi.aws.com Version: v1.3.0
I1007 20:16:08.127300       1 node.go:85] [Debug] Retrieving node info from metadata service
I1007 20:16:08.127312       1 metadata.go:67] retrieving instance data from ec2 metadata
I1007 20:16:08.130739       1 metadata.go:74] ec2 metadata is available
I1007 20:16:08.137019       1 driver.go:142] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I1007 20:16:08.576831       1 node.go:509] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1007 20:16:54.434739       1 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1007 20:16:54.441380       1 node.go:105] NodeStageVolume: called with args {VolumeId:vol-0f71a55a194a009b2 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633546016140-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1007 20:16:54.441482       1 node.go:174] NodeStageVolume: find device path /dev/xvdba -> /dev/nvme1n1
I1007 20:16:54.441764       1 node.go:203] NodeStageVolume: volume="vol-0f71a55a194a009b2" already staged
I1007 20:16:54.441777       1 node.go:152] NodeStageVolume: volume="vol-0f71a55a194a009b2" operation finished
I1007 20:16:54.441784       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished
I1007 20:16:54.447821       1 node.go:493] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1007 20:16:54.453725       1 node.go:349] NodePublishVolume: called with args {VolumeId:vol-0f71a55a194a009b2 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount TargetPath:/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633546016140-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1007 20:16:54.453813       1 node.go:662] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount at /var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount with option [bind] as fstype ext4
I1007 20:16:54.453831       1 node.go:633] NodePublishVolume: Target path "/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount" is already mounted
I1007 20:16:54.453843       1 node.go:378] NodePublishVolume: volume="vol-0f71a55a194a009b2" operation finished
I1007 20:16:54.453849       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished
$ journalctl -ru kubelet -l | less
Oct 07 20:17:46 ip-192-168-29-9.us-west-2.compute.internal kubelet[14145]: I1007 20:17:46.290485   14145 operation_generator.go:672] MountVolume.SetUp succeeded for volume "pvc-b622b11e-64ad-419f-a229-7372be5314d5" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-0f71a55a194a009b2") pod "nginx-deployment-b8fdc6d6b-c4z7p" (UID: "aa9febfe-f8b3-4ccb-84d8-bbef4865c98f")
Oct 07 20:17:46 ip-192-168-29-9.us-west-2.compute.internal kubelet[14145]: I1007 20:17:46.290461   14145 csi_mounter.go:282] kubernetes.io/csi: mounter.SetUp successfully requested NodePublish [/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount]
Oct 07 20:17:46 ip-192-168-29-9.us-west-2.compute.internal kubelet[14145]: I1007 20:17:46.290430   14145 csi_mounter.go:279] kubernetes.io/csi: mounter.SetupAt fsGroup [2002] applied successfully to vol-0f71a55a194a009b2

unit tests:

see the latest two commits

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 7, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2021
@wongma7 wongma7 force-pushed the nodestageidempotent branch 2 times, most recently from 25dbf8e to 01d8c8d on October 7, 2021 00:53
@wongma7 wongma7 force-pushed the nodestageidempotent branch from 01d8c8d to f3db9e8 on October 7, 2021 20:20
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 8, 2021
@wongma7 wongma7 force-pushed the nodestageidempotent branch from 537abf2 to 35e6aa8 on October 8, 2021 00:45
@wongma7 wongma7 force-pushed the nodestageidempotent branch from 35e6aa8 to 9ea0250 on October 8, 2021 00:46
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 8, 2021
@wongma7 wongma7 changed the title from "WIP: Search for nvme device path even if non-nvme exists" to "Search for nvme device path even if non-nvme exists" on Oct 8, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2021
@wongma7
Contributor Author

wongma7 commented Oct 8, 2021

Also verified that if a pod is stuck Terminating because NodeStage keeps failing after a kubelet restart, upgrading to a driver with this fix unsticks it.

I1008 23:05:31.868350       1 node.go:205] NodeStageVolume: volume="vol-0f71a55a194a009b2" already staged
I1008 23:05:31.868363       1 node.go:154] NodeStageVolume: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:05:31.868371       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:05:31.880780       1 node.go:495] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1008 23:05:31.887248       1 node.go:351] NodePublishVolume: called with args {VolumeId:vol-0f71a55a194a009b2 PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount TargetPath:/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1633546016140-8081-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1008 23:05:31.887312       1 node.go:664] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount at /var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount with option [bind] as fstype ext4
I1008 23:05:31.887335       1 node.go:635] NodePublishVolume: Target path "/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount" is already mounted
I1008 23:05:31.887348       1 node.go:380] NodePublishVolume: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:05:31.887354       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:06:22.004637       1 node.go:404] NodeUnpublishVolume: called with args {VolumeId:vol-0f71a55a194a009b2 TargetPath:/var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1008 23:06:22.004679       1 node.go:422] NodeUnpublishVolume: unmounting /var/lib/kubelet/pods/aa9febfe-f8b3-4ccb-84d8-bbef4865c98f/volumes/kubernetes.io~csi/pvc-b622b11e-64ad-419f-a229-7372be5314d5/mount
I1008 23:06:22.007293       1 node.go:418] NodeUnPublishVolume: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:06:22.007310       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:06:22.105174       1 node.go:495] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1008 23:06:22.105895       1 node.go:233] NodeUnstageVolume: called with args {VolumeId:vol-0f71a55a194a009b2 StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1008 23:06:22.106150       1 node.go:273] NodeUnstageVolume: unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b622b11e-64ad-419f-a229-7372be5314d5/globalmount
I1008 23:06:23.583979       1 node.go:248] NodeUnStageVolume: volume="vol-0f71a55a194a009b2" operation finished
I1008 23:06:23.584061       1 inflight.go:73] Node Service: volume="vol-0f71a55a194a009b2" operation finished

Contributor

@nirmalaagash nirmalaagash left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2021
@k8s-ci-robot k8s-ci-robot merged commit ad8df8f into kubernetes-sigs:master Oct 8, 2021
@nirmalaagash
Contributor

Also verified that if a pod is stuck Terminating because NodeStage keeps failing after a kubelet restart, upgrading to a driver with this fix unsticks it.

@wongma7 Can you close issue #1027 if this fixes that similar issue as well?

@PaulFurtado
Contributor

This code does not actually resolve the issue for me.

@wongma7 can you paste the udev rules on your system that create the /dev/disk/by-id symlinks? On my system, using the udev rules from amazonlinux 2, I end up with symlinks like:

/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol01ce3998bad07240b-ns-1

however, the code is searching for:

/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol01ce3998bad07240b

I'm really not sure why the code bothers to dive into that logic, though: at the point where it determines that the /dev/xvdXX path exists, it could check whether that path is a symlink, resolve it, and return the /dev/nvmeXX path right there, without forcing the fallback to the /dev/disk/by-id based lookup.
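
A minimal Go sketch of that suggestion (the helper name is hypothetical; this is an illustration of the idea, not the driver's code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveIfSymlink returns the real device node behind devicePath. If the
// path is already a device node it is returned unchanged; if it is a symlink
// (as /dev/xvdXX is on Nitro instances), it is resolved, e.g.
// /dev/xvdba -> /dev/nvme1n1.
func resolveIfSymlink(devicePath string) (string, error) {
	fi, err := os.Lstat(devicePath)
	if err != nil {
		return "", err
	}
	if fi.Mode()&os.ModeSymlink == 0 {
		return devicePath, nil // already a real device node
	}
	return filepath.EvalSymlinks(devicePath)
}

func main() {
	fmt.Println(resolveIfSymlink("/dev/xvdba"))
}
```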

Another solution, which would potentially be more reliable than these path games, is for the code that compares the mount device to the symlink to actually call os.Stat on each side and compare the results with os.SameFile.
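
A minimal sketch of that alternative, assuming a hypothetical sameDevice helper (illustrative only, not the driver's implementation):

```go
package main

import (
	"fmt"
	"os"
)

// sameDevice reports whether two device paths refer to the same underlying
// device node. os.Stat follows symlinks, so /dev/xvdba and /dev/nvme1n1
// compare equal when the former is a symlink to the latter.
func sameDevice(pathA, pathB string) (bool, error) {
	a, err := os.Stat(pathA)
	if err != nil {
		return false, err
	}
	b, err := os.Stat(pathB)
	if err != nil {
		return false, err
	}
	return os.SameFile(a, b), nil
}

func main() {
	fmt.Println(sameDevice("/dev/xvdba", "/dev/nvme1n1"))
}
```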
