Restart of kubelet leads to duplicated mount entries #1007

Closed
dguendisch opened this issue Jul 31, 2021 · 10 comments · Fixed by #1019

@dguendisch

/kind bug

What happened?
Having a pod (from a StatefulSet) with a PV running on some node, restarting the kubelet suddenly duplicates the mount points.
I could reproduce this 100% of the time with aws-ebs-csi-driver 1.1.0 (independent of the k8s version).
No mount points are duplicated e.g. on GCP or Azure (with their respective CSI drivers).

What you expected to happen?
No duplicated mountpoints upon kubelet restarts.

How to reproduce it (as minimally and precisely as possible)?

  1. Get a k8s cluster on AWS with aws-ebs-csi-driver 1.1.0
  2. Deploy a simple sts (this one will work fine, one replica is enough)
  3. Shell into the node and list the respective mounts:
mount|grep nvme1
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/00fabdc1-e746-4a25-97f5-1a0c9500f077/volumes/kubernetes.io~csi/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/mount type ext4 (rw,relatime)
  4. Restart the kubelet (on my distro, with systemctl restart kubelet)
  5. Shell into the node again and list the respective mounts:
mount|grep nvme1
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/00fabdc1-e746-4a25-97f5-1a0c9500f077/volumes/kubernetes.io~csi/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/mount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/00fabdc1-e746-4a25-97f5-1a0c9500f077/volumes/kubernetes.io~csi/pv-shoot--core--d039771-7f760bd2-7738-456a-a20e-681e601a5474/mount type ext4 (rw,relatime)

Anything else we need to know?:
These duplicated entries lead to all kinds of weird problems. For example, terminating the respective pod never finishes: kubelet triggers an unmount and the CSI driver unmounts successfully, but this removes only one of the duplicated entries, so the other pair remains and therefore "leaks". Kubelet then triggers a detach, and its subsequent GetDeviceMountRefs check fails, so pod termination won't proceed. If the user now force-deletes the pod, the volume will next attach at nvme1n2 instead of nvme1n1, yet kubelet reuses the stale globalmount (which now points to a non-existing device)...

Environment

  • Kubernetes version (use kubectl version): 1.20.9 (but the issue is independent of the k8s version; also tried 1.21.3 with the same effect)
  • Driver version: 1.1.0

cc @timuthy @vpnachev @ialidzhikov

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 31, 2021
@wongma7
Contributor

wongma7 commented Aug 2, 2021

Seems like neither our NodeStage nor NodePublish is idempotent. NodePublish not being idempotent is a known issue (#955); for NodeStage I don't understand why yet, but I will repro and investigate.
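
For illustration, a minimal sketch of the kind of idempotency check NodePublishVolume needs before bind-mounting (assuming k8s.io/mount-utils; the package and function names below are hypothetical, this is not the driver's actual code):

package sketch

import (
	"os"

	mountutils "k8s.io/mount-utils"
)

// publishIdempotent bind-mounts source onto target only if target is not
// already a mount point, so a repeated NodePublishVolume becomes a no-op.
// Hypothetical helper for illustration only.
func publishIdempotent(mounter mountutils.Interface, source, target string) error {
	// Make sure the target directory exists.
	if err := os.MkdirAll(target, 0750); err != nil {
		return err
	}
	// IsLikelyNotMountPoint returns false when target is already mounted,
	// i.e. a previous NodePublishVolume call already did the bind mount.
	notMnt, err := mounter.IsLikelyNotMountPoint(target)
	if err != nil {
		return err
	}
	if !notMnt {
		return nil // already published, nothing to do
	}
	return mounter.Mount(source, target, "ext4", []string{"bind"})
}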

@nirmalaagash
Contributor

/assign nirmalaagash

@nirmalaagash
Contributor

@dguendisch Not able to reproduce the issue. Can you give me some more details?

@dguendisch
Author

I'm happy to provide more details, but from your comment I don't know which part you are not able to reproduce. Did you apply all the steps from my reproduction details and not see a duplicate mount, or what exactly do you mean, @nirmalaagash?

@nirmalaagash
Contributor

@dguendisch I tried implementing all the steps that you mentioned in the issue but I am not able to see a duplicate mount.
Here's what I did.
Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Driver version:

ebs-plugin:
Container ID: containerd://ce4a78441a8921118234b649c3a1ad04c37d426b1d5b8e219aa70d1cc5902410
Image: k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.1.0

Deployed the web.yaml with one replica.

Shelled into the node where the volume is attached.

mount|grep nvme1
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-2c/vol-081300cf118b494cc type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/dbc71932-95b8-46d9-8ba2-99da50fb7538/volumes/kubernetes.io~aws-ebs/pvc-a39a0fe6-930b-4633-8f17-81f192ba6f11 type ext4 (rw,relatime)

Restarted the kubelet on the same node and re-ran the command. [After this step, I also tried restarting the kubelet on the master and re-ran the command on the worker node.]

mount|grep nvme1
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-2c/vol-081300cf118b494cc type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/dbc71932-95b8-46d9-8ba2-99da50fb7538/volumes/kubernetes.io~aws-ebs/pvc-a39a0fe6-930b-4633-8f17-81f192ba6f11 type ext4 (rw,relatime)

Let me know if I missed any of the steps you performed.

@dguendisch
Author

Are you sure that you're actually using CSI? Your mount location /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts makes me think you're using in-tree volume provisioning (which doesn't duplicate mounts...)

@bertinatto
Member

@dguendisch it seems like the version v1.1.0 that you're using doesn't have this fix.

Would you be able to re-check with a newer driver version, like v1.1.1?

@dguendisch
Author

I just tried with k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.1.1 => fully reproducible.

State before kubelet restart:

date && mount|grep nvme
Thu Aug 12 14:48:05 UTC 2021
/dev/nvme0n1p1 on / type ext4 (rw,relatime,discard)
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount type ext4 (rw,relatime)

kubelet restart triggered:

date && systemctl restart kubelet
Thu Aug 12 14:48:25 UTC 2021

State afterwards:

date && mount|grep nvme
Thu Aug 12 14:48:53 UTC 2021
/dev/nvme0n1p1 on / type ext4 (rw,relatime,discard)
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount type ext4 (rw,relatime)
/dev/nvme1n1 on /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount type ext4 (rw,relatime)
csi-driver logs with `v5`

I0812 14:46:33.296376       1 driver.go:72] Driver: ebs.csi.aws.com Version: v1.1.1
I0812 14:46:33.296487       1 node.go:85] [Debug] Retrieving node info from metadata service
W0812 14:46:33.544845       1 metadata.go:191] Failed to parse the outpost arn:
I0812 14:46:33.545132       1 driver.go:142] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0812 14:46:34.863214       1 node.go:514] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:47:21.791255       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:47:21.795133       1 node.go:105] NodeStageVolume: called with args {VolumeId:vol-0e8d8727f56f0565b PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1628776562753-1847-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:47:21.795252       1 node.go:179] NodeStageVolume: find device path /dev/xvdba -> /dev/nvme1n1
I0812 14:47:21.795594       1 node.go:213] NodeStageVolume: formatting /dev/nvme1n1 and mounting at /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount with fstype ext4
I0812 14:47:21.896819       1 mount_linux.go:408] Disk "/dev/nvme1n1" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/nvme1n1]
I0812 14:47:22.995400       1 mount_linux.go:418] Disk successfully formatted (mkfs): ext4 - /dev/nvme1n1 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount
I0812 14:47:23.314134       1 mount.go:121] ResizeFs.needResize - checking mounted volume /dev/nvme1n1
I0812 14:47:23.397039       1 mount.go:125] Ext size: filesystem size=1073741824, block size=4096
I0812 14:47:23.397072       1 mount.go:137] Volume /dev/nvme1n1: device size=1073741824, filesystem size=1073741824, block size=4096
I0812 14:47:23.397079       1 node.go:157] NodeStageVolume: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:47:23.397089       1 inflight.go:73] Node Service: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:47:23.403665       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:47:23.496245       1 node.go:354] NodePublishVolume: called with args {VolumeId:vol-0e8d8727f56f0565b PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount TargetPath:/var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1628776562753-1847-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:47:23.496317       1 node.go:616] NodePublishVolume: creating dir /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount
I0812 14:47:23.496403       1 node.go:626] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount at /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount with option [bind] as fstype ext4
I0812 14:47:23.695675       1 node.go:383] NodePublishVolume: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:47:23.695712       1 inflight.go:73] Node Service: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:48:07.969807       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:07.972014       1 node.go:435] NodeGetVolumeStats: called with args {VolumeId:vol-0e8d8727f56f0565b VolumePath:/var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:26.076927       1 node.go:514] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:27.311716       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:27.316264       1 node.go:105] NodeStageVolume: called with args {VolumeId:vol-0e8d8727f56f0565b PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1628776562753-1847-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:27.316372       1 node.go:179] NodeStageVolume: find device path /dev/xvdba -> /dev/nvme1n1
I0812 14:48:27.316709       1 node.go:208] NodeStageVolume: volume="vol-0e8d8727f56f0565b" already staged
I0812 14:48:27.316717       1 node.go:157] NodeStageVolume: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:48:27.316722       1 inflight.go:73] Node Service: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:48:27.322527       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:27.395678       1 node.go:354] NodePublishVolume: called with args {VolumeId:vol-0e8d8727f56f0565b PublishContext:map[devicePath:/dev/xvdba] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount TargetPath:/var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1628776562753-1847-ebs.csi.aws.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:48:27.395763       1 node.go:616] NodePublishVolume: creating dir /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount
I0812 14:48:27.395784       1 node.go:626] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/globalmount at /var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount with option [bind] as fstype ext4
I0812 14:48:27.597575       1 node.go:383] NodePublishVolume: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:48:27.597601       1 inflight.go:73] Node Service: volume="vol-0e8d8727f56f0565b" operation finished
I0812 14:49:25.576272       1 node.go:498] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0812 14:49:25.577279       1 node.go:435] NodeGetVolumeStats: called with args {VolumeId:vol-0e8d8727f56f0565b VolumePath:/var/lib/kubelet/pods/632d632b-d3a4-47d0-bbfa-d6f36e9e25d8/volumes/kubernetes.io~csi/pv-shoot--core--test-d7394043-ee5c-4268-b70a-e2d024ee7812/mount StagingTargetPath: XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}

@bertinatto
Member

Regarding NodePublishVolume, we can tell from the logs that it's clearly bind-mounting the volume twice, so this should be fixed. @nirmalaagash is trying to fix it with #1019.

As for NodeStageVolume, it seems like it's early-returning and avoiding the double mount: `NodeStageVolume: volume="vol-0e8d8727f56f0565b" already staged`. So it's not clear to me yet why you got 2 entries there.
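
For context, that early return corresponds roughly to an "is this device already mounted at the staging path" check. A simplified, hypothetical sketch (again assuming k8s.io/mount-utils; not the driver's actual code):

package sketch

import mountutils "k8s.io/mount-utils"

// alreadyStaged reports whether device is already mounted at stagingTarget
// by scanning the mount table (e.g. /proc/mounts). Illustration only.
func alreadyStaged(mounter mountutils.Interface, device, stagingTarget string) (bool, error) {
	mounts, err := mounter.List()
	if err != nil {
		return false, err
	}
	for _, m := range mounts {
		if m.Path == stagingTarget && m.Device == device {
			return true, nil
		}
	}
	return false, nil
}

As the log above shows, the early return did fire on the second NodeStageVolume call, so why the globalmount entry is nevertheless duplicated is the open question here.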

@nirmalaagash
Contributor

@dguendisch I am able to reproduce the issue now. Working on it #1019.
