
fix corrupted mount issue when driver daemonset restarted #117

Merged: 1 commit merged into kubernetes-sigs:master on Feb 26, 2020


@andyzhangx (Member) commented Feb 26, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Together with the upstream Kubernetes PR kubernetes/kubernetes#88569, this PR fixes the corrupted mount issue that occurs when a FUSE-based CSI driver daemonset is restarted on a node. After the daemonset restarts, the original blobfuse mount is broken; this PR handles the broken mount in both NodeStageVolume and NodePublishVolume with these steps:

  • detect the broken mount path
  • unmount the broken mount path
  • remount the path

Which issue(s) this PR fixes:

Fixes #115

Special notes for your reviewer:
The main fix is in the ensureMountPoint function:

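```go
func (d *Driver) ensureMountPoint(target string) error {
	notMnt, err := d.mounter.IsLikelyNotMountPoint(target)
	if err != nil && !os.IsNotExist(err) {
		// A dead FUSE daemon makes stat fail (e.g. "transport endpoint is not
		// connected"); treat such a path as a mount point so it is unmounted below.
		if IsCorruptedDir(target) {
			notMnt = false
			klog.Warningf("detected corrupted mount for targetPath [%s]", target)
		} else {
			return err
		}
	}
...
```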

Driver logs showing the main code execution path of this PR:

I0226 03:37:10.191769       1 utils.go:112] GRPC call: /csi.v1.Node/NodeStageVolume
I0226 03:37:10.191800       1 utils.go:113] GRPC request: volume_id:"andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:"skuName" value:"Standard_LRS" > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1582552362270-8081-blobfuse.csi.azure.com" >
I0226 03:37:10.191810       1 nodeserver.go:105] NodeStageVolume: called with args {VolumeId:andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 PublishContext:map[] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER >  Secrets:map[] VolumeContext:map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
W0226 03:37:10.191901       1 nodeserver.go:241] detected corrupted mount for targetPath [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount]
W0226 03:37:10.191917       1 nodeserver.go:255] ReadDir /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount failed with open /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount: transport endpoint is not connected, unmount this directory
I0226 03:37:10.191925       1 mount_linux.go:209] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount
I0226 03:37:10.674579       1 nodeserver.go:141] target /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount
fstype ext4

volumeId andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338
context map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com]
mountflags []
mountOptions [--use-https=true]
args /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount --tmp-path=/mnt/andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 --container-name=pvc-0433847e-03fd-422f-b053-5534510eb338 --use-https=true

I0226 03:37:10.813063       1 utils.go:113] GRPC request: volume_id:"andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount" target_path:"/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:"skuName" value:"Standard_LRS" > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1582552362270-8081-blobfuse.csi.azure.com" >
I0226 03:37:10.813074       1 nodeserver.go:39] NodePublishVolume: called with args {VolumeId:andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 PublishContext:map[] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount TargetPath:/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
W0226 03:37:10.813124       1 nodeserver.go:241] detected corrupted mount for targetPath [/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount]
W0226 03:37:10.813142       1 nodeserver.go:255] ReadDir /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount failed with open /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount: transport endpoint is not connected, unmount this directory
I0226 03:37:10.813152       1 mount_linux.go:209] Unmounting /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount
I0226 03:37:10.819437       1 nodeserver.go:69] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount at /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount with mountOptions: [bind]
I0226 03:37:10.819461       1 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount])
I0226 03:37:10.821022       1 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount])
I0226 03:37:10.824341       1 nodeserver.go:76] NodePublishVolume: mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount at /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount successfully

Release note:

fix corrupted mount issue when driver daemonset restarted

@k8s-ci-robot added the kind/bug and cncf-cla: yes labels Feb 26, 2020
@andyzhangx changed the title from "fix corrupted mount issue when driver deamonset restarted" to "fix corrupted mount issue when driver daemonset restarted" Feb 26, 2020
@k8s-ci-robot added the size/XXL label Feb 26, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Feb 26, 2020
Review comment on Makefile (resolved)
fix: corrupt mount path

doc: add two blobfuse deployments

test: fix comment

revert utils/mount dependency
@k8s-ci-robot added the size/L label and removed the size/XXL label
@ZeroMagic (Member)

/lgtm

@k8s-ci-robot added the lgtm label Feb 26, 2020
@k8s-ci-robot merged commit 0030109 into kubernetes-sigs:master Feb 26, 2020
@hhstu commented Dec 22, 2023

@andyzhangx Does this support subpath?

@andyzhangx (Member, Author) replied:

@hhstu The mount path no longer breaks when the driver restarts, because blobfuse-proxy is now used by default: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/deploy/blobfuse-proxy

@hhstu commented Dec 23, 2023

@andyzhangx OK, thank you.

Development

Successfully merging this pull request may close these issues:

  • restart csi-blobfuse-node daemonset would make current blobfuse mount unavailable

4 participants