[BUG] blobfuse2 aks 1.25.5 fail with isHnsEnabled=true after upgrade from aks 1.24.9 #1080

Closed
aresabalo opened this issue Mar 10, 2023 · 9 comments · Fixed by #1082

@aresabalo

### Describe the bug
Using blobfuse2 through the CSI driver fails on AKS 1.25.

**To Reproduce**
Steps to reproduce the behavior:

Mount a volume on AKS 1.25 using the fuse2 protocol and a blob storage account with "Hierarchical namespace" enabled:

    ...
    volumes:
      - name: repositorio-web-vol
        csi:
          driver: blob.csi.azure.com
          readOnly: false
          volumeAttributes:
            secretName: blob-storage-public-secrets
            containerName: web-repository
            protocol: fuse2
            isHnsEnabled: "true"
            mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120"
    ....

Renaming files on the volume mounted with blobfuse2 fails from the pod terminal:

mv file.txt file2.txt
mv: can't rename 'file.txt': I/O error

I can read files, but renaming them fails.

Which version of blobfuse was used?

fuse2

**Expected behavior**
Renaming works on AKS 1.24.9. It fails after upgrading to AKS 1.25.5.

Which OS distribution and version are you using?

Kubernetes version 1.25.5
Image AKSUbuntu-2204gen2containerd-2023.02.15
image: mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.19.0

If relevant, please share your mount command.

What was the issue encountered?

Using blobfuse2 through the CSI driver fails on AKS 1.25.

Have you found a mitigation/solution?

No

Please share logs if available.

I0310 06:58:46.444555 5154 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0310 06:58:46.444568 5154 utils.go:76] GRPC request: {"target_path":"/var/lib/kubelet/pods/c5388686-8e24-4909-8c1d-bad732f2002b/volumes/kubernetes.io~csi/sic-test-vol/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"containerName":"sic-test","csi.storage.k8s.io/ephemeral":"true","csi.storage.k8s.io/pod.name":"nextcloud-deploy-5b88745d7c-bmptn","csi.storage.k8s.io/pod.namespace":"test-nextcloud","csi.storage.k8s.io/pod.uid":"c5388686-8e24-4909-8c1d-bad732f2002b","csi.storage.k8s.io/serviceAccount.name":"nextcloud-account","isHnsEnabled":"true","mountOptions":"-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000","protocol":"fuse2","secretName":"blob-storage-sic-secrets"},"volume_id":"csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f"}
I0310 06:58:46.444711 5154 nodeserver.go:85] NodePublishVolume: ephemeral volume(csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f) mount on /var/lib/kubelet/pods/c5388686-8e24-4909-8c1d-bad732f2002b/volumes/kubernetes.io~csi/sic-test-vol/mount, VolumeContext: map[containerName:sic-test csi.storage.k8s.io/ephemeral:true csi.storage.k8s.io/pod.name:nextcloud-deploy-5b88745d7c-bmptn csi.storage.k8s.io/pod.namespace:test-nextcloud csi.storage.k8s.io/pod.uid:c5388686-8e24-4909-8c1d-bad732f2002b csi.storage.k8s.io/serviceAccount.name:nextcloud-account getaccountkeyfromsecret:true isHnsEnabled:true mountOptions:-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000 protocol:fuse2 secretName:blob-storage-sic-secrets secretnamespace:test-nextcloud storageaccount:]
I0310 06:58:46.445312 5154 blob.go:353] parsing volumeID(csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f) return with error: error parsing volume id: "csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f", should at least contain two #
I0310 06:58:46.445341 5154 blob.go:414] volumeID(csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f) authEnv: []
I0310 06:58:46.478175 5154 blob.go:756] got storage account(acorunapublicshare) from secret
I0310 06:58:46.478216 5154 nodeserver.go:349] target /var/lib/kubelet/pods/c5388686-8e24-4909-8c1d-bad732f2002b/volumes/kubernetes.io~csi/sic-test-vol/mount
protocol fuse2

volumeId csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f
context map[containerName:sic-test csi.storage.k8s.io/ephemeral:true csi.storage.k8s.io/pod.name:nextcloud-deploy-5b88745d7c-bmptn csi.storage.k8s.io/pod.namespace:test-nextcloud csi.storage.k8s.io/pod.uid:c5388686-8e24-4909-8c1d-bad732f2002b csi.storage.k8s.io/serviceAccount.name:nextcloud-account getaccountkeyfromsecret:true isHnsEnabled:true mountOptions:-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000 protocol:fuse2 secretName:blob-storage-sic-secrets secretnamespace:test-nextcloud storageaccount:]
mountflags []
mountOptions [--use-adls=true -o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000 --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f --container-name=sic-test]
args /var/lib/kubelet/pods/c5388686-8e24-4909-8c1d-bad732f2002b/volumes/kubernetes.io~csi/sic-test-vol/mount --use-adls=true -o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000 --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f --container-name=sic-test
serverAddress acorunapublicshare.blob.core.windows.net
I0310 06:58:46.478253 5154 nodeserver.go:144] mouting using blobfuse proxy
I0310 06:58:46.478697 5154 nodeserver.go:158] calling BlobfuseProxy: MountAzureBlob function
I0310 06:58:47.027979 5154 nodeserver.go:397] volume(csi-67fea155548f8e5b21a5aadd9325aa5a055d0d8f99b4ce516f21d24a616c719f) mount on "/var/lib/kubelet/pods/c5388686-8e24-4909-8c1d-bad732f2002b/volumes/kubernetes.io~csi/sic-test-vol/mount" succeeded
I0310 06:58:47.028008 5154 utils.go:82] GRPC response: {}
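
One detail visible in the "mountOptions [...]" log line above is that `--cancel-list-on-mount-seconds` appears twice, once with the value from the volume definition (60) and once with 10. The following is a minimal sketch for spotting such repeated flags in a logged option string. The helper name `duplicated_flags` is hypothetical, the option string is abridged (the `--tmp-path` entry is left out), and the sketch makes no claim about which value blobfuse2 actually applies or whether this is related to the rename failure.

```python
import shlex
from collections import defaultdict


def duplicated_flags(mount_options):
    """Return every --name=value flag that occurs more than once, with all its values."""
    seen = defaultdict(list)
    for token in shlex.split(mount_options):
        # Only long options of the form --name=value are considered;
        # "-o" and its argument are skipped.
        if token.startswith("--") and "=" in token:
            name, value = token.split("=", 1)
            seen[name].append(value)
    return {name: values for name, values in seen.items() if len(values) > 1}


# Copied from the first "mountOptions [...]" log line, without --tmp-path.
logged = (
    "--use-adls=true -o allow_other --file-cache-timeout-in-seconds=120 "
    "--use-attr-cache=true --cancel-list-on-mount-seconds=60 "
    "-o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 "
    "--log-level=LOG_WARNING --cache-size-mb=1000 --pre-mount-validate=true "
    "--use-https=true --cancel-list-on-mount-seconds=10 "
    "--empty-dir-check=false --container-name=sic-test"
)

print(duplicated_flags(logged))
# prints {'--cancel-list-on-mount-seconds': ['60', '10']}
```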

@andyzhangx
Contributor

@vibhansa-msft we are using the --use-adls=true mount option for the blobfuse2 mount. The mount succeeds, but it behaves as read-only. Is there any breaking change in blobfuse v2?

@vibhansa-msft
Member

Can you share the blobfuse logs so we can root-cause this? If the mount option says it is mounted as read-only, then why do we expect the rename command to work?

@aresabalo
Author

> Can you share the blobfuse logs so we can root-cause this? If the mount option says it is mounted as read-only, then why do we expect the rename command to work?

The Kubernetes deployment definition explicitly sets readOnly to false.

I don't understand why this happens: the same definition with protocol fuse (not fuse2) worked on AKS 1.24.9 with Ubuntu 18.04.
Now, with the same options but protocol fuse2 on AKS 1.25.5 with Ubuntu 22.04, it fails.

    volumeMounts:
      - name: sic-test-vol
        mountPath: /sic-test-vol
  volumes:
    - name: sic-test-vol
      csi:
        driver: blob.csi.azure.com
        readOnly: false
        volumeAttributes:
          secretName: blob-storage-sic-secrets
          containerName: sic-test
          protocol: fuse2
          isHnsEnabled: "true"
          mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_WARNING --cache-size-mb=1000"

....

Is the "fuse" (v1) protocol deprecated on Ubuntu 22.04?
Is there any workaround to solve this on AKS 1.25?

Thanks

@vibhansa-msft
Member

Can you please share the blobfuse logs for the case where your rename is failing? Enable log_debug and, if possible, sdk-trace, and share the log file.

vibhansa-msft self-assigned this on Mar 10, 2023
vibhansa-msft added this to the V2-2.0.3 milestone on Mar 10, 2023
@aresabalo
Author

I have found a pattern in when fuse2 fails :-)

Creating and renaming files fails only when the directory or file names include special characters or spaces.

Example OK:

/sic-test-vol # cd test
/sic-test-vol/test # pwd
/sic-test-vol/test
/sic-test-vol/test # echo "hello fuse2...." > fuse2.txt
/sic-test-vol/test # ls -la
total 0
-rw-r----- 1 root root 16 Mar 10 10:23 fuse2.txt
/sic-test-vol/test # cat fuse2.txt
hello fuse2....
/sic-test-vol/test # mv fuse2.txt fuse2renamed.txt
/sic-test-vol/test # ls -la
total 0
-rw-r----- 1 root root 16 Mar 10 10:23 fuse2renamed.txt
/sic-test-vol/test # ls
fuse2renamed.txt


But it fails when file or directory names contain special characters or spaces!

Failing example:

/sic-test-vol/Alcaldía/test # ls -la
total 0
/sic-test-vol/Alcaldía/test # cd ..
/sic-test-vol/Alcaldía # ls -la
total 0
drwxr-x--- 2 root root 4096 Dec 19 07:53 Web EIDUS
drwxr-x--- 2 root root 4096 Mar 10 10:40 test
/sic-test-vol/Alcaldía # ls -la
total 0
drwxr-x--- 2 root root 4096 Dec 19 07:53 Web EIDUS
drwxr-x--- 2 root root 4096 Mar 10 10:40 test
/sic-test-vol/Alcaldía # cd test
/sic-test-vol/Alcaldía/test # touch josmar.txt
/sic-test-vol/Alcaldía/test # mv josmar.txt josma2r.txt
mv: can't rename 'josmar.txt': I/O error
/sic-test-vol/Alcaldía/test # ls -la
total 0
-rw-r----- 1 root root 0 Mar 10 10:41 josmar.txt
/sic-test-vol/Alcaldía/test #
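
The manual check above could be scripted along these lines. This is only a sketch: the helper names `try_rename` and `probe` are hypothetical, and the directory names are the ones from this report. As written it runs against a temporary local directory, where every rename is expected to succeed; to test a real mount, pass the mount point (for example /sic-test-vol) to `probe` instead.

```python
import os
import tempfile


def try_rename(directory):
    """Create a file in `directory`, rename it, and return 'ok' or the error."""
    src = os.path.join(directory, "probe.txt")
    dst = os.path.join(directory, "probe-renamed.txt")
    try:
        os.makedirs(directory, exist_ok=True)
        with open(src, "w") as handle:
            handle.write("hello fuse2\n")
        os.rename(src, dst)
        os.remove(dst)
        return "ok"
    except OSError as err:
        # In the report above, the failure surfaced as "I/O error" (EIO).
        return f"failed: {err.strerror} (errno {err.errno})"


def probe(mount_root):
    """Run the create-and-rename check in plain and special-character directories."""
    names = ["test", "Alcaldía/test", "Web EIDUS/test"]
    return {name: try_rename(os.path.join(mount_root, name)) for name in names}


# A temporary local directory stands in for the blobfuse2 mount point here.
with tempfile.TemporaryDirectory() as root:
    for name, result in probe(root).items():
        print(f"{name!r}: {result}")
```

Comparing the result for the plain directory with the results for the names containing an accent or a space separates a general write problem from the name-dependent failure described here.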


Logs with debug logging enabled:

I0310 10:20:11.195149 5154 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0310 10:20:11.195164 5154 utils.go:76] GRPC request: {"target_path":"/var/lib/kubelet/pods/7d69dcfb-55ab-4d90-ba70-8e965722d7ba/volumes/kubernetes.io~csi/sic-test-vol/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"containerName":"sic-test","csi.storage.k8s.io/ephemeral":"true","csi.storage.k8s.io/pod.name":"nextcloud-deploy-7bf968bcf6-mhnh6","csi.storage.k8s.io/pod.namespace":"test-nextcloud","csi.storage.k8s.io/pod.uid":"7d69dcfb-55ab-4d90-ba70-8e965722d7ba","csi.storage.k8s.io/serviceAccount.name":"nextcloud-account","isHnsEnabled":"true","mountOptions":"-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_DEBUG --cache-size-mb=1000","protocol":"fuse2","secretName":"blob-storage-sic-secrets"},"volume_id":"csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6"}
I0310 10:20:11.195301 5154 nodeserver.go:85] NodePublishVolume: ephemeral volume(csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6) mount on /var/lib/kubelet/pods/7d69dcfb-55ab-4d90-ba70-8e965722d7ba/volumes/kubernetes.io~csi/sic-test-vol/mount, VolumeContext: map[containerName:sic-test csi.storage.k8s.io/ephemeral:true csi.storage.k8s.io/pod.name:nextcloud-deploy-7bf968bcf6-mhnh6 csi.storage.k8s.io/pod.namespace:test-nextcloud csi.storage.k8s.io/pod.uid:7d69dcfb-55ab-4d90-ba70-8e965722d7ba csi.storage.k8s.io/serviceAccount.name:nextcloud-account getaccountkeyfromsecret:true isHnsEnabled:true mountOptions:-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_DEBUG --cache-size-mb=1000 protocol:fuse2 secretName:blob-storage-sic-secrets secretnamespace:test-nextcloud storageaccount:]
I0310 10:20:11.195869 5154 blob.go:353] parsing volumeID(csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6) return with error: error parsing volume id: "csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6", should at least contain two #
I0310 10:20:11.195897 5154 blob.go:414] volumeID(csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6) authEnv: []
I0310 10:20:11.201465 5154 blob.go:756] got storage account(acorunapublicshare) from secret
I0310 10:20:11.201512 5154 nodeserver.go:349] target /var/lib/kubelet/pods/7d69dcfb-55ab-4d90-ba70-8e965722d7ba/volumes/kubernetes.io~csi/sic-test-vol/mount
protocol fuse2

volumeId csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6
context map[containerName:sic-test csi.storage.k8s.io/ephemeral:true csi.storage.k8s.io/pod.name:nextcloud-deploy-7bf968bcf6-mhnh6 csi.storage.k8s.io/pod.namespace:test-nextcloud csi.storage.k8s.io/pod.uid:7d69dcfb-55ab-4d90-ba70-8e965722d7ba csi.storage.k8s.io/serviceAccount.name:nextcloud-account getaccountkeyfromsecret:true isHnsEnabled:true mountOptions:-o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_DEBUG --cache-size-mb=1000 protocol:fuse2 secretName:blob-storage-sic-secrets secretnamespace:test-nextcloud storageaccount:]
mountflags []
mountOptions [--use-adls=true -o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_DEBUG --cache-size-mb=1000 --empty-dir-check=false --tmp-path=/mnt/csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6 --container-name=sic-test --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10]
args /var/lib/kubelet/pods/7d69dcfb-55ab-4d90-ba70-8e965722d7ba/volumes/kubernetes.io~csi/sic-test-vol/mount --use-adls=true -o allow_other --file-cache-timeout-in-seconds=120 --use-attr-cache=true --cancel-list-on-mount-seconds=60 -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --log-level=LOG_DEBUG --cache-size-mb=1000 --empty-dir-check=false --tmp-path=/mnt/csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6 --container-name=sic-test --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10
serverAddress acorunapublicshare.blob.core.windows.net
I0310 10:20:11.201542 5154 nodeserver.go:144] mouting using blobfuse proxy
I0310 10:20:11.201959 5154 nodeserver.go:158] calling BlobfuseProxy: MountAzureBlob function
I0310 10:20:11.761269 5154 nodeserver.go:397] volume(csi-7656067a9dda955621f95dc4069cc655460b1e5394b0bfe94301d1a79b34a9a6) mount on "/var/lib/kubelet/pods/7d69dcfb-55ab-4d90-ba70-8e965722d7ba/volumes/kubernetes.io~csi/sic-test-vol/mount" succeeded
I0310 10:20:11.761297 5154 utils.go:82] GRPC response: {}

@vibhansa-msft
Member

This looks interesting. So your observation is that a directory name with special characters causes the rename to fail. Let me try to reproduce this locally and debug it further. My usual suspect here would be that the source URL is not correctly encoded in the rename API call.
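
To illustrate the encoding suspicion: a minimal sketch, assuming the rename source is sent as a "/{filesystem}/{path}" string that has to be percent-encoded. The helper name `encode_rename_source` is hypothetical and nothing here is taken from the blobfuse2 code. It only shows what the path from this report looks like once non-ASCII characters and spaces are percent-encoded as UTF-8, and that a purely ASCII path is left unchanged, which would be consistent with the plain "test" directory working.

```python
from urllib.parse import quote, unquote


def encode_rename_source(filesystem, path):
    """Percent-encode '/{filesystem}/{path}' while keeping the '/' separators."""
    # quote() encodes str input as UTF-8 by default; safe="/" preserves slashes.
    return quote(f"/{filesystem}/{path}", safe="/")


failing = encode_rename_source("sic-test", "Alcaldía/test/josmar.txt")
working = encode_rename_source("sic-test", "test/fuse2.txt")

print(failing)  # prints /sic-test/Alcald%C3%ADa/test/josmar.txt
print(working)  # prints /sic-test/test/fuse2.txt

# Decoding restores the original path, so no information is lost.
assert unquote(failing) == "/sic-test/Alcaldía/test/josmar.txt"
```

If the raw, unencoded form were sent instead, only paths containing characters such as "í" or a space would differ from their encoded form, which matches the pattern reported above. Whether this is the actual root cause is not confirmed in this thread.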

@aresabalo
Author

Mitigated by adding a new node pool that runs the older Kubernetes version (Ubuntu 18) to the AKS 1.25 cluster:

az aks nodepool add --cluster-name cluster-aks-dev --resource-group XXXXXXXXX --name testpool --mode User --priority Spot --spot-max-price -1 --eviction-policy Delete --node-vm-size Standard_E2as_v5 --node-count 1 --max-pods 60 --zones 2 --kubernetes-version 1.24.9

@prabhumathiyalagan

prabhumathiyalagan commented May 24, 2023

@vibhansa-msft I'm facing a similar issue after upgrading to AKS 1.25.6; in my case, creating a directory fails. It works fine after downgrading to 1.24.10.

I'm running the latest blobfuse, 2.0.3.

Tried this #1080 and it is working. Cheers!
