Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete volume requests are sent twice with new provisioner version v5.0.1 #1235

Closed
shrutinipane opened this issue Jun 17, 2024 · 51 comments · Fixed by kubernetes-sigs/sig-storage-lib-external-provisioner#173 or #1250

Comments

@shrutinipane
Copy link

What happened:
When we try to delete volume using new provisioner version v5.0.1, we notice 2 volume deletion requests in the logs. As a result of which PV remains in Terminating state.
Following are the driver logs:

I0611 15:22:59.944368       1 controller.go:1258] "shouldDelete" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:22:59.944376       1 controller.go:1288] "shouldDelete is true" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:22:59.944381       1 controller.go:1132] "shouldDelete" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:22:59.944389       1 controller.go:1548] "Started" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:22:59.944401       1 controller.go:1312] volume pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52 does not need any deletion secrets
I0611 15:22:59.944454       1 connection.go:264] "GRPC call" method="/csi.v1.Controller/DeleteVolume" request="{\"volume_id\":\"0;0;15242247612712203312;385E110A:66420EDE;;;/mnt/local-fs/primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52\"}"
I0611 15:23:04.259756       1 controller.go:1563] "Volume deleted" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285156       1 controller.go:1607] "PersistentVolume deleted succeeded" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285352       1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52    52c56854-0f57-4d2a-b242-ebb14dba958e 17734168 0 2024-06-11 15:13:31 +0000 UTC 2024-06-11 15:23:04 +0000 UTC 0xc00056c200 map[] map[pv.kubernetes.io/provisioned-by:spectrumscale.csi.ibm.com volume.kubernetes.io/provisioner-deletion-secret-name: volume.kubernetes.io/provisioner-deletion-secret-namespace:] [] [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection] [{csi-provisioner Update v1 2024-06-11 15:13:31 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/provisioned-by\":{},\"f:volume.kubernetes.io/provisioner-deletion-secret-name\":{},\"f:volume.kubernetes.io/provisioner-deletion-secret-namespace\":{}},\"f:finalizers\":{\".\":{},\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:fsType\":{},\"f:volumeAttributes\":{\".\":{},\"f:csi.storage.k8s.io/pv/name\":{},\"f:csi.storage.k8s.io/pvc/name\":{},\"f:csi.storage.k8s.io/pvc/namespace\":{},\"f:storage.kubernetes.io/csiProvisionerIdentity\":{},\"f:volBackendFs\":{},\"f:volDirBasePath\":{}},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {kube-controller-manager Update v1 2024-06-11 15:22:59 +0000 UTC FieldsV1 {\"f:status\":{\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:0;0;15242247612712203312;385E110A:66420EDE;;;/mnt/local-fs/primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52,ReadOnly:false,FSType:gpfs,VolumeAttributes:map[string]string{csi.storage.k8s.io/pv/name: pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52,csi.storage.k8s.io/pvc/name: pvc-gfugq,csi.storage.k8s.io/pvc/namespace: test-driver-dynamic-pass-9-1718118761,storage.kubernetes.io/csiProvisionerIdentity: 1718094082648-2663-spectrumscale.csi.ibm.com,volBackendFs: local-fs,volDirBasePath: LW,},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteOnce],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-dynamic-pass-9-1718118761,Name:pvc-gfugq,UID:c1396a54-9c2b-473f-82f1-6af2253bdf52,APIVersion:v1,ResourceVersion:17729414,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-tvxzfffsfe,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Released,Message:,Reason:,LastPhaseTransitionTime:<nil>,},}"
I0611 15:23:04.285367       1 controller.go:1258] "shouldDelete" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285377       1 controller.go:1288] "shouldDelete is true" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285384       1 controller.go:1132] "shouldDelete" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285391       1 controller.go:1548] "Started" PV="pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52"
I0611 15:23:04.285403       1 controller.go:1312] volume pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52 does not need any deletion secrets
I0611 15:23:04.285454       1 connection.go:264] "GRPC call" method="/csi.v1.Controller/DeleteVolume" request="{\"volume_id\":\"0;0;15242247612712203312;385E110A:66420EDE;;;/mnt/local-fs/primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52\"}"
I0611 15:23:06.468538       1 connection.go:270] "GRPC response" response="{}" err="rpc error: code = Internal desc = unable to Delete Dir using FS [local-fs] Relative SymLink [primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52]. Error [unable to delete dir primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52:[EFSSG0264C The path /mnt/local-fs/primary-fileset-local-fs-15242247612712203312/.volumes/pvc-c1396a54-9c2b-473f-82f1-6af2253bdf52 does not exist.]]"

What you expected to happen:
PV should be deleted successfully.

How to reproduce it:

  1. Create a storageclass and PVC
  2. Delete PVC
  3. Notice the status of PV

Environment:

  • Driver version:
  • Kubernetes version (use kubectl version): v1.29.3
  • OS (e.g. from /etc/os-release): RHEL 8.8
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@yarinm
Copy link

yarinm commented Jun 18, 2024

We also see similar behavior with pre-provisioned disks, the csi driver after the upgrade started to delete them after with the PVC and PV were deleted which is NOT expected.

The pre-provisioned disks should remain untouched

@shrutinipane
Copy link
Author

Another instance where pre-provisioned PVC was deleted as per the logs. But, from the cluster it is still showing Terminating.
Below are the provisioner logs:

I0620 10:50:11.805561       1 controller.go:1132] "shouldDelete" PV="pv-fbeqn"
I0620 10:50:11.805575       1 controller.go:1548] "Started" PV="pv-fbeqn"
I0620 10:50:11.805665       1 connection.go:264] "GRPC call" method="/csi.v1.Controller/DeleteVolume" request="{\"volume_id\":\"453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-rgupwoxrg\"}"
I0620 10:50:12.894422       1 leaderelection.go:281] successfully renewed lease ibm-spectrum-scale-csi-driver/spectrumscale-csi-ibm-com
I0620 10:50:21.453285       1 connection.go:270] "GRPC response" response="{}" err=null
I0620 10:50:21.453335       1 controller.go:1563] "Volume deleted" PV="pv-fbeqn"
I0620 10:50:21.528417       1 controller.go:1607] "PersistentVolume deleted succeeded" PV="pv-fbeqn"
I0620 10:50:21.528709       1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pv-fbeqn    2d464b99-70d9-4001-b4bd-058d4a51dcb4 19599924 0 2024-06-20 10:47:32 +0000 UTC 2024-06-20 10:50:21 +0000 UTC 0xc0009efa80 map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [external-provisioner.volume.kubernetes.io/finalizer] [{OpenAPI-Generator Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:author\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {csi-provisioner Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/bound-by-controller\":{}}},\"f:spec\":{\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}}}} } {kube-controller-manager Update v1 2024-06-20 10:50:11 +0000 UTC FieldsV1 {\"f:status\":{\"f:message\":{},\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-rgupwoxrg,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1718880431,Name:pvc-kqdmmbm,UID:f3821261-6e7a-4c91-b3e5-59871d7dfc2a,APIVersion:v1,ResourceVersion:19599399,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-wdijv,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Failed,Message:error getting deleter volume plugin for volume \"pv-fbeqn\": no deletable volume plugin matched,Reason:,LastPhaseTransitionTime:2024-06-20 10:50:11 +0000 UTC,},}"

@deepakkinni
Copy link
Member

We also see similar behavior with pre-provisioned disks, the csi driver after the upgrade started to delete them after with the PVC and PV were deleted which is NOT expected.

The pre-provisioned disks should remain untouched

This is as per the KEP design https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2644-honor-pv-reclaim-policy#csi-driver-volumes, essentially, the new finalizer will be added to even statically provisioned volumes if they are in a Bound state, and deleting the PVC, will delete the underlying volume, honoring the reclaim policy specified.

@deepakkinni
Copy link
Member

@shrutinipane This is pretty odd, based on https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1545

Once the deleteVolumeOperation is triggered, it removes all the finalizers if necessary. Only after that would it log the message "PersistentVolume deleted succeeded"
Based on your log messages, it seems like the syncVolume https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1114 is triggered after that, and oddly enough, the PV shows up with the finalizer still present, I don't even see deletion time stamp set. I am not sure how this works, any clues on this @jsafrane @xing-yang @carlory ?

@shrutinipane
Copy link
Author

We have following setting in provisioner clusterrole:

[root@csi-kube129-1-x-master ~]# oc get clusterrole ibm-spectrum-scale-csi-provisioner -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2024-06-21T05:49:56Z"
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale-csi-operator
    app.kubernetes.io/managed-by: ibm-spectrum-scale-csi-operator
    app.kubernetes.io/name: ibm-spectrum-scale-csi-operator
    product: ibm-spectrum-scale-csi
    release: ibm-spectrum-scale-csi-operator
  name: ibm-spectrum-scale-csi-provisioner
  resourceVersion: "19799980"
  uid: 4fcd148f-ed58-40c1-99b2-26ffbc2c4d8d
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  verbs:
  - get
  - list
  - watch
  - create
  - delete
  - patch

Still while deleting statically provisioned PV, it remains in Terminating state. Following are the logs in provisioner sidecar:

I0621 05:58:32.717669       1 controller.go:1258] "shouldDelete" PV="pv-xdikibutrt"
I0621 05:58:32.717690       1 controller.go:1279] "shouldDelete is false: PersistentVolumePhase is not Released" PV="pv-xdikibutrt"
I0621 05:58:32.763484       1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pv-xdikibutrt    b3c10e29-2b06-4c9d-a81f-fd31de8dc86f 19801737 0 2024-06-21 05:57:11 +0000 UTC 2024-06-21 05:58:32 +0000 UTC 0xc000a958f8 map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [external-provisioner.volume.kubernetes.io/finalizer] [{OpenAPI-Generator Update v1 2024-06-21 05:57:11 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:author\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {csi-provisioner Update v1 2024-06-21 05:57:11 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-21 05:57:11 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/bound-by-controller\":{}}},\"f:spec\":{\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}}}} } {kube-controller-manager Update v1 2024-06-21 05:58:17 +0000 UTC FieldsV1 {\"f:status\":{\"f:message\":{},\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-jzysxcelad,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1718949407,Name:pvc-pgweef,UID:e82645ec-6cca-462f-b57d-78b6685aabe1,APIVersion:v1,ResourceVersion:19801456,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ufqxiok,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Failed,Message:error getting deleter volume plugin for volume \"pv-xdikibutrt\": no deletable volume plugin matched,Reason:,LastPhaseTransitionTime:2024-06-21 05:58:17 +0000 UTC,},}"
I0621 05:58:32.763535       1 controller.go:1258] "shouldDelete" PV="pv-xdikibutrt"
I0621 05:58:32.763552       1 controller.go:1279] "shouldDelete is false: PersistentVolumePhase is not Released" PV="pv-xdikibutrt"

@deepakkinni
Copy link
Member

@shrutinipane could you post the whole csi-provisioner logs? it would be helpful to get kcm logs as well. Please post the PV and PVC yaml as well. Also the steps, did you simply delete the PVC? Do the logs indicate something as a fallout after the PVC delete?

@shrutinipane
Copy link
Author

These are the CSI pods:

[root@csi-kube129-1-x-master ~]# oc get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-attacher-79654d86f-s24f9      1/1     Running   0          2m2s
ibm-spectrum-scale-csi-attacher-79654d86f-s9vlh      1/1     Running   0          2m2s
ibm-spectrum-scale-csi-cjbvw                         3/3     Running   0          2m2s
ibm-spectrum-scale-csi-operator-67d475b5cf-vkb5x     1/1     Running   0          3m1s
ibm-spectrum-scale-csi-provisioner-ff75d95c-6s74h    1/1     Running   0          2m2s
ibm-spectrum-scale-csi-resizer-59f6dbbdc9-lrcf5      1/1     Running   0          2m2s
ibm-spectrum-scale-csi-snapshotter-7447b7ffc-kfm9b   1/1     Running   0          2m2s
ibm-spectrum-scale-csi-xk25w                         3/3     Running   0          2m2s

The yaml representation of kubernetes objects created:
PVC:

[root@csi-kube129-1-x-master ~]# oc get pvc pvc-espxwjj -n test-driver-static-sc-14-1719212536 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
  creationTimestamp: "2024-06-24T07:02:43Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    author: kubetest
  name: pvc-espxwjj
  namespace: test-driver-static-sc-14-1719212536
  resourceVersion: "20514987"
  uid: 542119c1-fd72-4436-ad13-b85f0a8e28aa
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: sc-vmxrn
  volumeMode: Filesystem
  volumeName: pv-ndpym
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 8Gi
  phase: Bound

PV:

[root@csi-kube129-1-x-master ~]# oc get pv pv-ndpym -oyaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2024-06-24T07:02:43Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  - external-attacher/spectrumscale-csi-ibm-com
  labels:
    author: kubetest
  name: pv-ndpym
  resourceVersion: "20515001"
  uid: fb8f2b8b-ee28-4e78-9ca2-9ef776dfda77
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-espxwjj
    namespace: test-driver-static-sc-14-1719212536
    resourceVersion: "20514947"
    uid: 542119c1-fd72-4436-ad13-b85f0a8e28aa
  csi:
    driver: spectrumscale.csi.ibm.com
    volumeHandle: 453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-anlkxzae
  persistentVolumeReclaimPolicy: Delete
  storageClassName: sc-vmxrn
  volumeMode: Filesystem
status:
  lastPhaseTransitionTime: "2024-06-24T07:02:43Z"
  phase: Bound

StorageClass:

[root@csi-kube129-1-x-master ~]# oc get sc sc-vmxrn -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-06-24T07:02:32Z"
  labels:
    author: kubetest
  name: sc-vmxrn
  resourceVersion: "20514916"
  uid: 788558db-cb7f-4b23-936b-a667a739c62f
parameters:
  clusterId: "453210684566980716"
  volBackendFs: fs1
provisioner: spectrumscale.csi.ibm.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Steps to reproduce defect:

  1. Create StorageClass, PV and PVC (refer to above yaml files)
  2. Delete PVC
  3. Delete PV
  4. Describing PV after deletion:
[root@csi-kube129-1-x-master ~]# oc describe pv pv-ndpym
Name:            pv-ndpym
Labels:          author=kubetest
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [external-provisioner.volume.kubernetes.io/finalizer]
StorageClass:    sc-vmxrn
Status:          Terminating (lasts 43s)
Claim:           test-driver-static-sc-14-1719212536/pvc-espxwjj
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        8Gi
Node Affinity:   <none>
Message:         error getting deleter volume plugin for volume "pv-ndpym": no deletable volume plugin matched
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            spectrumscale.csi.ibm.com
    FSType:
    VolumeHandle:      453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-anlkxzae
    ReadOnly:          false
    VolumeAttributes:  <none>
Events:
  Type     Reason              Age   From                                                                                                              Message
  ----     ------              ----  ----                                                                                                              -------
  Warning  VolumeFailedDelete  59s   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-ff75d95c-6s74h_81f3b680-fc3e-49c1-8420-139b29153aca  persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com
  Warning  VolumeFailedDelete  59s   persistentvolume-controller                                                                                       error getting deleter volume plugin for volume "pv-ndpym": no deletable volume plugin matched

Please find attached CSI provisioner logs, Kube Controller Manager logs and CSI Snap.

CSI Provisioner logs:

I0624 07:04:40.736810       1 controller.go:1132] "shouldDelete" PV="pv-ndpym"
I0624 07:04:40.736821       1 controller.go:1548] "Started" PV="pv-ndpym"
E0624 07:04:40.736905       1 controller.go:1558] "Volume deletion failed" err="persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com" PV="pv-ndpym"
I0624 07:04:40.736955       1 controller.go:1007] "Retrying syncing volume" key="pv-ndpym" failures=0
E0624 07:04:40.736984       1 controller.go:1025] error syncing volume "pv-ndpym": persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com
I0624 07:04:40.737127       1 event.go:389] "Event occurred" object="pv-ndpym" fieldPath="" kind="PersistentVolume" apiVersion="v1" type="Warning" reason="VolumeFailedDelete" message="persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com"
I0624 07:04:40.751966       1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pv-ndpym    fb8f2b8b-ee28-4e78-9ca2-9ef776dfda77 20515335 0 2024-06-24 07:02:43 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer external-attacher/spectrumscale-csi-ibm-com] [{OpenAPI-Generator Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:author\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {csi-provisioner Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/bound-by-controller\":{}}},\"f:spec\":{\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}}}} } {csi-attacher Update v1 2024-06-24 07:02:58 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-attacher/spectrumscale-csi-ibm-com\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-24 07:04:40 +0000 UTC FieldsV1 {\"f:status\":{\"f:message\":{},\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-anlkxzae,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719212536,Name:pvc-espxwjj,UID:542119c1-fd72-4436-ad13-b85f0a8e28aa,APIVersion:v1,ResourceVersion:20514947,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-vmxrn,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Failed,Message:error getting deleter volume plugin for volume \"pv-ndpym\": no deletable volume plugin matched,Reason:,LastPhaseTransitionTime:2024-06-24 07:04:40 +0000 UTC,},}"

Kube Controller Manager logs:

E0624 07:04:40.751575       1 goroutinemap.go:150] Operation for "delete-pv-ndpym[fb8f2b8b-ee28-4e78-9ca2-9ef776dfda77]" failed. No retries permitted until 2024-06-24 07:04:41.251537223 +0000 UTC m=+339116.107773870 (durationBeforeRetry 500ms). Error: error getting deleter volume plugin for volume "pv-ndpym": no deletable volume plugin matched
I0624 07:04:40.751658       1 event.go:376] "Event occurred" object="pv-ndpym" fieldPath="" kind="PersistentVolume" apiVersion="v1" type="Warning" reason="VolumeFailedDelete" message="error getting deleter volume plugin for volume \"pv-ndpym\": no deletable volume plugin matched"

kube-controller-manager_logs.txt
csi_provisioner_logs.txt
csi_snap_for_provisioner.tar.gz

@Jainbrt
Copy link

Jainbrt commented Jun 25, 2024

@deepakkinni do you need any more information on the above issue?

@carlory
Copy link
Member

carlory commented Jun 25, 2024

@deepakkinni @Jainbrt @shrutinipane

A similar issue was reported to kubernetes/kuberentes, and fixed by this PR in 1.30. Because this feature is alpha, it is not backported to v1.29. Only v1.30 and later versions are fixed.

@Jainbrt
Copy link

Jainbrt commented Jun 25, 2024

@carlory thanks for your response, but as this feature is not enabled, we are hitting this in common path.

@carlory
Copy link
Member

carlory commented Jun 25, 2024

 Error: error getting deleter volume plugin for volume "pv-ndpym": no deletable volume plugin matched

The root cause is that the pv controller marks the pv as failed, so the external provsioner cannot handle the pv. The previous solution is gated with HonorPVReclaimPolicy, so we still hit the issue in the common path.

	if utilfeature.DefaultFeatureGate.Enabled(features.HonorPVReclaimPolicy) {
		if metav1.HasAnnotation(volume.ObjectMeta, storagehelpers.AnnMigratedTo) {
			// CSI migration scenario - do not depend on in-tree plugin
			return nil, nil
		}

		if volume.Spec.CSI != nil {
			// CSI volume source scenario - external provisioner is requested
			return nil, nil
		}
	}

Should we remove the feature-gate check in k/k and then backport the fix to previous releases?

cc @xing-yang @deepakkinni

@deepakkinni
Copy link
Member

@carlory I think we'd need to do this. In theory, the fix should have been applicable regardless of the feature HonorPVReclaimPolicy. The fix without the feature gate will ensure there are no Failed volume states for:

  1. Statically provisioned csi volumes
  2. Migrated volumes

But how did this work in the past? With older kcm and older external-provisioner, would deleting a statically provisioned volume(bound pv-pvc, delete PVC) leave the PV in a Failed state? @shrutinipane would it be possible to repeat this using <5.0.1 provisioner? What was the state of PV when PVC was deleted? can you post kcm logs as well?post pv,PVC yamls

@shrutinipane
Copy link
Author

With older provisioner version, PV is in Released state and then gets deleted.
I have created PV pv-jsffwiqcc and PVC pvc-fxmzvrucf.

CSI Provisioner Logs:

I0626 13:00:14.507818       1 controller.go:1152] handleProtectionFinalizer Volume : &PersistentVolume{ObjectMeta:{pv-jsffwiqcc    7b8860d9-3424-400b-9158-099677c83dfb 21049328 0 2024-06-26 12:59:23 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection external-attacher/spectrumscale-csi-ibm-com] [{OpenAPI-Generator Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:author":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:driver":{},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {csi-attacher Update v1 2024-06-26 12:59:28 +0000 UTC FieldsV1 {"f:metadata":{"f:finalizers":{"v:\"external-attacher/spectrumscale-csi-ibm-com\"":{}}}} } {kube-controller-manager Update v1 2024-06-26 13:00:14 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719406744,Name:pvc-fxmzvrucf,UID:b478df92-ef7a-4725-8a89-bd98e6797e06,APIVersion:v1,ResourceVersion:21049139,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ztjnmf,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Released,Message:,Reason:,LastPhaseTransitionTime:2024-06-26 13:00:14 +0000 UTC,},}
I0626 13:00:14.508028       1 controller.go:1239] shouldDelete volume "pv-jsffwiqcc"
I0626 13:00:14.508044       1 controller.go:1269] shouldDelete volume "pv-jsffwiqcc" is true
I0626 13:00:14.508050       1 controller.go:1113] shouldDelete Volume: "pv-jsffwiqcc"
I0626 13:00:14.508060       1 controller.go:1509] delete "pv-jsffwiqcc": started
I0626 13:00:14.508113       1 connection.go:244] GRPC call: /csi.v1.Controller/DeleteVolume
I0626 13:00:14.508121       1 connection.go:245] GRPC request: {"volume_id":"453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro"}
I0626 13:00:30.325843       1 connection.go:251] GRPC response: {}
I0626 13:00:30.325871       1 connection.go:252] GRPC error: <nil>
I0626 13:00:30.325891       1 controller.go:1524] delete "pv-jsffwiqcc": volume deleted
I0626 13:00:30.331759       1 controller.go:1530] delete "pv-jsffwiqcc": failed to delete persistentvolume: persistentvolumes "pv-jsffwiqcc" not found
W0626 13:00:30.331811       1 controller.go:989] Retrying syncing volume "pv-jsffwiqcc", failure 0
E0626 13:00:30.331833       1 controller.go:1007] error syncing volume "pv-jsffwiqcc": persistentvolumes "pv-jsffwiqcc" not found

Kube Controller Manager logs:

626 13:00:14.531991       1 event.go:376] "Event occurred" object="pv-jsffwiqcc" fieldPath="" kind="PersistentVolume" apiVersion="v1" type="Warning" reason="VolumeFailedDelete" message="error getting deleter volume plugin for volume \"pv-jsffwiqcc\": no deletable volume plugin matched"
I0626 13:00:19.447842       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719406744/pvc-zogxg" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-zrtunanet\": incompatible accessMode"
I0626 13:00:24.495306       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719406744/pvc-wntncvm" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-ulnhx\": incompatible accessMode"
I0626 13:00:24.563981       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719210522/pvc-wtmjwf" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-wmqrgza\": incompatible accessMode"
I0626 13:00:24.564694       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719210522/pvc-yqmro" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-dcqlemx\": incompatible accessMode"
I0626 13:00:24.564743       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719225244/pvc-sxkot" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-wbgeoe\": incompatible accessMode"
I0626 13:00:24.564758       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719225244/pvc-wcjqitliw" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-pfwhq\": incompatible accessMode"
I0626 13:00:24.564771       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719209606/pvc-dorgvat" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-hmjpcdn\": incompatible accessMode"
I0626 13:00:24.564782       1 event.go:376] "Event occurred" object="test-driver-static-sc-14-1719209606/pvc-lgukdapx" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeMismatch" message="Cannot bind to requested volume \"pv-houmrovm\": incompatible accessMode"
E0626 13:00:24.581067       1 goroutinemap.go:150] Operation for "delete-pv-jsffwiqcc[7b8860d9-3424-400b-9158-099677c83dfb]" failed. No retries permitted until 2024-06-26 13:00:25.581046191 +0000 UTC m=+533260.437282843 (durationBeforeRetry 1s). Error: error getting deleter volume plugin for volume "pv-jsffwiqcc": no deletable volume plugin matched
E0626 13:00:29.663000       1 pv_protection_controller.go:113] PV pv-jsffwiqcc failed with : Operation cannot be fulfilled on persistentvolumes "pv-jsffwiqcc": the object has been modified; please apply your changes to the latest version and try again

Attaching log files:
provisioner_logs.txt
kcm_logs.txt

@deepakkinni
Copy link
Member

Thanks @shrutinipane
PV: pv-jsffwiqcc

From external-provisioner perspective the PV moves from Bound to Released state.

0626 12:59:23.586577       1 controller.go:1152] handleProtectionFinalizer Volume : &PersistentVolume{ObjectMeta:{pv-jsffwiqcc    7b8860d9-3424-400b-9158-099677c83dfb 21049140 0 2024-06-26 12:59:23 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection] [{OpenAPI-Generator Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:author":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:driver":{},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719406744,Name:pvc-fxmzvrucf,UID:b478df92-ef7a-4725-8a89-bd98e6797e06,APIVersion:v1,ResourceVersion:21049139,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ztjnmf,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Available,Message:,Reason:,LastPhaseTransitionTime:2024-06-26 12:59:23 +0000 UTC,},}
I0626 12:59:23.586760       1 controller.go:1239] shouldDelete volume "pv-jsffwiqcc"
I0626 12:59:23.586772       1 controller.go:1260] shouldDelete volume "pv-jsffwiqcc" is false: PersistentVolumePhase is not Released
I0626 12:59:23.622008       1 controller.go:1152] handleProtectionFinalizer Volume : &PersistentVolume{ObjectMeta:{pv-jsffwiqcc    7b8860d9-3424-400b-9158-099677c83dfb 21049141 0 2024-06-26 12:59:23 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection] [{OpenAPI-Generator Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:author":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:driver":{},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719406744,Name:pvc-fxmzvrucf,UID:b478df92-ef7a-4725-8a89-bd98e6797e06,APIVersion:v1,ResourceVersion:21049139,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ztjnmf,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Bound,Message:,Reason:,LastPhaseTransitionTime:2024-06-26 12:59:23 +0000 UTC,},}
I0626 12:59:23.622276       1 controller.go:1239] shouldDelete volume "pv-jsffwiqcc"
I0626 12:59:23.622368       1 controller.go:1260] shouldDelete volume "pv-jsffwiqcc" is false: PersistentVolumePhase is not Released
I0626 12:59:28.811648       1 controller.go:1152] handleProtectionFinalizer Volume : &PersistentVolume{ObjectMeta:{pv-jsffwiqcc    7b8860d9-3424-400b-9158-099677c83dfb 21049164 0 2024-06-26 12:59:23 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection external-attacher/spectrumscale-csi-ibm-com] [{OpenAPI-Generator Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:author":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:driver":{},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status} {csi-attacher Update v1 2024-06-26 12:59:28 +0000 UTC FieldsV1 {"f:metadata":{"f:finalizers":{"v:\"external-attacher/spectrumscale-csi-ibm-com\"":{}}}} }]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719406744,Name:pvc-fxmzvrucf,UID:b478df92-ef7a-4725-8a89-bd98e6797e06,APIVersion:v1,ResourceVersion:21049139,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ztjnmf,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Bound,Message:,Reason:,LastPhaseTransitionTime:2024-06-26 12:59:23 +0000 UTC,},}
I0626 12:59:28.811887       1 controller.go:1239] shouldDelete volume "pv-jsffwiqcc"
I0626 12:59:28.811906       1 controller.go:1260] shouldDelete volume "pv-jsffwiqcc" is false: PersistentVolumePhase is not Released
.
.
.
I0626 13:00:14.507818       1 controller.go:1152] handleProtectionFinalizer Volume : &PersistentVolume{ObjectMeta:{pv-jsffwiqcc    7b8860d9-3424-400b-9158-099677c83dfb 21049328 0 2024-06-26 12:59:23 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection external-attacher/spectrumscale-csi-ibm-com] [{OpenAPI-Generator Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:author":{}}},"f:spec":{"f:accessModes":{},"f:capacity":{".":{},"f:storage":{}},"f:csi":{".":{},"f:driver":{},"f:volumeHandle":{}},"f:persistentVolumeReclaimPolicy":{},"f:storageClassName":{},"f:volumeMode":{}}} } {kube-controller-manager Update v1 2024-06-26 12:59:23 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:pv.kubernetes.io/bound-by-controller":{}}},"f:spec":{"f:claimRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} } {csi-attacher Update v1 2024-06-26 12:59:28 +0000 UTC FieldsV1 {"f:metadata":{"f:finalizers":{"v:\"external-attacher/spectrumscale-csi-ibm-com\"":{}}}} } {kube-controller-manager Update v1 2024-06-26 13:00:14 +0000 UTC FieldsV1 {"f:status":{"f:phase":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719406744,Name:pvc-fxmzvrucf,UID:b478df92-ef7a-4725-8a89-bd98e6797e06,APIVersion:v1,ResourceVersion:21049139,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-ztjnmf,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Released,Message:,Reason:,LastPhaseTransitionTime:2024-06-26 13:00:14 +0000 UTC,},}
I0626 13:00:14.508028       1 controller.go:1239] shouldDelete volume "pv-jsffwiqcc"
I0626 13:00:14.508044       1 controller.go:1269] shouldDelete volume "pv-jsffwiqcc" is true
I0626 13:00:14.508050       1 controller.go:1113] shouldDelete Volume: "pv-jsffwiqcc"
I0626 13:00:14.508060       1 controller.go:1509] delete "pv-jsffwiqcc": started
I0626 13:00:14.508113       1 connection.go:244] GRPC call: /csi.v1.Controller/DeleteVolume
I0626 13:00:14.508121       1 connection.go:245] GRPC request: {"volume_id":"453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-cmsbmro"}
I0626 13:00:30.325843       1 connection.go:251] GRPC response: {}
I0626 13:00:30.325871       1 connection.go:252] GRPC error: <nil>
I0626 13:00:30.325891       1 controller.go:1524] delete "pv-jsffwiqcc": volume deleted
I0626 13:00:30.331759       1 controller.go:1530] delete "pv-jsffwiqcc": failed to delete persistentvolume: persistentvolumes "pv-jsffwiqcc" not found

But in kcm, there's no indication that it moved into Released state, on the other hand the indicated error should have moved the PV to failed state

E0626 13:00:14.531745       1 goroutinemap.go:150] Operation for "delete-pv-jsffwiqcc[7b8860d9-3424-400b-9158-099677c83dfb]" failed. No retries permitted until 2024-06-26 13:00:15.031690313 +0000 UTC m=+533249.887927013 (durationBeforeRetry 500ms). Error: error getting deleter volume plugin for volume "pv-jsffwiqcc": no deletable volume plugin matched
I0626 13:00:14.531991       1 event.go:376] "Event occurred" object="pv-jsffwiqcc" fieldPath="" kind="PersistentVolume" apiVersion="v1" type="Warning" reason="VolumeFailedDelete" message="error getting deleter volume plugin for volume \"pv-jsffwiqcc\": no deletable volume plugin matched"

Based on https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1318-L1334

	pluginName, deleted, err := ctrl.doDeleteVolume(ctx, volume)
	if err != nil {
		// Delete failed, update the volume and emit an event.
		logger.V(3).Info("Deletion of volume failed", "volumeName", volume.Name, "err", err)
		if volerr.IsDeletedVolumeInUse(err) {
			// The plugin needs more time, don't mark the volume as Failed
			// and send Normal event only
			ctrl.eventRecorder.Event(volume, v1.EventTypeNormal, events.VolumeDelete, err.Error())
		} else {
			// The plugin failed, mark the volume as Failed and send Warning
			// event
			if _, err := ctrl.updateVolumePhaseWithEvent(ctx, volume, v1.VolumeFailed, v1.EventTypeWarning, events.VolumeFailedDelete, err.Error()); err != nil {
				logger.V(4).Info("DeleteVolumeOperation: failed to mark volume as failed", "volumeName", volume.Name, "err", err)
				// Save failed, retry on the next deletion attempt
				return pluginName, err
			}
		}

This was supposed to move the PV into VolumeFailed state.

https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/plugins.go#L770-L779

// FindDeletablePluginBySpec fetches a persistent volume plugin by spec.  If
// no plugin is found, returns error.
func (pm *VolumePluginMgr) FindDeletablePluginBySpec(spec *Spec) (DeletableVolumePlugin, error) {
	volumePlugin, err := pm.FindPluginBySpec(spec)
	if err != nil {
		return nil, err
	}
	if deletableVolumePlugin, ok := volumePlugin.(DeletableVolumePlugin); ok {
		return deletableVolumePlugin, nil
	}
	return nil, fmt.Errorf("no deletable volume plugin matched")
}

Will follow up this soon.

@carlory
Copy link
Member

carlory commented Jun 27, 2024

The KCM updates pv's phase with the Released state and then may move this field into the Failed state, please refer to kubernetes/kubernetes#122030 (comment) for more details.

We can observe the behavior from the above log snippet of the KCM and the external provisioner.

@Jainbrt
Copy link

Jainbrt commented Jun 28, 2024

Thanks @carlory for the help here and raising the fix. I was wondering if we would get new provisioner sidecar image or this be a dependency on newer k8s versions?

@carlory
Copy link
Member

carlory commented Jun 28, 2024

@Jainbrt After the pr is merged, it will be backported to 1.30, 1.29, 1.28, and 1.27. The latest provisioner image is enough, so we don't need to release a newer one.

@Jainbrt
Copy link

Jainbrt commented Jun 28, 2024

Thanks @carlory for the help in understanding, while are k8s < 1.27 versions also impacted ?

@carlory
Copy link
Member

carlory commented Jun 28, 2024

Yes, please refer to End of Life

@deepakkinni
Copy link
Member

Hi @carlory, i don't think that is the root cause, it'll definitely resolve the issue(since we avoid moving into Failed state entirely), but there seems to be other issue.

To summarize the observation based on logs in in #1235 (comment), the volume is moved into a Released state, followed by a Failed state, this infact was observed in the csi-provisioner logs:

I0624 07:04:40.736752       1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pv-ndpym    fb8f2b8b-ee28-4e78-9ca2-9ef776dfda77 20515333 0 2024-06-24 07:02:43 +0000 UTC <nil> <nil> map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer external-attacher/spectrumscale-csi-ibm-com] [{OpenAPI-Generator Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:author\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {csi-provisioner Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-24 07:02:43 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/bound-by-controller\":{}}},\"f:spec\":{\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}}}} } {csi-attacher Update v1 2024-06-24 07:02:58 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-attacher/spectrumscale-csi-ibm-com\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-24 07:04:40 +0000 UTC FieldsV1 {\"f:status\":{\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>}  BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-anlkxzae,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1719212536,Name:pvc-espxwjj,UID:542119c1-fd72-4436-ad13-b85f0a8e28aa,APIVersion:v1,ResourceVersion:20514947,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-vmxrn,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Released,Message:,Reason:,LastPhaseTransitionTime:2024-06-24 07:04:40 +0000 UTC,},}"
I0624 07:04:40.736791       1 controller.go:1258] "shouldDelete" PV="pv-ndpym"
I0624 07:04:40.736802       1 controller.go:1288] "shouldDelete is true" PV="pv-ndpym"
I0624 07:04:40.736810       1 controller.go:1132] "shouldDelete" PV="pv-ndpym"
I0624 07:04:40.736821       1 controller.go:1548] "Started" PV="pv-ndpym"
E0624 07:04:40.736905       1 controller.go:1558] "Volume deletion failed" err="persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com" PV="pv-ndpym"
I0624 07:04:40.736955       1 controller.go:1007] "Retrying syncing volume" key="pv-ndpym" failures=0
E0624 07:04:40.736984       1 controller.go:1025] error syncing volume "pv-ndpym": persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com
I0624 07:04:40.737127       1 event.go:389] "Event occurred" object="pv-ndpym" fieldPath="" kind="PersistentVolume" apiVersion="v1" type="Warning" reason="VolumeFailedDelete" message="persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com"

As seen in the above logs, the csi driver threw the error with message persistentvolume pv-ndpym is still attached to node csi-kube129-1-x-worker2.fyre.ibm.com, this triggered a retry, in the meanwhile, kcm moved the PV into a Failed state, as a result, this made csi-provisioner fail the shouldDelete checks and make no more progress.

Now, even if kcm doesn't move into Failed state, the driver may not necessarily successfully delete the volume because of the above error.

In summary, the change kubernetes/kubernetes#125767 should go in, but doesn't mean it'll resolve @shrutinipane reported issue. Perhaps, the driver eventually detaches the volume from node, then maybe they'll not see the issue.

@carlory
Copy link
Member

carlory commented Jul 1, 2024

@deepakkinni I have a different opinion on the error log. Let's see the function signature:

func (p *csiProvisioner) canDeleteVolume(volume *v1.PersistentVolume) error

It uses the return value to indicate whether the volume can be deleted or not.

	for _, va := range vaList {
		if va.Spec.Source.PersistentVolumeName != nil && *va.Spec.Source.PersistentVolumeName == volume.Name {
			return fmt.Errorf("persistentvolume %s is still attached to node %s", volume.Name, va.Spec.NodeName)
		}
	}

the above code snippet is checking if the volume is still attached to a node.

If it is, it has to return an error (not a boolean value) to indicate that the volume cannot be deleted.

The detach operation is asynchronous, the volume may still be attached to the node when the canDeleteVolume function is called. So, the message's level should be Warning or Info if we use a boolean value to indicate the result.

@carlory
Copy link
Member

carlory commented Jul 1, 2024

  1. a user deletes pod, pvc

  2. adc does detach operation, the volumeattachment object is being deleted, and csi-attacher observes the event and calls RPC to detach the volume

  3. pv binder controller observes pod and pvc are deleted, and then updates pv's phase with Release. csi-provsioner watches the pv event and try to delete the pv if the volume is not attached to a node.

  4. If the csi-attacher takes a long time to detach volume, csi-provsioner has retried 15 times. the volume is still attached to a node but the volume key is removed from csi-provisioner's volume queue. so the pv is stuck in Terminating phase until the informer resync period is reached (15m) and retries the above step.

@deepakkinni
Copy link
Member

Thanks for the clarification, @carlory. I was assuming the error was coming from the driver, but it seems like it's from the provisioner itself.

@Jainbrt
Copy link

Jainbrt commented Jul 2, 2024

Thanks @carlory and @deepakkinni for helping on this issue, do we have any outlook when will we get the fix kubernetes/kubernetes#125767 in next k8s PTF builds ?

@shrutinipane
Copy link
Author

@carlory Regarding comment #1235 (comment), with the current implementation of provisioner v5.0.1, PV stays in Terminating stays and doesn't get cleaned up after 15 minutes.

@carlory
Copy link
Member

carlory commented Jul 2, 2024

@shrutinipane once the pv is updated with Failed state by kcm, the csi-provisioner won't be able to handle the pv. so kubernetes/kubernetes#125767 will fix this.

@carlory
Copy link
Member

carlory commented Jul 11, 2024

Thanks @carlory for the help in understanding, while are k8s < 1.27 versions also impacted ?

@deepakkinni @Jainbrt Kubernetes release 1.27 reaches the End of Life, please see https://kubernetes.io/releases/patch-releases/\#1-27. so 1.27 won't include this fix

@Jainbrt
Copy link

Jainbrt commented Jul 12, 2024

Sure @carlory . I hope this would be picked for 1.28,1.29 and 1.30 k8s further ptf releases.

@jsafrane
Copy link
Contributor

Found it, the provisioner computes a wrong JSON patch to remove the finalizer.

removeFinalizer modifies its argument, which means newVolume.finalizers:

finalizers, modified := removeFinalizer(newVolume.ObjectMeta.Finalizers, finalizerPV)

But newVolume is used as the original PV to compute JSON patch. With the finalizer removed from it, the resulting patch does not do anything useful:

if _, err = ctrl.patchPersistentVolumeWithFinalizers(ctx, newVolume, finalizers); err != nil {

@jsafrane
Copy link
Contributor

@carlory
Copy link
Member

carlory commented Jul 26, 2024

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jul 26, 2024
@k8s-ci-robot
Copy link
Contributor

@carlory: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@carlory
Copy link
Member

carlory commented Jul 29, 2024

Needs to bump dependencies @jsafrane

@jsafrane
Copy link
Contributor

@shrutinipane @andyzhangx @Jainbrt

Before I tag a new sig-storage-lib-external-provisioner + external-provisioner versions, I'd like to make sure the fix is complete and there are no lose ends. Can you please test #1245? If you trust me, I prepared an image from the PR at quay.io/jsafrane/external-provisioner:test1245
(amd64 only).

Thanks in advance for any feedback.

@Jainbrt
Copy link

Jainbrt commented Jul 29, 2024

Sure @jsafrane we can give this a try. thanks for the help.

@Jainbrt
Copy link

Jainbrt commented Aug 1, 2024

@jsafrane we do not have access to pull the above images, can you please help with the same ?

@jsafrane
Copy link
Contributor

jsafrane commented Aug 1, 2024

@Jainbrt sorry, I forgot to make the repo public. Please try again.

@Jainbrt
Copy link

Jainbrt commented Aug 1, 2024

Thanks @jsafrane yes, we are able to pull now.

@jsafrane
Copy link
Contributor

jsafrane commented Aug 2, 2024

@Jainbrt is my image working? How much did you test it?

@jsafrane
Copy link
Contributor

jsafrane commented Aug 5, 2024

@Jainbrt friendly reminder

@Jainbrt
Copy link

Jainbrt commented Aug 6, 2024

Hey @jsafrane, we are still testing this while we do see issue in static provisioning.

@jsafrane
Copy link
Contributor

jsafrane commented Aug 6, 2024

I already released a fixed version of lib-external-provisioner and updated it here, I'm cherry-picking the fix and I plan to release v5.0.2 soon-ish.

@jsafrane
Copy link
Contributor

jsafrane commented Aug 6, 2024

v5.0.2 tagged, waiting for image builds

@shrutinipane
Copy link
Author

@jsafrane I noticed that this issue persists with the above image quay.io/jsafrane/external-provisioner:test1245
Please find the attached provisioner logs:
prov_logs.txt
I tested on k8s version: v1.29.3

Is there any specific k8s version this image would work on?

@jsafrane
Copy link
Contributor

jsafrane commented Aug 6, 2024

I just tested with 1.29.3 and csi-driver-hostpath with my provisioner image, the driver got just 1 DeleteVolume call.

@jsafrane
Copy link
Contributor

jsafrane commented Aug 6, 2024

prov_lopgs.txt shows 1 delete attempt:

I0806 14:01:16.090947 1 controller.go:1258] "shouldDelete" PV="pv-vyfakylg"
I0806 14:01:16.090972 1 controller.go:1288] "shouldDelete is true" PV="pv-vyfakylg"
I0806 14:01:16.090989 1 controller.go:1132] "shouldDelete" PV="pv-vyfakylg"
I0806 14:01:16.091011 1 controller.go:1548] "Started" PV="pv-vyfakylg"
E0806 14:01:16.091195 1 controller.go:1558] "Volume deletion failed" err="persistentvolume pv-vyfakylg is still attached to node csi-kube129-1-x-worker1.fyre.ibm.com" PV="pv-vyfakylg"
I0806 14:01:16.091263 1 controller.go:1007] "Retrying syncing volume" key="pv-vyfakylg" failures=0
E0806 14:01:16.091370 1 controller.go:1025] error syncing volume "pv-vyfakylg": persistentvolume pv-vyfakylg is still attached to node csi-kube129-1-x-worker1.fyre.ibm.com

But then I can see only:

I0806 14:01:16.120583 1 controller.go:1279] "shouldDelete is false: PersistentVolumePhase is not Released" PV="pv-vyfakylg"

(which is odd... how was it Released before and now it's not?)

@Jainbrt
Copy link

Jainbrt commented Aug 6, 2024

@jsafrane we have observed this issue when we run the quick test using automation ( where deletion of pvc and pv is quick). Not sure if this is related.

@Jainbrt
Copy link

Jainbrt commented Aug 8, 2024

@jsafrane we tried with k8s 1.30.3 which does have changes from k8s too, and that works fine. Can you please help release 5.0.2 images ?

@jsafrane
Copy link
Contributor

jsafrane commented Aug 8, 2024

5.0.2 images are out, docker pull registry.k8s.io/sig-storage/csi-provisioner:v5.0.2 works for me.

@zjuliuf
Copy link

zjuliuf commented Aug 14, 2024

Still having this problem.
kubernetes-sigs/sig-storage-lib-external-provisioner#176

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants