
[QUESTION]: Persistent volume not getting detached after pod deletion #107

Closed
rajendraindukuri opened this issue Nov 25, 2021 · 9 comments
Assignees
Labels
area/csi-unity: Issue pertains to the CSI Driver for Dell EMC Unity
type/question: Ask a question. This is the default label associated with a question issue.

Comments

@rajendraindukuri
Collaborator

Reported by customer:
I have a problem with csi-unity: once a PersistentVolume is attached to a node, it does not get detached after the pod using it is deleted.
Later on, when a different pod referring to that PV via its PersistentVolumeClaim gets scheduled to a different node, the CSI driver creates a new VolumeAttachment which cannot be provisioned: Volume has been published to a different host already.

@rajendraindukuri rajendraindukuri added the type/question, help wanted, and area/csi-unity labels on Nov 25, 2021
@karthikk92

karthikk92 commented Nov 25, 2021

Could you please share the details below:

  1. Environment info
  2. Driver version
  3. Installed via Helm or Operator? Which version?
  4. Kubernetes/OpenShift version
  5. Steps to reproduce the issue

@mkej

mkej commented Nov 25, 2021

  1. Deployed on a VM-based Kubernetes cluster; each VM runs Ubuntu 20.04 with a 5.11 kernel.
  2. Observed on v1.6.0 and v2.6.0.
  3. Installed via Helm, using the dell-csi-helm-installer/csi-install.sh script.
  4. K8s cluster installed using kubeadm; the issue was observed on v1.21.3 and v1.22.4 (same cluster, upgraded).
  5. Steps to reproduce:
    Apply these resources to the cluster:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: unity-iscsi
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: volume-test
  labels:
    app: volume-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: volume-test
  template:
    metadata:
      labels:
        app: volume-test
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: volume-test
      volumes:
      - name: volume-test
        persistentVolumeClaim:
          claimName: volume-test

This will create a PersistentVolume and a VolumeAttachment binding it to a specific node.

To force the pod to be rescheduled to a different node, add spec.template.spec.nodeName: some-different-node to the ReplicaSet and delete the existing pod, as sketched below.
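
A minimal sketch of the modified ReplicaSet (only spec.template.spec.nodeName is added; "some-different-node" is a placeholder for a real node name in the cluster):

---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: volume-test
  labels:
    app: volume-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: volume-test
  template:
    metadata:
      labels:
        app: volume-test
    spec:
      nodeName: some-different-node   # placeholder: pin the pod to another node
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: volume-test
      volumes:
      - name: volume-test
        persistentVolumeClaim:
          claimName: volume-test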

Another VolumeAttachment will be created while the previous one remains in place, and it fails with status.attachError.message = 'rpc error: code = Aborted desc = runid=7652 Volume has been published to a different host already.'. The previous attachment is still present and active, which leaves the Pod stuck with an AttachVolume.Attach failed error from the attachdetach-controller. An illustrative sketch of the stuck VolumeAttachment follows.
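
For reference, an illustrative shape of the stuck VolumeAttachment (object, node, and PV names are placeholders; the attacher is assumed to be the csi-unity driver name):

apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-example                   # placeholder: real names look like csi-<hash>
spec:
  attacher: csi-unity.dellemc.com     # assumed csi-unity driver name
  nodeName: some-different-node       # the node the pod was rescheduled to
  source:
    persistentVolumeName: pvc-example # placeholder: the PV bound to the volume-test claim
status:
  attached: false
  attachError:
    message: 'rpc error: code = Aborted desc = runid=7652 Volume has been published to a different host already.'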

@mkej

mkej commented Nov 25, 2021

Also, when the above ReplicaSet and PersistentVolumeClaim are deleted, the PersistentVolume is not deleted, even though it has the Delete reclaim policy. It is stuck in Released status, with both VolumeAttachments still existing. When I manually delete the VolumeAttachments, the host access for the original node gets removed from the block LUN on Unity, but the PV is still there on the Kubernetes cluster. If I then delete the PV manually once the attachments are gone, the PV does get removed from the cluster, but the related block LUN is still there on Unity.

This seems similar to the issue described at https://www.dell.com/community/Containers/csi-unity-volumeattachments-pv-will-not-detach-and-delete-LUN-if/td-p/7847221, but "tear down the whole environment (nodes and all) and rebuild" sounds horrible and I would love to find a different solution.

@karthikk92

karthikk92 commented Nov 25, 2021

Could you please share the CLI ID from the Unity console? It will be of the format "sv_XXXX". Please also attach the complete controller and node driver logs with debug logging enabled.

Please also share the PVC, PV, and VolumeAttachment objects for the pod.

@karthikk92

We have seen a similar issue earlier when the combined length of the prefix and the CLI ID is less than 15 characters.

That is, when the total number of characters in the volume-name-prefix plus the CLI ID is 15 or more, the VolumeAttachment issue goes away, as mentioned in the GitHub issue. (For the Operator, put this entry in the sample files; for a Helm installation, update the values.yaml file.)

So we recommend that the customer use a longer prefix name, for example "csiunitystorageclass", as a workaround; see the sketch below.
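
For a Helm installation, a minimal sketch of the values.yaml change, assuming the prefix is controlled by the volumeNamePrefix parameter (verify the key name against your chart version):

# values.yaml (excerpt)
volumeNamePrefix: csiunitystorageclass   # default is "csivol"; long enough that prefix + CLI ID reaches 15+ characters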

There is a PR raised in Kubernetes to fix the issue: kubernetes/kubernetes#106557.

Hope this resolves the issue. Please let us know if you are still facing it and we will debug further.

Thanks and Regards,
Karthik K

@mkej

mkej commented Dec 3, 2021

@karthikk92 thank you so much for your suggestion! I've changed the volume name prefix to be longer: it was 7 characters before and is 18 characters now. The scenario I described above, which was consistently failing before, now works like a charm. At most one VolumeAttachment is now present for a PersistentVolume at any given time. When the pod is deleted, its VolumeAttachment is deleted as well, and a new one is created when the pod is scheduled again, possibly on a different node.

@mkej

mkej commented Dec 3, 2021

By the way, isn't the default csi-unity configuration affected by this as well? The default prefix is csivol, which is even shorter than mine was. Or maybe I was unlucky enough to hit a particularly problematic length range.

@karthikk92

karthikk92 commented Dec 3, 2021

Once the fix PR in Kubernetes is merged (kubernetes/kubernetes#106557), this will be taken care of. Until then, we recommend that the customer use longer prefix names.

@karthikk92

Since the issue is resolved, I will be closing it. Thanks @mkej

@shanmydell shanmydell removed the help wanted label on Feb 11, 2022
csmbot pushed a commit that referenced this issue Aug 1, 2023