Unremoved RBD volume after removing PVC with snapshot #227

Closed
rollandf opened this issue Feb 25, 2019 · 15 comments
@rollandf (Contributor) commented Feb 25, 2019

Describe the bug
Unremoved RBD volume after removing PVC with snapshot

Kubernetes and Ceph CSI Versions
Kubernetes: 1.13
Ceph CSI: 1.0.0

Ceph CSI Driver logs
See the provisioner log excerpt quoted in the comments below.

To Reproduce
From the examples/rbd dir, with Block mode for all PVCs:

  1. Create a PVC.
  2. Create a VolumeSnapshot from the PVC.
  3. Create a PVC from the VolumeSnapshot.
  4. Delete the PVC from step 1 (the volume is still present on the Ceph server after it is deleted from k8s).
  5. Delete the PVC from step 3 (the one created from the VolumeSnapshot).
  6. Delete the VolumeSnapshot.
  7. The RBD volume of the PVC from step 1 is still present on the Ceph server (see the verification sketch below).

Expected behavior
The RBD volume from step 1 should be deleted.
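For reference, the leftover image can be inspected directly on the Ceph side. This is an illustrative sketch (not part of the original report); the pool name `volumes` and the `pvc-...` image name are taken from the provisioner log quoted later in the thread.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Pool and image name are assumptions taken from the log excerpt below;
	// substitute whatever name the provisioner generated for the PVC in step 1.
	pool := "volumes"
	image := pool + "/pvc-4766b679-3917-11e9-8105-52540060cf1f"

	// After step 4 the image still shows up in the pool listing.
	if out, err := exec.Command("rbd", "ls", pool).CombinedOutput(); err == nil {
		fmt.Printf("images in pool %s:\n%s", pool, out)
	}

	// The VolumeSnapshot from step 2 is what blocks its removal.
	if out, err := exec.Command("rbd", "snap", "ls", image).CombinedOutput(); err == nil {
		fmt.Printf("snapshots of %s:\n%s", image, out)
	}
}
```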

@Madhu-1 (Collaborator) commented Feb 25, 2019

@rollandf Can you provide logs from the provisioner and the node plugin?

@rollandf (Contributor, Author) provided the requested logs (quoted in the next comment).

@Madhu-1 (Collaborator) commented Feb 27, 2019

From the logs:

2019-02-25 16:07:55.749071 7f05427fc700 -1 librbd::image::RemoveRequest: 0x561600948920 check_image_snaps: image has snapshots - not removing

Removing image: 0% complete...failed.
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
I0225 16:07:55.803066       1 controllerserver.go:269] failed to delete rbd image: volumes/pvc-4766b679-3917-11e9-8105-52540060cf1f with error: exit status 39
I0225 16:07:55.803079       1 hashed.go:54] hashedKeyMutex.UnlockKey(...) called for id "csi-rbd-vol-4d7f656b-3917-11e9-b4f7-0a580af40060"
I0225 16:07:55.803088       1 hashed.go:56] hashedKeyMutex.UnlockKey(...) for id "csi-rbd-vol-4d7f656b-3917-11e9-b4f7-0a580af40060" completed.

If a snapshot exists for a volume, I think we cannot delete the volume.
@rootfs @gman0 any pointers on how to fix this issue?

@rollandf (Contributor, Author) commented:

Maybe we should add a validation so that a PVC cannot be deleted as long as snapshots of it exist?

@Madhu-1 (Collaborator) commented Feb 27, 2019

@rollandf In the current situation the PVC is deleted, but the PV stays in the Terminating state.
As you suggested, we can fail the precheck and return FAILED_PRECONDITION as suggested in the spec.
This needs to be tested.
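A minimal sketch of that precheck, assuming the snapshot count has already been looked up (e.g. via `rbd snap ls`). This is illustrative, not the actual ceph-csi DeleteVolume code.

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// deleteVolumePrecheck sketches the proposed behaviour: if the backing RBD image
// still has snapshots, refuse the delete with FAILED_PRECONDITION instead of
// letting `rbd rm` fail later.
func deleteVolumePrecheck(volumeID string, snapshotCount int) error {
	if snapshotCount > 0 {
		return status.Errorf(codes.FailedPrecondition,
			"rbd image backing volume %s has %d snapshot(s); delete them first",
			volumeID, snapshotCount)
	}
	return nil
}

func main() {
	// A volume with one remaining snapshot is rejected.
	err := deleteVolumePrecheck("pvc-4766b679-3917-11e9-8105-52540060cf1f", 1)
	fmt.Println(err)
}
```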

@Madhu-1 (Collaborator) commented Feb 27, 2019

#70

@rootfs (Member) commented Feb 27, 2019

2019-02-25 16:07:55.749071 7f05427fc700 -1 librbd::image::RemoveRequest: 0x561600948920 check_image_snaps: image has snapshots - not removing

Removing image: 0% complete...failed.
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
I0225 16:07:55.803066 1 controllerserver.go:269] failed to delete rbd image: volumes/pvc-4766b679-3917-11e9-8105-52540060cf1f with error: exit status 39

Exit status 39 is ENOTEMPTY:

#define ENOTEMPTY       39      /* Directory not empty */

So rbd is doing the right thing but our driver is not following the process.
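As an illustration, a small sketch (not the driver's actual code) that distinguishes the "image has snapshots" failure from other `rbd rm` errors by exit status; on Linux, `syscall.ENOTEMPTY` is 39.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// removeImage runs `rbd rm` and inspects the exit status so that the
// "image has snapshots" case (ENOTEMPTY, 39 on Linux) is reported explicitly
// instead of surfacing as a generic "exit status 39".
func removeImage(pool, image string) error {
	out, err := exec.Command("rbd", "rm", fmt.Sprintf("%s/%s", pool, image)).CombinedOutput()
	if err == nil {
		return nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == int(syscall.ENOTEMPTY) {
		return fmt.Errorf("image %s/%s still has snapshots: %s", pool, image, out)
	}
	return fmt.Errorf("rbd rm %s/%s failed: %v: %s", pool, image, err, out)
}

func main() {
	// Pool/image names are taken from the log lines quoted above.
	fmt.Println(removeImage("volumes", "pvc-4766b679-3917-11e9-8105-52540060cf1f"))
}
```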

@rootfs (Member) commented Feb 27, 2019

Looks like flatten is the way to go to decouple the image from its snapshot:
http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/#flattening-a-cloned-image
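A minimal sketch of that flatten step via the rbd CLI (the image name is illustrative). After flattening, the clone no longer depends on the parent's snapshot, so the parent can eventually be removed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// flattenImage copies all blocks from the parent snapshot into the clone,
// turning the clone into an independent image.
func flattenImage(pool, image string) error {
	out, err := exec.Command("rbd", "flatten", fmt.Sprintf("%s/%s", pool, image)).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd flatten %s/%s failed: %v: %s", pool, image, err, out)
	}
	return nil
}

func main() {
	// "restored-pvc-image" stands in for the image backing the PVC created
	// from the VolumeSnapshot in the reproduction steps above.
	if err := flattenImage("volumes", "restored-pvc-image"); err != nil {
		fmt.Println(err)
	}
}
```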

@j-griffith commented Mar 3, 2019

In the case of snapshots, I think it's fine to keep the parent-child linking in place and not take the time penalty of a flatten; along those lines, attempts to delete the parent volume while snapshots exist should fail. For clones, or create-from-snapshot, however, we should definitely use flatten, resulting in an independent volume object.

I'd be happy to work on this if @Madhu-1 or @rollandf aren't interested.

@Madhu-1 (Collaborator) commented Mar 4, 2019

@j-griffith Thank you, please do send a PR.

@j-griffith commented:

/assign j-griffith

@dillaman commented:

One of the design goals for the RBD trash was to handle situations like this. Removing the RBD image should actually run rbd trash mv to move the image to the trash. Images with snapshots and clones can be moved to the trash without issue. Upon removal of a snapshot or removal of a cloned PV, you can attempt to remove the (parent) image from the trash and if it fails, it just means it's still in-use and will be cleaned up in the future when all remaining snapshots and clones are deleted.

The one immediate issue with this is that krbd currently won't allow you to open images that are in the trash (e.g. when trying to map the image into a pod). This can be worked around by always creating a new clone for volume snapshots and using that cloned image for all snapshot volume operations.
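A minimal sketch of that trash-based flow via the rbd CLI (illustrative, not ceph-csi code; `rbd trash rm` takes the image ID reported by `rbd trash ls`).

```go
package main

import (
	"fmt"
	"os/exec"
)

// trashImage replaces `rbd rm` with `rbd trash mv`, which succeeds even while
// snapshots or clones of the image still exist.
func trashImage(pool, image string) error {
	out, err := exec.Command("rbd", "trash", "mv", fmt.Sprintf("%s/%s", pool, image)).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd trash mv failed: %v: %s", err, out)
	}
	return nil
}

// tryPurgeFromTrash is the opportunistic cleanup after a snapshot or cloned PV
// is deleted: attempt `rbd trash rm` on the parent; failure just means it is
// still referenced and can be retried when the remaining dependents go away.
func tryPurgeFromTrash(pool, imageID string) bool {
	return exec.Command("rbd", "trash", "rm", fmt.Sprintf("%s/%s", pool, imageID)).Run() == nil
}

func main() {
	fmt.Println(trashImage("volumes", "pvc-4766b679-3917-11e9-8105-52540060cf1f"))
	// Placeholder ID: in practice take it from `rbd trash ls`.
	fmt.Println(tryPurgeFromTrash("volumes", "<image-id-from-rbd-trash-ls>"))
}
```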

@ShyamsundarR (Contributor) commented:

> @rollandf In the current situation the PVC is deleted, but the PV stays in the Terminating state.
> As you suggested, we can fail the precheck and return FAILED_PRECONDITION as suggested in the spec.
> This needs to be tested.

Just pointing out that text has since been added to the spec to address this condition as well (see here). We could now return a precondition failure if snapshots exist for a given volume and it hence cannot be deleted.

@dillaman given this, what are your thoughts on deleting images with snapshots?

@dillaman commented:

If the natural representation within k8s is that snapshots can have independent lifetimes from the volume they were created from, personally, I like the idea of creating a brand new clone for each snapshot so that they are also represented as first-class citizens within RBD.
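A sketch of what a "brand new clone for each snapshot" could look like via the rbd CLI (names are illustrative; the `snap protect` step is only needed when RBD clone v2 is not enabled).

```go
package main

import (
	"fmt"
	"os/exec"
)

// rbd is a tiny helper around the rbd CLI for this sketch.
func rbd(args ...string) error {
	out, err := exec.Command("rbd", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd %v failed: %v: %s", args, err, out)
	}
	return nil
}

// snapshotAsClone materialises a CSI snapshot as its own RBD image: take a
// snapshot of the source image, protect it, and clone it into a standalone
// image that can be used independently of the parent.
func snapshotAsClone(pool, source, snap, clone string) error {
	src := fmt.Sprintf("%s/%s@%s", pool, source, snap)
	for _, args := range [][]string{
		{"snap", "create", src},
		{"snap", "protect", src}, // not needed if RBD clone v2 is enabled
		{"clone", src, fmt.Sprintf("%s/%s", pool, clone)},
	} {
		if err := rbd(args...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(snapshotAsClone("volumes", "pvc-source-image", "csi-snap-1", "csi-snap-1-image"))
}
```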

@Madhu-1 (Collaborator) commented Jul 16, 2019

Closing as this is a duplicate of #70.

Madhu-1 closed this as completed on Jul 16, 2019.