
Provisioner should not ignore AlreadyExists error when saving PVs #55

Closed
jsafrane opened this issue Sep 12, 2019 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jsafrane
Contributor

Related to kubernetes-csi/external-provisioner#340 - when two provisioners (e.g. an in-tree and an external provisioner) provision a volume for the same PVC, the faster one succeeds and creates a PV. When the second provisioner finishes provisioning another volume, it tries to save a PV with the same name and a potentially different spec, and it succeeds:

I0912 10:58:32.508150       1 volume_store.go:147] Saving volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e
I0912 10:58:32.512051       1 volume_store.go:150] Volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e saved

It should not succeed: the second provisioner should see that the PV already exists and, if necessary, delete the volume in the storage backend. The AlreadyExists error should not be quietly ignored!
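For illustration, a minimal sketch of a save path that surfaces the error instead of swallowing it, assuming a recent client-go; saveVolume and its error handling are hypothetical, not the actual volume_store.go code:

```go
package provisioner

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// saveVolume creates the PV and refuses to treat AlreadyExists as success.
func saveVolume(ctx context.Context, client kubernetes.Interface, pv *v1.PersistentVolume) error {
	_, err := client.CoreV1().PersistentVolumes().Create(ctx, pv, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Another provisioner won the race; the caller must decide whether
		// the volume it just provisioned in the backend has to be deleted.
		return fmt.Errorf("PV %s already exists: %w", pv.Name, err)
	}
	return err
}
```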

The first obvious solution is to fix kubernetes-csi/external-provisioner#340. There should always be only one provisioner that provisions a given PV.

But even when it's fixed, there may be corner cases where two provisioners could run at the same time. For example, if we had an alpha feature in the external CSI provisioner for migration, it would be hard to keep the migration flags in sync with kube-controller-manager, and I can imagine that both the in-tree and external provisioners may provision the same PVC during an upgrade / downgrade / cluster reconfiguration.

Therefore, we should fix this error properly.

@davidz627 @msau42 @bertinatto @ddebroy @wongma7

@jsafrane
Contributor Author

Brainstorming a few solutions:

  1. After AlreadyExists, compare the PV in the API server with the one we want to save. Ignore the error if both PVs have the same VolumeSource; delete the unsaved volume in the storage backend if the VolumeSource differs (see the sketch after this list).

  2. Or, each provisioner (meaning the in-tree provisioner and csi-external-provisioner) saves PVs under a different name (e.g. with a pvc- or csi-pvc- prefix). Both PVs will be saved, and Kubernetes will recognize that one of them is not needed and initiate its reclaim policy.
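A hedged sketch of what option 1 could look like, assuming a recent client-go; the helper name and error handling are illustrative, not the library's actual API:

```go
package provisioner

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// handleAlreadyExists decides whether an AlreadyExists error from Create can
// be ignored: only when the existing PV points at the same underlying volume.
func handleAlreadyExists(ctx context.Context, client kubernetes.Interface, desired *v1.PersistentVolume) error {
	existing, err := client.CoreV1().PersistentVolumes().Get(ctx, desired.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if apiequality.Semantic.DeepEqual(existing.Spec.PersistentVolumeSource, desired.Spec.PersistentVolumeSource) {
		// Same volume source: the Create merely raced with another
		// provisioner saving an identical PV, so ignoring the error is safe.
		return nil
	}
	// Different volume behind the same PV name: the caller should delete the
	// volume it just provisioned in the storage backend to avoid leaking it.
	return fmt.Errorf("PV %s already exists with a different volume source", desired.Name)
}
```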

@wongma7
Contributor

wongma7 commented Sep 12, 2019

I like 2 better because it's simple ...

For 1, are there cases where the comparison would say "not equal" even though the "volume ID" is the same? For example, I see Azure File refers to a 'secretName'; what if it differs between in-tree and CSI because they are using different credentials?
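If we go with 1, the comparison could perhaps be narrowed to the fields that identify the backend volume instead of the whole VolumeSource. A purely illustrative sketch for CSI PVs (the helper is hypothetical):

```go
package provisioner

import v1 "k8s.io/api/core/v1"

// sameCSIVolume reports whether two PVs point at the same CSI volume,
// ignoring incidental fields such as secret references that may legitimately
// differ between provisioners.
func sameCSIVolume(a, b *v1.PersistentVolume) bool {
	ac, bc := a.Spec.CSI, b.Spec.CSI
	if ac == nil || bc == nil {
		return false
	}
	// Driver name plus volume handle uniquely identify the backend volume.
	return ac.Driver == bc.Driver && ac.VolumeHandle == bc.VolumeHandle
}
```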

@msau42

msau42 commented Sep 13, 2019

For 2), are there any potential backwards compatibility issues if we change the way they're named?

@jsafrane
Contributor Author

I don't think there are any backward compatibility issues, unless two provisioners are running at the same time and depend on the PV naming.

On the other hand, with the Retain reclaim policy we may pile up empty PVs that were never used.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
