
Provisioner should not ignore AlreadyExists error when saving PVs #55

Closed
jsafrane opened this issue Sep 12, 2019 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jsafrane
Contributor

Related to kubernetes-csi/external-provisioner#340 - when two provisioners (e.g. an in-tree and an external provisioner) provision a volume for the same PVC, the faster one succeeds and creates a PV. When the second provisioner finishes provisioning another volume, it tries to save a PV with the same name and a potentially different spec, and it succeeds:

I0912 10:58:32.508150       1 volume_store.go:147] Saving volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e
I0912 10:58:32.512051       1 volume_store.go:150] Volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e saved

It should not succeed: the second provisioner should see that the PV already exists and, if necessary, delete the volume in the storage backend. The AlreadyExists error should not be quietly ignored!
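For illustration, a minimal sketch of a save path that surfaces the error instead of swallowing it, assuming a recent client-go; saveVolume and its error handling are hypothetical, not the actual volume_store.go code:

```go
package provisioner

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// saveVolume creates the PV and refuses to treat AlreadyExists as success.
func saveVolume(ctx context.Context, client kubernetes.Interface, pv *v1.PersistentVolume) error {
	_, err := client.CoreV1().PersistentVolumes().Create(ctx, pv, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Another provisioner won the race; the caller must decide whether
		// the volume it just provisioned in the backend has to be deleted.
		return fmt.Errorf("PV %s already exists: %w", pv.Name, err)
	}
	return err
}
```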

The first obvious solution is to fix kubernetes-csi/external-provisioner#340. There should always be only one provisioner that provisions a given PV.

But even when it's fixed, there may be corner cases where two provisioners could run at the same time. For example, if we had an alpha feature in the external CSI provisioner for migration, it would be hard to keep the migration flags in sync with kube-controller-manager, and I can imagine that both the in-tree and external provisioners may provision the same PVC during an upgrade / downgrade / cluster reconfiguration.

Therefore, we should fix this error properly.

@davidz627 @msau42 @bertinatto @ddebroy @wongma7

@jsafrane
Contributor Author

Brainstorming a few solutions:

  1. After AlreadyExists, compare the PV in the API server with the one we want to save. Ignore the error if both PVs have the same VolumeSource; delete the unsaved volume in the storage backend if the VolumeSource differs (see the sketch after this list).

  2. Or, each provisioner (meaning the in-tree provisioner and csi-external-provisioner) saves PVs under a different name (e.g. with a pvc- or csi-pvc- prefix). Both PVs will be saved, and Kubernetes will recognize that one of them is not needed and initiate its reclaim policy.
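A hedged sketch of what option 1 could look like, assuming a recent client-go; the helper name and error handling are illustrative, not the library's actual API:

```go
package provisioner

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// handleAlreadyExists decides whether an AlreadyExists error from Create can
// be ignored: only when the existing PV points at the same underlying volume.
func handleAlreadyExists(ctx context.Context, client kubernetes.Interface, desired *v1.PersistentVolume) error {
	existing, err := client.CoreV1().PersistentVolumes().Get(ctx, desired.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if apiequality.Semantic.DeepEqual(existing.Spec.PersistentVolumeSource, desired.Spec.PersistentVolumeSource) {
		// Same volume source: the Create merely raced with another
		// provisioner saving an identical PV, so ignoring the error is safe.
		return nil
	}
	// Different volume behind the same PV name: the caller should delete the
	// volume it just provisioned in the storage backend to avoid leaking it.
	return fmt.Errorf("PV %s already exists with a different volume source", desired.Name)
}
```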

@wongma7
Contributor

wongma7 commented Sep 12, 2019

I like 2 better because it's simple ...

For 1, are there cases where the comparison would say "not equal" even though the "volume ID" is the same? For example, I see Azure File refers to a 'secretName'; what if it differs between in-tree and CSI because they are using different credentials?
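If we go with 1, the comparison could perhaps be narrowed to the fields that identify the backend volume instead of the whole VolumeSource. A purely illustrative sketch for CSI PVs (the helper is hypothetical):

```go
package provisioner

import v1 "k8s.io/api/core/v1"

// sameCSIVolume reports whether two PVs point at the same CSI volume,
// ignoring incidental fields such as secret references that may legitimately
// differ between provisioners.
func sameCSIVolume(a, b *v1.PersistentVolume) bool {
	ac, bc := a.Spec.CSI, b.Spec.CSI
	if ac == nil || bc == nil {
		return false
	}
	// Driver name plus volume handle uniquely identify the backend volume.
	return ac.Driver == bc.Driver && ac.VolumeHandle == bc.VolumeHandle
}
```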

@msau42

msau42 commented Sep 13, 2019

For 2), are there any potential backwards compatibility issues if we change the way they're named?

@jsafrane
Contributor Author

I don't think there are any backward compatibility issues, unless two provisioners are running at the same time and depend on the PV naming.

On the other hand, with the Retain reclaim policy we may pile up empty PVs that were never used.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
