Fix Edge cases for already exists resources #5223

shubham-pampattiwar · 2022-08-16T15:37:55Z

What steps did you take and what happened:
[A clear and concise description of what the bug is, and what commands you ran.)
On OpenShift 4.11 (Kubernetes 1.24) we are seeing an error(not warning) related to the restoration of volumesnapshotclass that it already exists, not reproducilbe on earlier versions. Restore fails with the following logs, This is added as an error by the restore controller because the create call itself is incomplete due to admission webhooks validation, had the create call executed and Velero had received an IsalreadyExists this would have been a warning.

time="2022-07-14T14:34:44Z" level=error msg="error restoring example-snapclass: admission webhook \"volumesnapshotclasses.snapshot.storage.k8s.io\" denied the request: default snapshot class: example-snapclass already exits for driver: pd.csi.storage.gke.io" logSource="pkg/restore/restore.go:1357" restore=openshift-adp/mysql-42edcb0a-0381-11ed-9e9d-902e163f806c
time="2022-07-14T14:34:45Z" level=error msg="Cluster resource restore error: error restoring volumesnapshotclasses.snapshot.storage.k8s.io/example-snapclass: admission webhook \"volumesnapshotclasses.snapshot.storage.k8s.io\" denied the request: default snapshot class: example-snapclass already exits for driver: pd.csi.storage.gke.io" logSource="pkg/controller/restore_controller.go:500" restore=openshift-adp/mysql-42edcb0a-0381-11ed-9e9d-902e163f806c

What did you expect to happen:
Restore should complete without errors.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle, and attach to this issue, more options please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

kubectl logs deployment/velero -n velero
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

Velero version (use velero version):
Velero features (use velero client config get features):
Kubernetes version (use kubectl version):
Kubernetes installer & version:
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

👍 for "I would like to see this bug fixed as soon as possible"
👎 for "There are more important bugs to focus on right now"

The text was updated successfully, but these errors were encountered:

reasonerjt · 2022-08-22T05:18:05Z

@shubham-pampattiwar

by not reproducilbe on earlier versions do you mean not reproducible on earlier versions of openshift?
If this was caused by the addmisionwebhook, what do you think we can fix in velero? If we always check the existence of resources upon seeing ANY error, it will cause other problems.

Would we suggest the maintainer of this admissionwebhook return a more explicit 409?

sseago · 2022-08-22T13:30:00Z

@reasonerjt Not reproducible in that the specific conditions that triggered the error for this particular resource type aren't there in earlier versions. But the problem is a more generic one. I know I've seen this manifest itself in the past when working on unrelated code which checked for IsAlreadyExistError as a means of determining whether something existed before.

What other problems are you referring to? We have tested a fix internally that should resolve this which I think @shubham-pampattiwar will be submitting as a PR soon. Basically if the Create fails with an error other than IsAlreadyExistsError, we do a Get call to see if it actually exists. If it does, treat it as IsAlreadyExistsError. If not, add the error to the restore like we currently do. What problems are you seeing with this approach?

shubham-pampattiwar · 2022-08-22T13:57:45Z

@sseago +1, @reasonerjt please take a look at this proposed PR as a fix for this issue #5239

blackpiglet · 2022-08-23T03:22:44Z

Which adminssion webhook triggered this error?
If the webhook is maintained by ourself, it's better to return a more self-explained error message or error code.

shubham-pampattiwar · 2022-08-23T14:16:29Z

@blackpiglet This was triggered by volumesnapshotclass admission webhook I believe.

sseago · 2022-08-23T14:28:38Z

@blackpiglet The issue is that the admission webhook validation happens before the normal create call, which means we don't get an IsAlreadyExists error. In this case, the resource already exists, so we don't want a more self-explained error message. We don't want the error message at all. We want to recognize that the resource is already in the cluster (just as we do when we get an IsAlreadyExists error), and then we can apply our existing resource policy and either patch it or warn (if the object is different than the obj from backup), or do nothing (if the object is the same as in the backup).

shubham-pampattiwar self-assigned this Aug 16, 2022

reasonerjt added the Needs info Waiting for information label Aug 22, 2022

shubham-pampattiwar mentioned this issue Aug 22, 2022

Fix edge cases for already exists resources #5239

Merged

3 tasks

reasonerjt closed this as completed in #5239 Aug 31, 2022

blackpiglet added the 1.9.2-candidate label Sep 1, 2022

qiuming-best removed the 1.9.2-candidate label Sep 5, 2022

qiuming-best mentioned this issue Sep 7, 2022

fix edge cases for already exists resources #5304

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Edge cases for already exists resources #5223

Fix Edge cases for already exists resources #5223

shubham-pampattiwar commented Aug 16, 2022

reasonerjt commented Aug 22, 2022 •

edited

Loading

sseago commented Aug 22, 2022

shubham-pampattiwar commented Aug 22, 2022

blackpiglet commented Aug 23, 2022

shubham-pampattiwar commented Aug 23, 2022

sseago commented Aug 23, 2022

Fix Edge cases for already exists resources #5223

Fix Edge cases for already exists resources #5223

Comments

shubham-pampattiwar commented Aug 16, 2022

reasonerjt commented Aug 22, 2022 • edited Loading

sseago commented Aug 22, 2022

shubham-pampattiwar commented Aug 22, 2022

blackpiglet commented Aug 23, 2022

shubham-pampattiwar commented Aug 23, 2022

sseago commented Aug 23, 2022

reasonerjt commented Aug 22, 2022 •

edited

Loading