Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip forbidden error for restore priorityClass #5292

Closed

Conversation

cleverhu
Copy link
Contributor

@cleverhu cleverhu commented Sep 6, 2022

Thank you for contributing to Velero!

Please add a summary of your change

skip forbidden error for restore priorityClass

Does your change fix a particular issue?

Fixes #5288

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@cleverhu cleverhu force-pushed the skip-priority-class-already-error branch from 7bcecb3 to eb86804 Compare September 6, 2022 12:58
@sseago
Copy link
Collaborator

sseago commented Sep 6, 2022

First, to clarify the error you got -- the priorityClass you're trying to restore that fails -- is there already a priorityClass in the cluster with the same name? If so, then this is a case where Velero should normally just skip over it, as the resource is already in the cluster (warning, but not erroring out, if the in-cluster version is not identical to the backup verson). If that's true, then we already have a fix merged from #5239 -- which will be in 1.10 and is also targeted for 1.9.2.

The issue is that Velero uses the error return (looking for already exists error) to determine whether the reason for failure is that the resource is already in the cluster, and any other error type would normally indicate a different failure that should fail a restore. The problem is that in some cases, there is other ingress validation which is failing before checking whether the resource already exists, so Velero is working with incomplete information when determining whether to error out or not. The PR linked above fixes this for all resource types, not just priority classes.

If the above description of the problem is correct, we don't need this PR as it's already been fixed. Also, this fix introduces a different bug. If you're trying to restore a completely different priority class (with a different name), then erroring out is the correct thing to do, since the user is trying to create a second default priority class, which is not allowed by cluster rules. Also, this PR would discard any error on priority class restore and not just "too many default classes" errors.

@cleverhu
Copy link
Contributor Author

cleverhu commented Sep 6, 2022

First, to clarify the error you got -- the priorityClass you're trying to restore that fails -- is there already a priorityClass in the cluster with the same name? If so, then this is a case where Velero should normally just skip over it, as the resource is already in the cluster (warning, but not erroring out, if the in-cluster version is not identical to the backup verson). If that's true, then we already have a fix merged from #5239 -- which will be in 1.10 and is also targeted for 1.9.2.

The issue is that Velero uses the error return (looking for already exists error) to determine whether the reason for failure is that the resource is already in the cluster, and any other error type would normally indicate a different failure that should fail a restore. The problem is that in some cases, there is other ingress validation which is failing before checking whether the resource already exists, so Velero is working with incomplete information when determining whether to error out or not. The PR linked above fixes this for all resource types, not just priority classes.

If the above description of the problem is correct, we don't need this PR as it's already been fixed. Also, this fix introduces a different bug. If you're trying to restore a completely different priority class (with a different name), then erroring out is the correct thing to do, since the user is trying to create a second default priority class, which is not allowed by cluster rules. Also, this PR would discard any error on priority class restore and not just "too many default classes" errors.

Thanks for your advice, I also think that it is not reasonable to add many priority errors, which is inconvenient for later program changes. However, this will not introduce new bugs. The 'restore a completely different priority class (with a different name)' mentioned above will lead to data loss. The program verifies whether the resource exists at the end. I checked the PR integrated above, and I found that the previous priority detection function has completely lost its function. I will propose a PR for everyone to discuss whether it is equivalent to the previous program.

@cleverhu
Copy link
Contributor Author

cleverhu commented Sep 6, 2022

The implementation is not friendly, so it needs to be closed. It has been handed over to another PR: #5293

@cleverhu cleverhu closed this Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Velero restore partially failed as default priorityClass
2 participants