capi-controller-manager crashes after a velero backup/restore #4105
Comments
Just a note for anybody taking this. Velero cannot restore |
My understanding of the Issue is that:
A first workaround for the issue is to instruct Velero to restore Clusters before any other CAP* object. A longer-term solution is to make the ClusterResourceSetBinding controller tolerant of this condition, assuming that it is temporary and that Velero will fix everything as soon as the restore completes. @yastij @nrb please chime in; specifically, I have some doubts about the assumption in the last sentence. |
To add to what @fabriziopandini mentioned above, all |
/milestone Next |
@fabriziopandini Your understanding matches mine. My expectation would be that the ClusterResourceSetBinding controller would at least retry for a Cluster ownerRef before failing completely, and that a failure wouldn't be a panic. The source of the issue on the Velero side is that Velero doesn't have knowledge of operator object graphs other than for some included Kubernetes resources. Instead, Velero attempts to restore API groups in alphabetical order, which does not always align with the object graph. We do have an argument on the Velero server called We're looking at re-architecting the logic to allow for object graphs, but that likely will be many releases out. To @vincepri's point, yes, Velero never restores the |
@nrb Do you want us to keep this issue open, or track it on Velero's side? |
Closing this for now, given that it seems something that should be solved from Velero side. Feel free to reopen if necessary. /close |
@vincepri: Closing this issue. |
@vincepri Are users of CAPI expected to only use CAPI tools to manipulate clusters? Because this would be an issue with Velero can work around it and restore in the proper order, but this is not something we're going to be able to do for every operator/controller. We're not going to be able to keep track of operators as they change, either. |
What steps did you take and what happened:
After doing a velero backup/restore of a cluster, the capi-controller-manager crashes with the following logs:
This happens here:
cluster-api/exp/addons/controllers/clusterresourcesetbinding_controller.go
Lines 75 to 81 in e851972
where GetOwnerCluster would return nil, nil when no cluster ownerRef is found, leading to cluster being nil.
What did you expect to happen:
Anything else you would like to add:
Environment:
a74685ee93cd453a435e4f999526c606948bbf73 (this is valid for the main branch too)
/kind bug