
capi-controller-manager in capi-system namespace crashes after restoring the management cluster in TKG 1.2.1 #3305

Closed
McCoyAle opened this issue Jan 19, 2021 · 1 comment


McCoyAle commented Jan 19, 2021

What steps did you take and what happened:

The steps to reproduce the issue are below. In addition, another user followed this document and reproduced the same outcome: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.2/vmware-tanzu-kubernetes-grid-12/GUID-mgmt-clusters-backup-restore-mgmt-cluster.html?hWord=N4IghgNiBcIG4FMIIE4HsQF8g

Steps to recreate the issue:

  1. Create a management cluster.
  2. Set the context to the management cluster.
  3. Create a workload cluster.
  4. Install velero on the management cluster using the below command:
    velero install --provider aws --plugins oss-harbor.test.local/tkg/velero/velero-plugin-for-aws:v1.1.0_vmware.1 --bucket velero --secret-file ~/velero/credentials-velero --use-restic --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.92.221.234:9000 --image=oss-harbor.test.local/tkg/velero/velero:v1.4.2_vmware.1
  5. Install the vSphere Velero plugin:
    velero plugin add oss-harbor.test.local/tkg/velero/velero-plugin-for-vsphere:v1.0.2_vmware.1
  6. Take a backup of the management cluster.
    velero snapshot-location create vsl-vsphere --provider velero.io/vsphere
    kubectl patch cluster restworker --type='merge' -p '{"spec":{"paused": true}}'
    velero backup create mgmtwork --exclude-namespaces=tkg-system
  7. Create a second management cluster to run the restore process.
  8. Change your context to the new management cluster.
  9. Install Velero and the vSphere plugin on it (repeat steps 4 and 5).
  10. Restore the second management cluster, using the backup created in step 6.
    velero restore create mgmt1635 --from-backup mgmtwork
    kubectl patch cluster restworker --type='merge' -p '{"spec":{"paused": false}}'
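
The pause → backup → restore → unpause flow above can be sketched as one script. This is a minimal sketch, not an official procedure: the cluster name restworker and backup names mgmtwork/mgmt1635 come from this report, and pause_patch is a hypothetical helper, not part of Velero or TKG.

```shell
# Hypothetical helper that builds the JSON merge patch toggling
# Cluster API reconciliation for a cluster object.
pause_patch() {
  # $1 is "true" (pause reconciliation) or "false" (resume it)
  printf '{"spec":{"paused": %s}}' "$1"
}

# On the source management cluster:
#   kubectl patch cluster restworker --type=merge -p "$(pause_patch true)"
#   velero backup create mgmtwork --exclude-namespaces=tkg-system
#
# On the new management cluster, after installing the same plugins:
#   velero restore create mgmt1635 --from-backup mgmtwork
#   kubectl patch cluster restworker --type=merge -p "$(pause_patch false)"
```

Pausing the cluster before the backup keeps the Cluster API controllers from reconciling (and potentially mutating) the objects while they are being captured; unpausing after the restore hands control back to the controllers on the new cluster.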

What did you expect to happen:
I expected the second management cluster, restored in step 10, to accept the restored objects and end up with all resources in a healthy state.
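
Instead, capi-controller-manager in the capi-system namespace crashes after the restore. A small sketch of the kubectl commands that would confirm the crash (it assumes the default Cluster API deployment name, capi-controller-manager, and only echoes the commands rather than running them against a cluster):

```shell
# Sketch: print, rather than run, the kubectl commands that would
# confirm a crash-looping capi-controller-manager after the restore.
capi_crash_cmds() {
  local ns="capi-system"
  echo "kubectl get pods -n $ns"
  echo "kubectl logs -n $ns deployment/capi-controller-manager --previous"
  echo "kubectl describe deployment -n $ns capi-controller-manager"
}
capi_crash_cmds
```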

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

The output of the above commands, from a lab environment where the issue was recreated, is here: https://gist.github.com/McCoyAle/1f999e4fe8c5de760a528d2beee6b25b

Second lab environment where the issue was replicated:
Backup and restore commands:
https://gist.github.com/McCoyAle/58e6199c3a1478a64725803230169e8e

Backup and restore logs:
https://gist.github.com/McCoyAle/f2e35de1f71b610d6a061126c6904ad0
https://gist.github.com/McCoyAle/3186de00ba0b915587c304e54b327e81

Anything else you would like to add:
One thing I would like to note is that the backup taken in the lab to recreate this issue is in a partially failed state.

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

nrb commented May 12, 2021

Closed by #3446

nrb closed this as completed on May 12, 2021.