
capi-controller-manager in capi-system namespace crashes after restoring the management cluster in TKG 1.2.1 #3305

Closed
McCoyAle opened this issue Jan 19, 2021 · 1 comment


McCoyAle commented Jan 19, 2021

What steps did you take and what happened:

The steps to reproduce the issue are below. In addition, another user followed this document and reproduced the same outcome: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.2/vmware-tanzu-kubernetes-grid-12/GUID-mgmt-clusters-backup-restore-mgmt-cluster.html?hWord=N4IghgNiBcIG4FMIIE4HsQF8g

Steps to recreate the issue:

  1. Create a management cluster.
  2. Set the context to the management cluster.
  3. Create a workload cluster.
  4. Install velero on the management cluster using the below command:
    velero install --provider aws --plugins oss-harbor.test.local/tkg/velero/velero-plugin-for-aws:v1.1.0_vmware.1 --bucket velero --secret-file ~/velero/credentials-velero --use-restic --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.92.221.234:9000 --image=oss-harbor.test.local/tkg/velero/velero:v1.4.2_vmware.1
  5. Install the vSphere Velero plugin:
    velero plugin add oss-harbor.test.local/tkg/velero/velero-plugin-for-vsphere:v1.0.2_vmware.1
  6. Take a backup of the management cluster.
    velero snapshot-location create vsl-vsphere --provider velero.io/vsphere
    kubectl patch cluster restworker --type='merge' -p '{"spec":{"paused": true}}'
    velero backup create mgmtwork --exclude-namespaces=tkg-system
  7. Create a second management cluster to run the restore process.
  8. Change your context to the new management cluster.
  9. Install Velero and the vSphere plugin on it (repeat steps 4 and 5).
  10. Restore the second management cluster, using the backup created in step 6.
    velero restore create mgmt1635 --from-backup mgmtwork
    kubectl patch cluster restworker --type='merge' -p '{"spec":{"paused": false}}'
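
The pause → backup → restore → unpause flow above can be sketched as one script. This is a minimal sketch, not an official procedure: the cluster name restworker and backup names mgmtwork/mgmt1635 come from this report, and pause_patch is a hypothetical helper, not part of Velero or TKG.

```shell
# Hypothetical helper that builds the JSON merge patch toggling
# Cluster API reconciliation for a cluster object.
pause_patch() {
  # $1 is "true" (pause reconciliation) or "false" (resume it)
  printf '{"spec":{"paused": %s}}' "$1"
}

# On the source management cluster:
#   kubectl patch cluster restworker --type=merge -p "$(pause_patch true)"
#   velero backup create mgmtwork --exclude-namespaces=tkg-system
#
# On the new management cluster, after installing the same plugins:
#   velero restore create mgmt1635 --from-backup mgmtwork
#   kubectl patch cluster restworker --type=merge -p "$(pause_patch false)"
```

Pausing the cluster before the backup keeps the Cluster API controllers from reconciling (and potentially mutating) the objects while they are being captured; unpausing after the restore hands control back to the controllers on the new cluster.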

What did you expect to happen:
I expected the second management cluster, restored in step 10, to accept the restored objects and end up with all resources in a healthy state.
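
Instead, capi-controller-manager in the capi-system namespace crashes after the restore. A small sketch of the kubectl commands that would confirm the crash (it assumes the default Cluster API deployment name, capi-controller-manager, and only echoes the commands rather than running them against a cluster):

```shell
# Sketch: print, rather than run, the kubectl commands that would
# confirm a crash-looping capi-controller-manager after the restore.
capi_crash_cmds() {
  local ns="capi-system"
  echo "kubectl get pods -n $ns"
  echo "kubectl logs -n $ns deployment/capi-controller-manager --previous"
  echo "kubectl describe deployment -n $ns capi-controller-manager"
}
capi_crash_cmds
```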

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

The output of the above commands, from a lab environment where the issue was recreated, is here: https://gist.github.com/McCoyAle/1f999e4fe8c5de760a528d2beee6b25b

Second lab environment where the issue was replicated:
Backup and restore commands:
https://gist.github.com/McCoyAle/58e6199c3a1478a64725803230169e8e

Backup and restore logs:
https://gist.github.com/McCoyAle/f2e35de1f71b610d6a061126c6904ad0
https://gist.github.com/McCoyAle/3186de00ba0b915587c304e54b327e81

Anything else you would like to add:
One thing I would like to note is that the backup taken in the lab to recreate this issue is in a partially failed state.

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

nrb commented May 12, 2021

Closed by #3446

nrb closed this as completed on May 12, 2021.