
Unable to restore persistent volumes and persistent volume claims using Velero on AKS #5741

Closed
aloknagaonkar opened this issue Jan 5, 2023 · 15 comments
Assignees
Labels
Area/Cloud/Azure Area/CSI Related to Container Storage Interface support kind/requirement

Comments

@aloknagaonkar

aloknagaonkar commented Jan 5, 2023

What steps did you take and what happened:
velero install

helm upgrade --install velero vmware-tanzu/velero \
  --namespace velero \
  --set-file credentials.secretContents.cloud=./credentials-velero \
  --set configuration.provider=azure \
  --set configuration.backupStorageLocation.name=azure \
  --set configuration.backupStorageLocation.bucket=backup \
  --set configuration.backupStorageLocation.config.resourceGroup=xxxx \
  --set configuration.volumeSnapshotLocation.config.subscriptionId=xxxx \
  --set configuration.backupStorageLocation.config.subscriptionId=xxx \
  --set configuration.backupStorageLocation.config.storageAccount=xxxxxx \
  --set snapshotsEnabled=true \
  --set deployNodeAgent=true \
  --set configuration.volumeSnapshotLocation.name=azure \
  --set configuration.features=EnableCSI \
  --set image.repository=velero/velero \
  --set image.pullPolicy=Always \
  --set configuration.volumeSnapshotLocation.config.resourceGroup=app-network \
  --set configuration.volumeSnapshotLocation.config.snapshotLocation="East US" \
  -f custom-values.yaml

custom-values.yaml:

initContainers:
  - name: velero-plugin-for-azure
    image: velero/velero-plugin-for-microsoft-azure:v1.6.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
  - name: velero-plugin-for-csi
    image: velero/velero-plugin-for-csi:v0.3.2
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins

Extra K8s manifests to deploy

extraObjects:
  - apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-azuredisk-vsc
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    driver: disk.csi.azure.com
    deletionPolicy: Retain
    parameters:
      resourcegroup: app-network

We are taking a backup using Velero:

./velero backup create mysql1 --include-namespaces velero-test --volume-snapshot-locations azure --storage-location azure

and restoring with:

./velero restore create mysql-restore2 --from-backup mysql1 --include-resources pvc,pv

While restoring, we get the error below:

time="2023-01-05T13:39:39Z" level=error msg="Namespace velero-test, resource restore error: error preparing persistentvolumeclaims/velero-test/pvc-mysql: rpc error: code = Unknown desc = Failed to get Volumesnapshot velero-test/velero-pvc-mysql-mfffx to restore PVC velero-test/pvc-mysql: volumesnapshots.snapshot.storage.k8s.io \"velero-pvc-mysql-mfffx\" not found" logSource="pkg/controller/restore_controller.go:531" restore=velero/mysql-restore2
time="2023-01-05T13:39:39Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:545" restore=velero/mysql-restore2

Support bundle: bundle-2023-01-05-20-01-05.zip

What did you expect to happen:

I expect the restore to work and the PV and PVC to be created.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+: updated
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle, and attach to this issue, more options please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero

  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml

  • velero backup logs <backupname>

  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml

  • velero restore logs <restorename>

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

WARNING: the client version does not match the server version. Please update client

  • Velero features (use velero client config get features):

  • features:

  • Kubernetes version (use kubectl version):

  • Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"f941a31f4515c5ac03f5fc7ccf9a330e3510b80d", GitTreeState:"clean", BuildDate:"2022-11-09T17:12:33Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version:

  • Cloud provider or hardware configuration: AKS

  • OS (e.g. from /etc/os-release):

@allenxu404
Contributor

allenxu404 commented Jan 6, 2023

As per the error message in the restore log, when executing the restore item action, Velero fails to get the VolumeSnapshot needed to restore the corresponding PVC.
We need to triage this issue further.

@reasonerjt reasonerjt added Area/CSI Related to Container Storage Interface support Area/Cloud/Azure labels Jan 9, 2023
@blackpiglet
Contributor

Hi @aloknagaonkar,

I think the failure is due to the restore setting.

./velero restore create mysql-restore2 --from-backup mysql1 --include-resources pvc,pv

When using the CSI plugin for backup/restore, some CSI-specific k8s resources are also created, e.g. VolumeSnapshot and VolumeSnapshotContent, and Velero needs these as the data source when recreating the PVC and PV.

Since the restore command includes only PV and PVC as the resources to restore, the VolumeSnapshot and VolumeSnapshotContent were not restored, which triggered the error.

I suggest removing the --include-resources parameter from the restore command and trying again.
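A minimal sketch of the suggested retry, using the backup name from the original report (mysql1); "mysql-restore3" is a hypothetical restore name chosen here for illustration:

```shell
# Restore without --include-resources so the CSI-specific
# VolumeSnapshot and VolumeSnapshotContent objects are restored
# alongside the PVC and PV.
./velero restore create mysql-restore3 --from-backup mysql1
```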

@sseago
Collaborator

sseago commented Jan 9, 2023

@blackpiglet We recently added a feature to plugins to allow them to force including other resource types via an annotation. Maybe we should update the CSI plugin to use this new feature.

@blackpiglet
Contributor

@sseago
I agree.
We need to deliver work similar to #5429 and vmware-tanzu/velero-plugin-for-csi#123 for RestoreItemAction.

@aloknagaonkar
Author

Hi team,

Can you help us with a standard/working configuration to take a volume snapshot of MongoDB and restore it into the same namespace or another namespace? We are using Azure cloud.

@blackpiglet
Contributor

Taking volume snapshots is Velero's default behavior.
Take this installation command as an example:

velero install \
    --provider azure \
    --bucket <bucket-name> \
    --secret-file ~/Documents/credentials-velero-azure \
    --image velero/velero:v1.10.0 \
    --plugins velero/velero-plugin-for-azure:v1.6.0 \
    --backup-location-config resourceGroup=<rg-name>,storageAccount=<account>,subscriptionId=<Sub-ID>

If MongoDB is installed in the mongo namespace, the backup command would be:
velero backup create mongo-test --include-namespaces=mongo --wait

If you prefer to restore into the same namespace, delete the mongo namespace first, because Velero skips existing resources during restore.
If you prefer to restore into another namespace, for example mongo-1, the restore command would be:
velero restore create --from-backup mongo-test --namespace-mappings mongo=mongo-1 --wait
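To confirm the result afterwards, the restore can be inspected with the same commands the issue template asks for:

```shell
# Check restore phase, warnings, and errors.
velero restore describe <restorename>
# Full per-resource restore log.
velero restore logs <restorename>
```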

@aloknagaonkar
Author

Which Azure disk types are supported? We have MongoDB provisioned with Provisioner: kubernetes.io/azure-disk (Parameters: cachingmode=None,kind=Managed,location=eastus,storageaccounttype=StandardSSD_LRS,zoned=true).
Can the Velero plugin for Azure take volume snapshots of the above? What is the best practice?
Or do we need to use CSI to provision our disks?

@blackpiglet
Contributor

I don't think you need the CSI plugin for that. The Azure plugin can take snapshots of the disk.
I cannot confirm the detailed parameter settings, but Provisioner: kubernetes.io/azure-disk is supported by the Azure plugin.

@aloknagaonkar
Author

aloknagaonkar commented Jan 20, 2023

What is the best practice: should we use the Azure plugin or the CSI plugin?

What is the need for Restic during volume snapshots? Do we need it?

I want to understand the options and what the best practice would be for using Velero from a reliability and support point of view.

@blackpiglet
Contributor

  1. Restic is the option when the user environment doesn't support disk snapshots, or for some special cases, for example cross-region scenarios. As of Velero v1.10, Kopia is also supported. Kopia provides a similar function to Restic, but with better performance in most cases.
  2. Velero also maintains plugins for cloud providers, such as Azure, GCP, and AWS. They work specifically for their provider and cannot be used in other environments, and it's not feasible for Velero to create plugins for every environment and provider.
  3. To make Velero work in more environments, the CSI plugin comes into play. As long as your k8s cluster has a CSI driver, and that driver supports the snapshot function, you can use the CSI plugin.

IMO, for your case, the Azure plugin is good enough. You can also try the CSI plugin. If you don't have a special need, Restic is not recommended.
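For reference, a CSI-based variant of the earlier install command would look roughly like the following; the resource-group/account placeholders are the same as before, and the plugin versions mirror the ones already used in this thread:

```shell
# CSI-enabled install sketch: adds the EnableCSI feature flag and the
# CSI plugin image alongside the Azure plugin. Placeholders in <> must
# be filled in for your environment.
velero install \
    --provider azure \
    --bucket <bucket-name> \
    --secret-file ~/Documents/credentials-velero-azure \
    --image velero/velero:v1.10.0 \
    --plugins velero/velero-plugin-for-azure:v1.6.0,velero/velero-plugin-for-csi:v0.3.2 \
    --features EnableCSI \
    --backup-location-config resourceGroup=<rg-name>,storageAccount=<account>,subscriptionId=<Sub-ID>
```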

@aloknagaonkar
Author

aloknagaonkar commented Jan 20, 2023

I am getting this error when restoring into a different namespace:

Warnings:
  Velero:     <none>
  Cluster:
    could not restore, CustomResourceDefinition "mongodb.mongodb.com" already exists. Warning: the in-cluster version is different than the backed-up version.
    could not restore, VolumeSnapshotContent "snapcontent-68349033-bd46-4a8b-978c-36540c24d142" already exists. Warning: the in-cluster version is different than the backed-up version.
    could not restore, PersistentVolume "pvc-85be4a60-a0ff-478a-bc00-5cf7ae377aea" already exists. Warning: the in-cluster version is different than the backed-up version.
    could not restore, ClusterRoleBinding "mongodb-enterprise-operator-mongodb-certs-binding" already exists. Warning: the in-cluster version is different than the backed-up version.
    could not restore, ClusterRoleBinding "mongodb-enterprise-operator-mongodb-webhook-binding" already exists. Warning: the in-cluster version is different than the backed-up version.
  Namespaces:
    velero-test:
      could not restore, PersistentVolumeClaim "data-mongodb-gateway-0" already exists. Warning: the in-cluster version is different than the backed-up version.

@blackpiglet
Contributor

@aloknagaonkar
Those logs are warnings. They mean the version of the resource already in the cluster differs from the backed-up version.

This is normal for cluster-scoped resources, e.g. CustomResourceDefinition "mongodb.mongodb.com", PersistentVolume "pvc-85be4a60-a0ff-478a-bc00-5cf7ae377aea", ClusterRoleBinding "mongodb-enterprise-operator-mongodb-webhook-binding", and ClusterRoleBinding "mongodb-enterprise-operator-mongodb-certs-binding".
But the warnings for VolumeSnapshotContent "snapcontent-68349033-bd46-4a8b-978c-36540c24d142" and PersistentVolumeClaim "data-mongodb-gateway-0" suggest the restore into a different namespace still overlaps with resources already in the cluster.

@iusergii

iusergii commented Mar 29, 2023

@blackpiglet I have a similar scenario but would like to have a restoration option.

  • StorageClass deletionPolicy is set to Retain
  • Application helm chart with workload, pvc, etc deployed to cluster.
  • Two schedules used for backup:
schedules:
  daily:
    schedule: "*/30  * * * *"
    template:
      ttl: "72h"
      snapshotVolumes: false
  snapshot:
    schedule: "*/30 * * * *"
    template:
      ttl: "72h"
      snapshotVolumes: true
      labelSelector:
        matchLabels:
          backup.velero.io/my-backup-volume: "true"
  1. The application PV data was corrupted and I would like to restore to the latest snapshot backup.
  2. The application helm chart is deleted unintentionally - I would like to restore PVC to the existing PV using the latest daily backup

Using the CSI feature seems to work for case [1]:
velero restore create --from-backup=velero-snapshot-20230329135047 --include-namespaces=default
It creates a brand-new PVC/PV from the latest snapshot successfully.

But it fails to restore the PVC (azure-managed-disk) in case [2]:
velero restore create --from-backup=velero-daily-20230329135047 --include-namespaces=default

Log:

time="2023-03-29T14:22:47Z" level=error msg="Namespace default, resource restore error: error preparing persistentvolumeclaims/default/azure-managed-disk: rpc error: code = Unknown desc = Failed to get Volumesnapshot default/velero-azure-managed-disk-cjbc6 to restore PVC default/azure-managed-disk: volumesnapshots.snapshot.storage.k8s.io \"velero-azure-managed-disk-cjbc6\" not found" logSource="pkg/controller/restore_controller.go:534" restore=velero/velero-daily-20230329135047-20230329162202

Is there any way to restore a PVC to an existing (Released) PV?

@blackpiglet
Contributor

@iusergii
I'm sorry for the late response; I missed the update notification.
Velero cannot work well with a StorageClass whose reclaim policy is set to Retain.
How about changing the policy to Delete?
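Outside of Velero, a commonly used manual workaround (not suggested in this thread, and to be tested carefully on non-production data) is to clear the claimRef on a Released PV so it becomes Available again, after which a new PVC matching its capacity and StorageClass can bind to it. A sketch, with <pv-name> as a placeholder:

```shell
# Clear the stale claimRef so the Released PV returns to Available.
kubectl patch pv <pv-name> --type=merge -p '{"spec":{"claimRef":null}}'
# Verify: STATUS should move from Released to Available.
kubectl get pv <pv-name>
```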

@blackpiglet
Contributor

Close for now.
