Pod Volume Backup Failed while backing up volumes. #5188

Closed
nwakalka opened this issue Aug 8, 2022 · 23 comments

@nwakalka

nwakalka commented Aug 8, 2022

What steps did you take and what happened:
We were trying to run Velero backups that include pod volume backups, where the pod volumes are backed up using Restic.
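For context, a backup like this is typically created along the following lines (a rough sketch; the exact command and flags we used are not shown here, so treat them as assumptions):

# create a backup of the application namespace; Restic handles any volumes
# opted in via the backup.velero.io/backup-volumes pod annotation
velero backup create backupname --include-namespaces nwakalka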

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ kubectl exec -it mcs-velero-b455f5465-s7xxd -n mcs-customer-backup  -- /velero get backups

NAME                         STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backupname       PartiallyFailed   1        0          2022-08-08 08:44:46 +0000 UTC   29d       default            <none>
customerbackup               PartiallyFailed   1        0          2022-08-08 07:42:19 +0000 UTC   29d       default            <none>
customerbackup-01-18fadb00   PartiallyFailed   1        0          2022-08-08 06:20:10 +0000 UTC   29d       default            <none>
customerbackup-02-38777e93   Completed         0        0          2022-08-08 08:56:21 +0000 UTC   29d       default            <none>

Here you can see that our Velero backups are partially failing. Attaching the backup description for more info:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ kubectl exec -it mcs-velero-b455f5465-s7xxd -n mcs-customer-backup  -- /velero describe backup backupname-nwakalka-17 --details --insecure-skip-tls-verify
I0808 12:00:02.510894   15188 request.go:665] Waited for 1.166485167s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/noobaa.io/v1alpha1?timeout=32s
Name:         backupname-nwakalka-17
Namespace:    mcs-customer-backup
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.11+6b3cbdd
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21

Phase:  PartiallyFailed (run `velero backup logs backupname-nwakalka-17` for more information)

Errors:    1
Warnings:  0

Namespaces:
  Included:  nwakalka
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-08-08 08:44:46 +0000 UTC
Completed:  2022-08-08 08:46:04 +0000 UTC

Expiration:  2022-09-07 08:44:46 +0000 UTC

Total items to be backed up:  40
Items backed up:              40

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - customerbackups.**.com
  apps/v1/Deployment:
    - nwakalka/backend-app
  apps/v1/ReplicaSet:
    - nwakalka/backend-app-5c7f8f6f67
  authorization.openshift.io/v1/RoleBinding:
    - nwakalka/admin
    - nwakalka/system:deployers
    - nwakalka/system:image-builders
    - nwakalka/system:image-pullers
  mcs.***.com/v1/CustomerBackup:
    - nwakalka/customerbackup-01
  rbac.authorization.k8s.io/v1/RoleBinding:
    - nwakalka/admin
    - nwakalka/system:deployers
    - nwakalka/system:image-builders
    - nwakalka/system:image-pullers
  v1/ConfigMap:
    - nwakalka/kube-root-ca.crt
    - nwakalka/openshift-service-ca.crt
  v1/Event:
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f2fa5978ee
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f548dceb61
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f79a38011d
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7a36a0ea8
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7b350d0f5
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7b4e992c5
    - nwakalka/backend-app-5c7f8f6f67.170949f2bd2f3d19
    - nwakalka/backend-app.170949f2ba94a17d
    - nwakalka/e2eapp-pv-claim-test-backend.170949e95f467285
    - nwakalka/e2eapp-pv-claim-test-backend.170949f2dca6d757
  v1/Namespace:
    - nwakalka
  v1/PersistentVolume:
    - pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1
  v1/PersistentVolumeClaim:
    - nwakalka/e2eapp-pv-claim-test-backend
  v1/Pod:
    - nwakalka/backend-app-5c7f8f6f67-mx9hc
  v1/Secret:
    - nwakalka/builder-dockercfg-dvg5h
    - nwakalka/builder-token-mvrnr
    - nwakalka/builder-token-w6ppq
    - nwakalka/default-dockercfg-vbrkr
    - nwakalka/default-token-4d96g
    - nwakalka/default-token-5w4jk
    - nwakalka/deployer-dockercfg-vthj6
    - nwakalka/deployer-token-d5xmw
    - nwakalka/deployer-token-fzhgg
  v1/ServiceAccount:
    - nwakalka/builder
    - nwakalka/default
    - nwakalka/deployer

Velero-Native Snapshots: <none included>

Restic Backups:
  Failed:
    nwakalka/backend-app-5c7f8f6f67-mx9hc: e2eapp-storage

Later I checked the pod volume backups, which had failed:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup
NAME                               STATUS    CREATED   NAMESPACE   POD                            VOLUME           RESTIC REPO                                                                                           STORAGE LOCATION   AGE
backupname-nwakalka-17-wh5jm       Failed    3h20m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/*****/backup-v2/restic/nwakalka   default            3h20m
customerbackup-01-18fadb00-vjpkj   Failed    5h45m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/******/backup-v2/restic/nwakalka   default            5h45m
customerbackup-g6xvn               Failed    4h23m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/******/backup-v2/restic/nwakalka   default            4h23m

Describing the failed PodVolumeBackup:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup backupname-nwakalka-17-wh5jm -o yaml

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-08T08:46:03Z
  generateName: backupname-nwakalka-17-
  generation: 4
  labels:
    velero.io/backup-name: backupname-nwakalka-17
    velero.io/backup-uid: cae0fd36-af1a-484b-96f9-b5394660337e
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: backupname-nwakalka-17
    backup-uid: cae0fd36-af1a-484b-96f9-b5394660337e
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-08T08:46:03Z
  message: "running Restic backup, stderr=unable to read root certificate: open /tmp/cacert-default104149325:
    no such file or directory\ngithub.com/restic/restic/internal/backend.Transport\n\t/restic/internal/backend/http_transport.go:110\nmain.open\n\t/restic/cmd/restic/global.go:687\nmain.OpenRepository\n\t/restic/cmd/restic/global.go:421\nmain.runBackup\n\t/restic/cmd/restic/cmd_backup.go:524\nmain.glob..func2\n\t/restic/cmd/restic/cmd_backup.go:61\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/restic/cmd/restic/main.go:98\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\n:
    exit status 1"
  phase: Failed
  progress: {}
  startTimestamp: 2022-08-08T08:46:03Z

What did you expect to happen:

The pod volume backup should complete, and hence the Velero backup should complete as well.

Anything else you would like to add:

  1. We are already annotating the pods for pod volume backups (see the sketch after this list).
  2. Velero version 1.6.3-1 was working successfully.
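The annotation step in point 1 looks roughly like this (a sketch; pod and volume names are taken from the resource list above):

# annotate the running pod (for a Deployment this would normally go on the pod template)
# so that Velero backs up this volume with Restic
kubectl -n nwakalka annotate pod backend-app-5c7f8f6f67-mx9hc \
    backup.velero.io/backup-volumes=e2eapp-storage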

Environment:

  • Velero version (use velero version): 1.9.0-1
  • Velero features (use velero client config get features): features:
  • Kubernetes version (use kubectl version): v1.23.9
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.4 LTS
@ywk253100
Contributor

Did you configure a customized CA in the BackupStorageLocation?
Could you check the status of the BackupStorageLocation and confirm whether it is healthy? See the sketch below for one way to check.
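For example, something along these lines (assuming the Velero namespace and BSL name used in this issue):

# check the phase of the BackupStorageLocation
kubectl -n mcs-customer-backup get backupstoragelocation

# check whether a custom CA is set on it (spec.objectStorage.caCert)
kubectl -n mcs-customer-backup get backupstoragelocation default \
    -o jsonpath='{.spec.objectStorage.caCert}'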

@nwakalka
Author

nwakalka commented Aug 9, 2022

Yes, it's configured, and the BackupStorageLocation is Available:

nwakalka@nwakalka-virtual-machine:~$ oc get bsl -n mcs-backup
NAME       PHASE       LAST VALIDATED   AGE       DEFAULT
internal   Available   25s              7m25s     true
nwakalka@nwakalka-virtual-machine:~$ oc get bsl -n mcs-customer-backup
NAME      PHASE       LAST VALIDATED   AGE       DEFAULT
default   Available   56s              7m28s     true

@nwakalka
Author

nwakalka commented Aug 9, 2022

I have also found a similar issue created for a failed PodVolumeBackup.

I followed that issue, applied the changes specified in cherry-pick 5145, and tested again.

Now I can see the PodVolumeBackups are completing, but I am hitting another error, due to which my backups are still partially failing.

Attaching the PodVolumeBackup for reference:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup customerbackup-04-678cd5b0-jnqx9 -o yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-09T10:53:54Z
  generateName: customerbackup-04-678cd5b0-
  generation: 5
  labels:
    velero.io/backup-name: customerbackup-04-678cd5b0
    velero.io/backup-uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1
  name: customerbackup-04-678cd5b0-jnqx9
  namespace: mcs-customer-backup
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: customerbackup-04-678cd5b0
    uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
  resourceVersion: "1970171660"
  uid: 7774c22c-5500-4afa-be68-1df9a033c907
spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: customerbackup-04-678cd5b0
    backup-uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-09T10:53:56Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed
  progress:
    bytesDone: 42949704
    totalBytes: 42949704
  snapshotID: 327685dc
  startTimestamp: 2022-08-09T10:53:54Z

Also attaching the Velero logs for reference:

kup/item_backupper.go:417" name=pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1 namespace= persistentVolume=pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1 resource=persistentvolumes
time="2022-08-09T10:53:55Z" level=info msg="1 errors encountered backup up item" backup=mcs-customer-backup/customerbackup-04-678cd5b0 logSource="pkg/backup/backup.go:413" name=backend-app-5c7f8f6f67-mx9hc
time="2022-08-09T10:53:55Z" level=error msg="Error backing up item" backup=mcs-customer-backup/customerbackup-04-678cd5b0 error="pod volume backup failed: running Restic backup, stderr=Fatal: wrong password or no key found\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=backend-app-5c7f8f6f67-mx9hc

@ywk253100
Contributor

Are you using a dev build of Velero with your own patch? Could you try v1.9.1-rc2?

BTW, the status of the PodVolumeBackup is weird: there is an error message but the phase is Completed. Did you patch the status manually?

status:
  completionTimestamp: 2022-08-09T10:53:56Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed

@nwakalka
Author

Yes, I manually modified the files. Let me try v1.9.1-rc2 and test.

@nwakalka
Author

Hi @ywk253100,

I have used the v1.9.1-rc2 build and tested the backup again.
I am still facing the same error as above:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup customerbackup-05-2680dee7-frwqg -o yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-12T09:16:11Z
  generation: 5
  labels:
    velero.io/backup-name: customerbackup-05-2680dee7
    velero.io/backup-uid: da153ad5-d69c-42e5-802e-151022b0073c
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1

spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: customerbackup-05-2680dee7
    backup-uid: da153ad5-d69c-42e5-802e-151022b0073c
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-12T09:16:14Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed
  progress:
    bytesDone: 42949704
    totalBytes: 42949704
  snapshotID: 7071fc6f
  startTimestamp: 2022-08-12T09:16:11Z

@ywk253100
Contributor

Is this a fresh installation or an upgrade? If it was upgraded from a previous version, did you update the CRDs as well? Could you try a fresh installation?

If the error is still there, please run velero debug to collect all the necessary information. A sketch of both steps follows below.
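A rough sketch of both steps (the backup name is the one from this issue; the commands assume the velero CLI matches the server version):

# re-apply the CRDs that ship with the CLI version
velero install --crds-only --dry-run -o yaml | kubectl apply -f -

# collect a debug bundle for the failing backup
velero debug --backup backupname-nwakalka-17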

@ywk253100 ywk253100 self-assigned this Aug 15, 2022
@ywk253100 ywk253100 added the Needs info Waiting for information label Aug 15, 2022
@nwakalka
Author

Yes, we did a fresh installation and upgraded all CRDs to the latest version as you mentioned.

Please find the attached Velero logs in debug mode:

velero_debug_logs.txt

@nwakalka
Author

Any update, @ywk253100?

@nwakalka
Author

Any update, @ywk253100?

@ywk253100 ywk253100 added Needs reproduction and removed Needs info Waiting for information labels Sep 15, 2022
@ywk253100
Contributor

@nwakalka Sorry for the late reply.

I didn't find any useful information in the log file. I'm not sure whether this issue is related to your environment/configuration or is a bug in Velero, but I'm afraid I have no time to do more investigation.

Could you debug it further in your local environment? I would appreciate it if you can find the root cause.

@nwakalka
Author

nwakalka commented Sep 19, 2022

Hi @ywk253100,

After going through the logs again, we observed an incorrect ordering of the restic snapshots command and the restic repository initialization.
Please find the logs for reference:

time="2022-08-17T05:38:24Z" level=debug msg="Acquiring lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:122" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=debug msg="Acquired lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:131" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=debug msg="No repository found, creating one" backupLocation=default logSource="pkg/restic/repository_ensurer.go:151" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=info msg="Initializing restic repository" logSource="pkg/controller/restic_repository_controller.go:118" resticRepo=mcs-customer-backup/e2e-br-src-singlepv-muk-wqmzxj-default-7pj24

time="2022-08-17T05:38:24Z" level=debug msg="Ran restic command" command="restic snapshots --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default671251002 --cache-dir=/scratch/.cache/restic --latest=1" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr="Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?\ns3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj\n" stdout=

time="2022-08-17T05:38:26Z" level=debug msg="Ran restic command" command="restic init --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default3207158131 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr= stdout="created restic repository 3d68d64577 at s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj\n\nPlease note that knowledge of your password is required to access\nthe repository. Losing your password means that your data is\nirrecoverably lost.\n"

time="2022-08-17T05:38:26Z" level=info msg="Initializing restic repository" logSource="pkg/controller/restic_repository_controller.go:118" resticRepo=mcs-customer-backup/e2e-br-src-singlepv-muk-wqmzxj-default-7pj24

time="2022-08-17T05:38:26Z" level=debug msg="Released lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:128" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:27Z" level=debug msg="Ran restic command" command="restic snapshots --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default1211308944 --cache-dir=/scratch/.cache/restic --latest=1" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr= stdout=

The execution is not in the correct order: restic snapshots is run before the restic repository is initialized.

@ywk253100
Contributor

@nwakalka This is the correct behavior, see the code: it first tries to run the snapshots command and, if that fails, runs the init command.
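Roughly speaking, per volume namespace it does the equivalent of the following (illustrative only; REPO and PASSWORD_FILE stand for the values seen in the log lines above):

# check whether the repository already exists; initialize it only if the check fails
restic snapshots --repo="$REPO" --password-file="$PASSWORD_FILE" --latest=1 \
  || restic init --repo="$REPO" --password-file="$PASSWORD_FILE"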

Could you show me the logs of the restic daemonset? I think they should contain more information about the error.

@nwakalka
Author

nwakalka commented Sep 26, 2022

@ywk253100

Please see the attached logs of restic; I don't see any error here:

nwakalka@nwakalka-virtual-machine:~$ oc logs -n mcs-customer-backup restic-rkpxq -f
time="2022-10-04T16:18:01Z" level=info msg="Setting log-level to DEBUG"
time="2022-10-04T16:18:01Z" level=info msg="Starting Velero restic server feature-release-v1.9.2 (d7d4a7924e5b17a019555b8789ccb33b812bbad4)" logSource="pkg/cmd/cli/restic/server.go:88"
2022-10-04T16:18:02.401Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8086"}
time="2022-10-04T16:18:02Z" level=info msg="Starting metric server for restic at address [:8085]" logSource="pkg/cmd/cli/restic/server.go:179"
time="2022-10-04T16:18:02Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:190"
time="2022-10-04T16:18:02Z" level=info msg="Controllers starting..." logSource="pkg/cmd/cli/restic/server.go:221"
2022-10-04T16:18:02.662Z	INFO	starting metrics server	{"path": "/metrics"}
2022-10-04T16:18:02.662Z	INFO	controller.podvolumebackup	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.662Z	INFO	controller.podvolumebackup	Starting Controller	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup"}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting Controller	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore"}
2022-10-04T16:18:02.780Z	INFO	controller.podvolumerestore	Starting workers	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "worker count": 1}
2022-10-04T16:18:02.781Z	INFO	controller.podvolumebackup	Starting workers	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup", "worker count": 1}
time="2022-10-04T16:22:37Z" level=info msg="PodVolumeBackup starting" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:87" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="Looking for path matching glob" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:305" pathGlob="/host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/*/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="This is a valid volume path: /host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/kubernetes.io~nfs/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79." backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/util/kube/utils.go:210" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="Found path matching glob" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:311" path="/host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/kubernetes.io~nfs/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:38Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:216" podvolumebackup=mcs-customer-backup/customerbackup-9khdl pvcUID=1691fd0b-7f64-4568-8520-af7f4ee6de79
time="2022-10-04T16:22:38Z" level=info msg="No completed PodVolumeBackup found for PVC" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:254" podvolumebackup=mcs-customer-backup/customerbackup-9khdl pvcUID=1691fd0b-7f64-4568-8520-af7f4ee6de79
time="2022-10-04T16:22:38Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:356" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:40Z" level=debug msg="Ran command=restic backup --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/test-01ds7j-backup/backup-v2/restic/nwakalka --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default458548719 --cache-dir=/scratch/.cache/restic . --tag=backup=customerbackup --tag=backup-uid=c6a76474-38ee-4ace-91fc-0003229348b0 --tag=ns=nwakalka --tag=pod=backend-app-7c998644df-bjj8g --tag=pod-uid=bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5 --tag=pvc-uid=1691fd0b-7f64-4568-8520-af7f4ee6de79 --tag=volume=e2eapp-storage --host=velero --json, stdout={\"message_type\":\"summary\",\"files_new\":2,\"files_changed\":0,\"files_unmodified\":0,\"dirs_new\":1,\"dirs_changed\":0,\"dirs_unmodified\":0,\"data_blobs\":4,\"tree_blobs\":2,\"data_added\":1439900,\"total_files_processed\":2,\"total_bytes_processed\":201166947,\"total_duration\":1.033140605,\"snapshot_id\":\"d4be1e8d\"}, stderr=" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:160" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=info msg="PodVolumeBackup completed" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:200" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=info msg="PodVolumeBackup starting" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:87" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=debug msg="PodVolumeBackup is not new, not processing" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:98" podvolumebackup=mcs-customer-backup/customerbackup-9khdl

@kaovilai
Member

kaovilai commented Oct 4, 2022

@nwakalka
Author

nwakalka commented Oct 7, 2022

Hi @ywk253100, @sseago,

I checked with the 1.7 release and it works as expected, but with v1.8 I am facing the error below:

time="2022-10-07T05:15:34Z" level=error msg="Error backing up item" backup=mcs-customer-backup/customerbackup error="pod volume backup failed: running Restic backup, stderr=unable to read root certificate: open /tmp/cacert-default3351459972: no such file or directory\ngithub.com/restic/restic/internal/backend.Transport\n\t/restic/internal/backend/http_transport.go:110\nmain.open\n\t/restic/cmd/restic/global.go:687\nmain.OpenRepository\n\t/restic/cmd/restic/global.go:421\nmain.runBackup\n\t/restic/cmd/restic/cmd_backup.go:524\nmain.glob..func2\n\t/restic/cmd/restic/cmd_backup.go:61\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/restic/cmd/restic/main.go:98\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:418" name=backend-app-7c998644df-4cf4l

@nwakalka
Author

Hi @ywk253100, @sseago,

Currently our issue is resolved with Velero 1.9: we have two namespaces, each with its own Velero and restic pods, and once we removed Velero and restic from one of the namespaces our backups and restores succeed. Up to Velero 1.7, both namespaces had Velero and restic and worked independently. Is there any other provision for this now?

@sseago
Collaborator

sseago commented Oct 11, 2022

@nwakalka Were both velero installations the same version? Two independent velero installations should still work as of 1.9. Nothing has been changed that was intended to break this, although it's possible that some of the refactoring that was done introduced a regression. However, if you have two different velero versions installed, then one of them will have the wrong CRDs, since CRDs are cluster-scoped -- in that case the installation with the wrong CRDs will probably not work properly.
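One rough way to check which API versions the cluster-scoped CRDs are actually serving (illustrative; any of the velero.io CRDs can be inspected the same way):

kubectl get crd podvolumebackups.velero.io -o jsonpath='{.spec.versions[*].name}'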

@nwakalka
Author

Hi @sseago,

Thank you for replying. Yes, we deploy the same version of Velero in both namespaces, and the CRDs match that version.

@balpert89

balpert89 commented Nov 23, 2022

Hi all,

I had some time to dig into this issue (disclaimer: I am a colleague of @nwakalka).

What has been missed here is that we run two complete Velero stacks, including restic, in two separate namespaces. There is a distinct restic daemonset in each namespace.

Up to and including v1.7.2, Velero and restic acted only on PodVolumeBackups created in their respective namespace. Starting with v1.8, both restic instances reconcile a PodVolumeBackup regardless of its namespace.

v1.7.2 behavior:
"namespace A"/restic reconciled podvolumebackups only in "namespace A", ignoring "namespace B".
"namespace B"/restic reconciled podvolumebackups only in "namespace B", ignoring "namespace A".

v1.8 (and above) behavior:
"namespace A"/restic now reconciles podvolumebackups in "namespace A" & "namespace B".
"namespace B"/restic now reconciles podvolumebackups in "namespace A" & "namespace B".

This new behavior results in the status described above; a rough way to confirm it is sketched below.
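For example (namespace names are taken from earlier in this thread; the daemonset is assumed to use the default name "restic", and the PodVolumeBackup name is the one from the log above):

# the same PodVolumeBackup name shows up in the restic logs of both namespaces
kubectl -n mcs-backup logs daemonset/restic | grep customerbackup-9khdl
kubectl -n mcs-customer-backup logs daemonset/restic | grep customerbackup-9khdl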

Now my questions:

  1. Is it not intended to run more than one restic instance per cluster?
  2. Is there a way to instruct restic to act like the v1.7.2 behavior?

@kaovilai
Member

I think the "v1.8 behavior", if true, is a bug.

@stale

stale bot commented Feb 2, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Feb 2, 2023
@stale

stale bot commented Mar 14, 2023

Closing the stale issue.

@stale stale bot closed this as completed Mar 14, 2023