Pod Volume Backup Failed while backing up volumes. #5188

Closed
nwakalka opened this issue Aug 8, 2022 · 23 comments

@nwakalka

nwakalka commented Aug 8, 2022

What steps did you take and what happened:
We were trying to run Velero backups that include pod volume backups, where the pod volumes are backed up using Restic.
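For context, a backup like this is typically created along the following lines (a rough sketch; the exact command and flags we used are not shown here, so treat them as assumptions):

# create a backup of the application namespace; Restic handles any volumes
# opted in via the backup.velero.io/backup-volumes pod annotation
velero backup create backupname --include-namespaces nwakalka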

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ kubectl exec -it mcs-velero-b455f5465-s7xxd -n mcs-customer-backup  -- /velero get backups

NAME                         STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backupname       PartiallyFailed   1        0          2022-08-08 08:44:46 +0000 UTC   29d       default            <none>
customerbackup               PartiallyFailed   1        0          2022-08-08 07:42:19 +0000 UTC   29d       default            <none>
customerbackup-01-18fadb00   PartiallyFailed   1        0          2022-08-08 06:20:10 +0000 UTC   29d       default            <none>
customerbackup-02-38777e93   Completed         0        0          2022-08-08 08:56:21 +0000 UTC   29d       default            <none>

Here you can see that our Velero backups are partially failing. Attaching the backup description for more info:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ kubectl exec -it mcs-velero-b455f5465-s7xxd -n mcs-customer-backup  -- /velero describe backup backupname-nwakalka-17 --details --insecure-skip-tls-verify
I0808 12:00:02.510894   15188 request.go:665] Waited for 1.166485167s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/noobaa.io/v1alpha1?timeout=32s
Name:         backupname-nwakalka-17
Namespace:    mcs-customer-backup
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.11+6b3cbdd
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21

Phase:  PartiallyFailed (run `velero backup logs backupname-nwakalka-17` for more information)

Errors:    1
Warnings:  0

Namespaces:
  Included:  nwakalka
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-08-08 08:44:46 +0000 UTC
Completed:  2022-08-08 08:46:04 +0000 UTC

Expiration:  2022-09-07 08:44:46 +0000 UTC

Total items to be backed up:  40
Items backed up:              40

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - customerbackups.**.com
  apps/v1/Deployment:
    - nwakalka/backend-app
  apps/v1/ReplicaSet:
    - nwakalka/backend-app-5c7f8f6f67
  authorization.openshift.io/v1/RoleBinding:
    - nwakalka/admin
    - nwakalka/system:deployers
    - nwakalka/system:image-builders
    - nwakalka/system:image-pullers
  mcs.***.com/v1/CustomerBackup:
    - nwakalka/customerbackup-01
  rbac.authorization.k8s.io/v1/RoleBinding:
    - nwakalka/admin
    - nwakalka/system:deployers
    - nwakalka/system:image-builders
    - nwakalka/system:image-pullers
  v1/ConfigMap:
    - nwakalka/kube-root-ca.crt
    - nwakalka/openshift-service-ca.crt
  v1/Event:
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f2fa5978ee
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f548dceb61
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f79a38011d
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7a36a0ea8
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7b350d0f5
    - nwakalka/backend-app-5c7f8f6f67-mx9hc.170949f7b4e992c5
    - nwakalka/backend-app-5c7f8f6f67.170949f2bd2f3d19
    - nwakalka/backend-app.170949f2ba94a17d
    - nwakalka/e2eapp-pv-claim-test-backend.170949e95f467285
    - nwakalka/e2eapp-pv-claim-test-backend.170949f2dca6d757
  v1/Namespace:
    - nwakalka
  v1/PersistentVolume:
    - pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1
  v1/PersistentVolumeClaim:
    - nwakalka/e2eapp-pv-claim-test-backend
  v1/Pod:
    - nwakalka/backend-app-5c7f8f6f67-mx9hc
  v1/Secret:
    - nwakalka/builder-dockercfg-dvg5h
    - nwakalka/builder-token-mvrnr
    - nwakalka/builder-token-w6ppq
    - nwakalka/default-dockercfg-vbrkr
    - nwakalka/default-token-4d96g
    - nwakalka/default-token-5w4jk
    - nwakalka/deployer-dockercfg-vthj6
    - nwakalka/deployer-token-d5xmw
    - nwakalka/deployer-token-fzhgg
  v1/ServiceAccount:
    - nwakalka/builder
    - nwakalka/default
    - nwakalka/deployer

Velero-Native Snapshots: <none included>

Restic Backups:
  Failed:
    nwakalka/backend-app-5c7f8f6f67-mx9hc: e2eapp-storage

Later I checked the pod volume backups, which had failed:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup
NAME                               STATUS    CREATED   NAMESPACE   POD                            VOLUME           RESTIC REPO                                                                                           STORAGE LOCATION   AGE
backupname-nwakalka-17-wh5jm       Failed    3h20m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/*****/backup-v2/restic/nwakalka   default            3h20m
customerbackup-01-18fadb00-vjpkj   Failed    5h45m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/******/backup-v2/restic/nwakalka   default            5h45m
customerbackup-g6xvn               Failed    4h23m     nwakalka    backend-app-5c7f8f6f67-mx9hc   e2eapp-storage   s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/******/backup-v2/restic/nwakalka   default            4h23m

Describing the failed PodVolumeBackup:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup backupname-nwakalka-17-wh5jm -o yaml

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-08T08:46:03Z
  generateName: backupname-nwakalka-17-
  generation: 4
  labels:
    velero.io/backup-name: backupname-nwakalka-17
    velero.io/backup-uid: cae0fd36-af1a-484b-96f9-b5394660337e
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: backupname-nwakalka-17
    backup-uid: cae0fd36-af1a-484b-96f9-b5394660337e
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-08T08:46:03Z
  message: "running Restic backup, stderr=unable to read root certificate: open /tmp/cacert-default104149325:
    no such file or directory\ngithub.com/restic/restic/internal/backend.Transport\n\t/restic/internal/backend/http_transport.go:110\nmain.open\n\t/restic/cmd/restic/global.go:687\nmain.OpenRepository\n\t/restic/cmd/restic/global.go:421\nmain.runBackup\n\t/restic/cmd/restic/cmd_backup.go:524\nmain.glob..func2\n\t/restic/cmd/restic/cmd_backup.go:61\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/restic/cmd/restic/main.go:98\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\n:
    exit status 1"
  phase: Failed
  progress: {}
  startTimestamp: 2022-08-08T08:46:03Z

What did you expect to happen:

The pod volume backup should complete, and hence the Velero backup should complete as well.

Anything else you would like to add:

  1. We are already annotating the pods for pod volume backups (see the sketch after this list).
  2. Velero version 1.6.3-1 was working successfully.
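The annotation step in point 1 looks roughly like this (a sketch; pod and volume names are taken from the resource list above):

# annotate the running pod (for a Deployment this would normally go on the pod template)
# so that Velero backs up this volume with Restic
kubectl -n nwakalka annotate pod backend-app-5c7f8f6f67-mx9hc \
    backup.velero.io/backup-volumes=e2eapp-storage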

Environment:

  • Velero version (use velero version): 1.9.0-1
  • Velero features (use velero client config get features): features:
  • Kubernetes version (use kubectl version): v1.23.9
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.4 LTS
@ywk253100
Contributor

Did you configure a customized CA in the BackupStorageLocation?
Could you check the status of the BackupStorageLocation and confirm whether it is healthy? See the sketch below for one way to check.
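For example, something along these lines (assuming the Velero namespace and BSL name used in this issue):

# check the phase of the BackupStorageLocation
kubectl -n mcs-customer-backup get backupstoragelocation

# check whether a custom CA is set on it (spec.objectStorage.caCert)
kubectl -n mcs-customer-backup get backupstoragelocation default \
    -o jsonpath='{.spec.objectStorage.caCert}'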

@nwakalka
Author

nwakalka commented Aug 9, 2022

Yes, it's configured, and the BackupStorageLocation is Available:

nwakalka@nwakalka-virtual-machine:~$ oc get bsl -n mcs-backup
NAME       PHASE       LAST VALIDATED   AGE       DEFAULT
internal   Available   25s              7m25s     true
nwakalka@nwakalka-virtual-machine:~$ oc get bsl -n mcs-customer-backup
NAME      PHASE       LAST VALIDATED   AGE       DEFAULT
default   Available   56s              7m28s     true

@nwakalka
Author

nwakalka commented Aug 9, 2022

I have also found a similar issue created for a failed PodVolumeBackup.

I followed that issue, applied the changes specified in cherry-pick 5145, and tested again.

Now I can see the PodVolumeBackups are completing, but I am hitting another error, due to which my backups are still partially failing.

Attaching the PodVolumeBackup for reference:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup customerbackup-04-678cd5b0-jnqx9 -o yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-09T10:53:54Z
  generateName: customerbackup-04-678cd5b0-
  generation: 5
  labels:
    velero.io/backup-name: customerbackup-04-678cd5b0
    velero.io/backup-uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1
  name: customerbackup-04-678cd5b0-jnqx9
  namespace: mcs-customer-backup
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: customerbackup-04-678cd5b0
    uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
  resourceVersion: "1970171660"
  uid: 7774c22c-5500-4afa-be68-1df9a033c907
spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: customerbackup-04-678cd5b0
    backup-uid: 6dd83b98-6028-4cf6-88de-4a352880ae24
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-09T10:53:56Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed
  progress:
    bytesDone: 42949704
    totalBytes: 42949704
  snapshotID: 327685dc
  startTimestamp: 2022-08-09T10:53:54Z

Also attaching the Velero logs for reference:

kup/item_backupper.go:417" name=pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1 namespace= persistentVolume=pvc-942d8b64-5a37-4b38-9764-73e1bd40bba1 resource=persistentvolumes
time="2022-08-09T10:53:55Z" level=info msg="1 errors encountered backup up item" backup=mcs-customer-backup/customerbackup-04-678cd5b0 logSource="pkg/backup/backup.go:413" name=backend-app-5c7f8f6f67-mx9hc
time="2022-08-09T10:53:55Z" level=error msg="Error backing up item" backup=mcs-customer-backup/customerbackup-04-678cd5b0 error="pod volume backup failed: running Restic backup, stderr=Fatal: wrong password or no key found\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=backend-app-5c7f8f6f67-mx9hc

@ywk253100
Contributor

Are you using a dev build of Velero with your own patch? Could you try v1.9.1-rc2?

BTW, the status of the PodVolumeBackup is weird: there is an error message but the phase is Completed. Did you patch the status manually?

status:
  completionTimestamp: 2022-08-09T10:53:56Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed

@nwakalka
Author

Yes, I manually modified the files. Let me try v1.9.1-rc2 and test.

@nwakalka
Author

Hi @ywk253100,

I have used the v1.9.1-rc2 build and tested the backup again.
I am still facing the same error as above:

nwakalka@nwakalka-virtual-machine:~/workspace/go/src/nwakalka-kind/multiple-pod-pv-test$ oc get podvolumebackup -n mcs-customer-backup customerbackup-05-2680dee7-frwqg -o yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: e2eapp-pv-claim-test-backend
  creationTimestamp: 2022-08-12T09:16:11Z
  generation: 5
  labels:
    velero.io/backup-name: customerbackup-05-2680dee7
    velero.io/backup-uid: da153ad5-d69c-42e5-802e-151022b0073c
    velero.io/pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
  managedFields:
  - apiVersion: velero.io/v1

spec:
  backupStorageLocation: default
  node: t001-gvkns-worker-s3xlarge4-eu-de-03-gwxmr
  pod:
    kind: Pod
    name: backend-app-5c7f8f6f67-mx9hc
    namespace: nwakalka
    uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
  repoIdentifier: s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t001-gvkns-backup/backup-v2/restic/nwakalka
  tags:
    backup: customerbackup-05-2680dee7
    backup-uid: da153ad5-d69c-42e5-802e-151022b0073c
    ns: nwakalka
    pod: backend-app-5c7f8f6f67-mx9hc
    pod-uid: 4bdc38c6-104e-4a51-8d64-17389ff5c658
    pvc-uid: 942d8b64-5a37-4b38-9764-73e1bd40bba1
    volume: e2eapp-storage
  volume: e2eapp-storage
status:
  completionTimestamp: 2022-08-12T09:16:14Z
  message: |-
    running Restic backup, stderr=Fatal: wrong password or no key found
    : exit status 1
  phase: Completed
  progress:
    bytesDone: 42949704
    totalBytes: 42949704
  snapshotID: 7071fc6f
  startTimestamp: 2022-08-12T09:16:11Z

@ywk253100
Contributor

Is this a fresh installation or an upgrade? If it was upgraded from a previous version, did you update the CRDs as well? Could you try a fresh installation?

If the error is still there, please run velero debug to collect all the necessary information. A sketch of both steps follows below.
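A rough sketch of both steps (the backup name is the one from this issue; the commands assume the velero CLI matches the server version):

# re-apply the CRDs that ship with the CLI version
velero install --crds-only --dry-run -o yaml | kubectl apply -f -

# collect a debug bundle for the failing backup
velero debug --backup backupname-nwakalka-17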

@ywk253100 ywk253100 self-assigned this Aug 15, 2022
@ywk253100 ywk253100 added the Needs info Waiting for information label Aug 15, 2022
@nwakalka
Author

Yes, we did a fresh installation and upgraded all CRDs to the latest version as you mentioned.

Please find the attached Velero logs in debug mode:

velero_debug_logs.txt

@nwakalka
Author

Any update, @ywk253100?

@nwakalka
Author

Any update, @ywk253100?

@ywk253100 ywk253100 added Needs reproduction and removed Needs info Waiting for information labels Sep 15, 2022
@ywk253100
Contributor

@nwakalka Sorry for the late reply.

I didn't find any useful information in the log file. I'm not sure whether this issue is related to your environment/configuration or is a bug in Velero, but I'm afraid I have no time to do more investigation.

Could you debug it further in your local environment? I would appreciate it if you can find the root cause.

@nwakalka
Author

nwakalka commented Sep 19, 2022

Hi @ywk253100,

After going through the logs again, we observed an incorrect ordering of the restic snapshots command and the restic repository initialization.
Please find the logs for reference:

time="2022-08-17T05:38:24Z" level=debug msg="Acquiring lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:122" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=debug msg="Acquired lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:131" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=debug msg="No repository found, creating one" backupLocation=default logSource="pkg/restic/repository_ensurer.go:151" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:24Z" level=info msg="Initializing restic repository" logSource="pkg/controller/restic_repository_controller.go:118" resticRepo=mcs-customer-backup/e2e-br-src-singlepv-muk-wqmzxj-default-7pj24

time="2022-08-17T05:38:24Z" level=debug msg="Ran restic command" command="restic snapshots --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default671251002 --cache-dir=/scratch/.cache/restic --latest=1" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr="Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?\ns3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj\n" stdout=

time="2022-08-17T05:38:26Z" level=debug msg="Ran restic command" command="restic init --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default3207158131 --cache-dir=/scratch/.cache/restic" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr= stdout="created restic repository 3d68d64577 at s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj\n\nPlease note that knowledge of your password is required to access\nthe repository. Losing your password means that your data is\nirrecoverably lost.\n"

time="2022-08-17T05:38:26Z" level=info msg="Initializing restic repository" logSource="pkg/controller/restic_repository_controller.go:118" resticRepo=mcs-customer-backup/e2e-br-src-singlepv-muk-wqmzxj-default-7pj24

time="2022-08-17T05:38:26Z" level=debug msg="Released lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:128" volumeNamespace=e2e-br-src-singlepv-muk-wqmzxj

time="2022-08-17T05:38:27Z" level=debug msg="Ran restic command" command="restic snapshots --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/t007-wr9gc-backup/backup-v2/restic/e2e-br-src-singlepv-muk-wqmzxj --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default1211308944 --cache-dir=/scratch/.cache/restic --latest=1" logSource="pkg/restic/repository_manager.go:294" repository=e2e-br-src-singlepv-muk-wqmzxj stderr= stdout=

The execution is not in the correct order: restic snapshots is run before the restic repository is initialized.

@ywk253100
Contributor

@nwakalka This is the correct behavior, see the code: it first tries to run the snapshots command and, if that fails, runs the init command.
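Roughly speaking, per volume namespace it does the equivalent of the following (illustrative only; REPO and PASSWORD_FILE stand for the values seen in the log lines above):

# check whether the repository already exists; initialize it only if the check fails
restic snapshots --repo="$REPO" --password-file="$PASSWORD_FILE" --latest=1 \
  || restic init --repo="$REPO" --password-file="$PASSWORD_FILE"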

Could you show me the logs of the restic daemonset? I think they should contain more information about the error.

@nwakalka
Author

nwakalka commented Sep 26, 2022

@ywk253100

Please see the attached logs of restic; I don't see any error here:

nwakalka@nwakalka-virtual-machine:~$ oc logs -n mcs-customer-backup restic-rkpxq -f
time="2022-10-04T16:18:01Z" level=info msg="Setting log-level to DEBUG"
time="2022-10-04T16:18:01Z" level=info msg="Starting Velero restic server feature-release-v1.9.2 (d7d4a7924e5b17a019555b8789ccb33b812bbad4)" logSource="pkg/cmd/cli/restic/server.go:88"
2022-10-04T16:18:02.401Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8086"}
time="2022-10-04T16:18:02Z" level=info msg="Starting metric server for restic at address [:8085]" logSource="pkg/cmd/cli/restic/server.go:179"
time="2022-10-04T16:18:02Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:190"
time="2022-10-04T16:18:02Z" level=info msg="Controllers starting..." logSource="pkg/cmd/cli/restic/server.go:221"
2022-10-04T16:18:02.662Z	INFO	starting metrics server	{"path": "/metrics"}
2022-10-04T16:18:02.662Z	INFO	controller.podvolumebackup	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.662Z	INFO	controller.podvolumebackup	Starting Controller	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup"}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting EventSource	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "source": "kind source: /, Kind="}
2022-10-04T16:18:02.679Z	INFO	controller.podvolumerestore	Starting Controller	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore"}
2022-10-04T16:18:02.780Z	INFO	controller.podvolumerestore	Starting workers	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeRestore", "worker count": 1}
2022-10-04T16:18:02.781Z	INFO	controller.podvolumebackup	Starting workers	{"reconciler group": "velero.io", "reconciler kind": "PodVolumeBackup", "worker count": 1}
time="2022-10-04T16:22:37Z" level=info msg="PodVolumeBackup starting" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:87" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="Looking for path matching glob" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:305" pathGlob="/host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/*/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="This is a valid volume path: /host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/kubernetes.io~nfs/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79." backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/util/kube/utils.go:210" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:37Z" level=debug msg="Found path matching glob" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:311" path="/host_pods/bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5/volumes/kubernetes.io~nfs/pvc-1691fd0b-7f64-4568-8520-af7f4ee6de79" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:38Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:216" podvolumebackup=mcs-customer-backup/customerbackup-9khdl pvcUID=1691fd0b-7f64-4568-8520-af7f4ee6de79
time="2022-10-04T16:22:38Z" level=info msg="No completed PodVolumeBackup found for PVC" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:254" podvolumebackup=mcs-customer-backup/customerbackup-9khdl pvcUID=1691fd0b-7f64-4568-8520-af7f4ee6de79
time="2022-10-04T16:22:38Z" level=info msg="No parent snapshot found for PVC, not using --parent flag for this backup" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:356" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:40Z" level=debug msg="Ran command=restic backup --repo=s3:https://s3-backup-proxy.mcs-backup.svc.cluster.local/test-01ds7j-backup/backup-v2/restic/nwakalka --password-file=/tmp/credentials/mcs-customer-backup/velero-restic-credentials-repository-password --cacert=/tmp/cacert-default458548719 --cache-dir=/scratch/.cache/restic . --tag=backup=customerbackup --tag=backup-uid=c6a76474-38ee-4ace-91fc-0003229348b0 --tag=ns=nwakalka --tag=pod=backend-app-7c998644df-bjj8g --tag=pod-uid=bad9bfaf-ff2c-4cda-b8ff-c94e8623d4c5 --tag=pvc-uid=1691fd0b-7f64-4568-8520-af7f4ee6de79 --tag=volume=e2eapp-storage --host=velero --json, stdout={\"message_type\":\"summary\",\"files_new\":2,\"files_changed\":0,\"files_unmodified\":0,\"dirs_new\":1,\"dirs_changed\":0,\"dirs_unmodified\":0,\"data_blobs\":4,\"tree_blobs\":2,\"data_added\":1439900,\"total_files_processed\":2,\"total_bytes_processed\":201166947,\"total_duration\":1.033140605,\"snapshot_id\":\"d4be1e8d\"}, stderr=" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:160" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=info msg="PodVolumeBackup completed" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:200" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=info msg="PodVolumeBackup starting" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:87" podvolumebackup=mcs-customer-backup/customerbackup-9khdl
time="2022-10-04T16:22:41Z" level=debug msg="PodVolumeBackup is not new, not processing" backup=mcs-customer-backup/customerbackup controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:98" podvolumebackup=mcs-customer-backup/customerbackup-9khdl

@kaovilai
Member

kaovilai commented Oct 4, 2022

@nwakalka
Author

nwakalka commented Oct 7, 2022

Hi @ywk253100, @sseago,

I checked with the 1.7 release and it works as expected, but with v1.8 I am facing the error below:

time="2022-10-07T05:15:34Z" level=error msg="Error backing up item" backup=mcs-customer-backup/customerbackup error="pod volume backup failed: running Restic backup, stderr=unable to read root certificate: open /tmp/cacert-default3351459972: no such file or directory\ngithub.com/restic/restic/internal/backend.Transport\n\t/restic/internal/backend/http_transport.go:110\nmain.open\n\t/restic/cmd/restic/global.go:687\nmain.OpenRepository\n\t/restic/cmd/restic/global.go:421\nmain.runBackup\n\t/restic/cmd/restic/cmd_backup.go:524\nmain.glob..func2\n\t/restic/cmd/restic/cmd_backup.go:61\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/build/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/restic/cmd/restic/main.go:98\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:418" name=backend-app-7c998644df-4cf4l

@nwakalka
Author

Hi @ywk253100, @sseago,

Currently our issue is resolved with Velero 1.9: we have two namespaces, each with its own Velero and restic pods, and once we removed Velero and restic from one of the namespaces our backups and restores succeed. Up to Velero 1.7, both namespaces had Velero and restic and worked independently. Is there any other provision for this now?

@sseago
Collaborator

sseago commented Oct 11, 2022

@nwakalka Were both velero installations the same version? Two independent velero installations should still work as of 1.9. Nothing has been changed that was intended to break this, although it's possible that some of the refactoring that was done introduced a regression. However, if you have two different velero versions installed, then one of them will have the wrong CRDs, since CRDs are cluster-scoped -- in that case the installation with the wrong CRDs will probably not work properly.
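One rough way to check which API versions the cluster-scoped CRDs are actually serving (illustrative; any of the velero.io CRDs can be inspected the same way):

kubectl get crd podvolumebackups.velero.io -o jsonpath='{.spec.versions[*].name}'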

@nwakalka
Author

Hi @sseago,

Thank you for replying. Yes, we deploy the same version of Velero in both namespaces, and the CRDs match that version.

@balpert89

balpert89 commented Nov 23, 2022

Hi all,

I had some time to dig into this issue (disclaimer: I am a colleague of @nwakalka).

What has been missed here is that we run two complete Velero stacks, including restic, in two separate namespaces. There is a distinct restic daemonset in each namespace.

Up to and including v1.7.2, Velero and restic acted only on PodVolumeBackups created in their respective namespace. Starting with v1.8, both restic instances reconcile a PodVolumeBackup regardless of its namespace.

v1.7.2 behavior:
"namespace A"/restic reconciled podvolumebackups only in "namespace A", ignoring "namespace B".
"namespace B"/restic reconciled podvolumebackups only in "namespace B", ignoring "namespace A".

v1.8 (and above) behavior:
"namespace A"/restic now reconciles podvolumebackups in "namespace A" & "namespace B".
"namespace B"/restic now reconciles podvolumebackups in "namespace A" & "namespace B".

This new behavior results in the status described above; a rough way to confirm it is sketched below.
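For example (namespace names are taken from earlier in this thread; the daemonset is assumed to use the default name "restic", and the PodVolumeBackup name is the one from the log above):

# the same PodVolumeBackup name shows up in the restic logs of both namespaces
kubectl -n mcs-backup logs daemonset/restic | grep customerbackup-9khdl
kubectl -n mcs-customer-backup logs daemonset/restic | grep customerbackup-9khdl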

Now my questions:

  1. Is it not intended to run more than one restic instance per cluster?
  2. Is there a way to instruct restic to act like the v1.7.2 behavior?

@kaovilai
Member

I think the "v1.8 behavior", if true, is a bug.

@stale

stale bot commented Feb 2, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Feb 2, 2023
@stale

stale bot commented Mar 14, 2023

Closing the stale issue.

@stale stale bot closed this as completed Mar 14, 2023