
Velero v1.9.0 breaks restic backups to multiple backup storage locations #5164

Closed
fl42 opened this issue Jul 31, 2022 · 3 comments
Assignees: qiuming-best
Labels: Restic (Relates to the restic integration), target/v1.9.1
Milestone: v1.9.1

Comments

fl42 commented Jul 31, 2022

Hello,

What steps did you take and what happened:
When using multiple backup storage locations with restic-enabled backups, all restic backups fail from the second backup onwards (the first backup succeeds, then every subsequent backup fails).

What did you expect to happen:
Restic backups should be successful.

The following information will help us better understand what's going on:

I'm using 2 backup storage locations (separate S3 buckets):

  • default
  • dr

I'm using 3 schedules:

  • hourly (to default location)
  • daily (to default location)
  • daily-dr (to dr location)

Failed backup log:

time="2022-07-23T10:17:27Z" level=info msg="1 errors encountered backup up item" backup=velero/daily-dr-20220723101701 logSource="pkg/backup/backup.go:413" name=mailserver-656b69c6cb-h5n8h
time="2022-07-23T10:17:27Z" level=error msg="Error backing up item" backup=velero/daily-dr-20220723101701 error="pod volume backup failed: running Restic backup, stderr=Fatal: invalid id \"df9ec978\": no matching ID found for prefix \"df9ec978\"\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=mailserver-656b69c6cb-h5n8h

Log from the restic pod:

time="2022-07-23T10:17:25Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:225" podvolumebackup=velero/daily-dr-20220723101701-fkdzm pvcUID=ff50086e-214c-443d-a83a-612ea4a4ccab
time="2022-07-23T10:17:25Z" level=info msg="Found most recent completed PodVolumeBackup for PVC" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:270" parentPodVolumeBackup=hourly-20220722100000-k5qwq parentSnapshotID=df9ec978 podvolumebackup=velero/daily-dr-20220723101701-fkdzm pvcUID=ff50086e-214c-443d-a83a-612ea4a4ccab
time="2022-07-23T10:17:25Z" level=info msg="Setting --parent flag for this backup" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:370" parentSnapshotID=df9ec978 podvolumebackup=velero/daily-dr-20220723101701-fkdzm

The root cause is that the parent PodVolumeBackup is wrongly identified: parentPodVolumeBackup=hourly-20220722100000-k5qwq is picked for backup=velero/daily-dr-20220723101701, even though it belongs to a different backup storage location (default, not dr)!

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
[...]
  name: hourly-20220722100000-k5qwq
spec:
  backupStorageLocation: default
[...]
status:
  completionTimestamp: "2022-07-22T19:30:16Z"
  phase: Completed
  progress:
    bytesDone: xx
    totalBytes: xx
  snapshotID: df9ec978
  startTimestamp: "2022-07-22T19:29:38Z"
---
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
[...]
  name: daily-dr-20220723101701-fkdzm
[...]
spec:
  backupStorageLocation: dr
[...]
status:
  completionTimestamp: "2022-07-23T10:17:27Z"
  message: |-
    running Restic backup, stderr=Fatal: invalid id "df9ec978": no matching ID found for prefix "df9ec978"
    : exit status 1
  phase: Failed
  progress: {}
  startTimestamp: "2022-07-23T10:17:25Z"

Anything else you would like to add:

This issue occurs only for v1.9.0.
Downgrading to v1.8.1 fixed the issue.

Surprisingly, a check that should handle exactly this case appears to be present:
https://github.com/vmware-tanzu/velero/blob/v1.9.0/pkg/controller/pod_volume_backup_controller.go#L246

Environment:

Velero v1.9.0
Kubernetes v1.23.6

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@qiuming-best qiuming-best added the Restic Relates to the restic integration label Aug 1, 2022
@qiuming-best qiuming-best self-assigned this Aug 1, 2022
qiuming-best (Contributor) commented:

@fl42 I've tested by following your steps: I created two schedules that trigger every 5 minutes, and so far everything runs normally.
[screenshot: WX20220802-151343@2x]
In the meantime, I will check the related code in detail.

qiuming-best (Contributor) commented:

@fl42 I've found out what the problem is.
var mostRecentPVB *velerov1api.PodVolumeBackup is a pointer, and it gets set to the address of the loop variable while traversing the pvb items in pvbList.Items. Because that variable is reused on every iteration, mostRecentPVB ends up pointing at the last item in pvbList.Items, regardless of which item actually passed the location check.
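The pattern described above is the classic Go loop-variable aliasing bug: saving a pointer to a variable that is reused across iterations (as range loops did before Go 1.22). A minimal, self-contained sketch of how this produces exactly the symptom in the logs; the type is trimmed and the function names (findParentBuggy, findParentFixed) are hypothetical, not the actual Velero code:

```go
package main

import "fmt"

// PodVolumeBackup is a trimmed stand-in for the Velero API type,
// keeping only the fields relevant to this bug.
type PodVolumeBackup struct {
	Name                  string
	BackupStorageLocation string
	SnapshotID            string
}

// findParentBuggy mirrors the faulty pattern: one variable is reused
// across iterations and a pointer to it is saved. The location check
// itself is correct, but after the loop the saved pointer sees the
// *last* item traversed, not the item that passed the check.
func findParentBuggy(items []PodVolumeBackup, location string) *PodVolumeBackup {
	var mostRecentPVB *PodVolumeBackup
	var pvb PodVolumeBackup // shared across iterations
	for i := range items {
		pvb = items[i]
		if pvb.BackupStorageLocation != location {
			continue // check passes only for the right location...
		}
		mostRecentPVB = &pvb // ...but this aliases the shared variable
	}
	return mostRecentPVB
}

// findParentFixed takes the address of the slice element instead,
// so the saved pointer is stable.
func findParentFixed(items []PodVolumeBackup, location string) *PodVolumeBackup {
	var mostRecentPVB *PodVolumeBackup
	for i := range items {
		if items[i].BackupStorageLocation != location {
			continue
		}
		mostRecentPVB = &items[i]
	}
	return mostRecentPVB
}

func main() {
	items := []PodVolumeBackup{
		{Name: "daily-dr-old", BackupStorageLocation: "dr", SnapshotID: "aa11bb22"},
		{Name: "hourly-new", BackupStorageLocation: "default", SnapshotID: "df9ec978"},
	}
	// The buggy lookup returns a snapshot from the wrong location,
	// which restic then rejects: no matching ID found for that repo.
	fmt.Println(findParentBuggy(items, "dr").SnapshotID) // df9ec978
	fmt.Println(findParentFixed(items, "dr").SnapshotID) // aa11bb22
}
```

This matches the report: the "dr" backup is handed a --parent snapshot ID (df9ec978) that only exists in the "default" repository.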


fl42 commented Aug 4, 2022

Thanks for the quick fix!

@fl42 fl42 closed this as completed Aug 4, 2022
@qiuming-best qiuming-best added this to the v1.9.1 milestone Aug 9, 2022