
Velero v1.9.0 breaks restic backups to multiple backup storage locations #5164

Closed
fl42 opened this issue Jul 31, 2022 · 3 comments
Assignees: qiuming-best
Labels: Restic (Relates to the restic integration), target/v1.9.1
Milestone: v1.9.1

Comments

fl42 commented Jul 31, 2022

Hello,

What steps did you take and what happened:
When using multiple backup storage locations with restic-enabled backups, all restic backups fail from the second backup onwards (the first backup succeeds, then every subsequent backup fails).

What did you expect to happen:
Restic backups should be successful.

The following information will help us better understand what's going on:

I'm using 2 backup storage locations (separate S3 buckets):

  • default
  • dr

I'm using 3 schedules:

  • hourly (to default location)
  • daily (to default location)
  • daily-dr (to dr location)

Failed backup log:

time="2022-07-23T10:17:27Z" level=info msg="1 errors encountered backup up item" backup=velero/daily-dr-20220723101701 logSource="pkg/backup/backup.go:413" name=mailserver-656b69c6cb-h5n8h
time="2022-07-23T10:17:27Z" level=error msg="Error backing up item" backup=velero/daily-dr-20220723101701 error="pod volume backup failed: running Restic backup, stderr=Fatal: invalid id \"df9ec978\": no matching ID found for prefix \"df9ec978\"\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:199" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=mailserver-656b69c6cb-h5n8h

Log from the restic pod:

time="2022-07-23T10:17:25Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:225" podvolumebackup=velero/daily-dr-20220723101701-fkdzm pvcUID=ff50086e-214c-443d-a83a-612ea4a4ccab
time="2022-07-23T10:17:25Z" level=info msg="Found most recent completed PodVolumeBackup for PVC" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:270" parentPodVolumeBackup=hourly-20220722100000-k5qwq parentSnapshotID=df9ec978 podvolumebackup=velero/daily-dr-20220723101701-fkdzm pvcUID=ff50086e-214c-443d-a83a-612ea4a4ccab
time="2022-07-23T10:17:25Z" level=info msg="Setting --parent flag for this backup" backup=velero/daily-dr-20220723101701 controller=podvolumebackup logSource="pkg/controller/pod_volume_backup_controller.go:370" parentSnapshotID=df9ec978 podvolumebackup=velero/daily-dr-20220723101701-fkdzm

The root cause is that the parent PodVolumeBackup is wrongly identified: parentPodVolumeBackup=hourly-20220722100000-k5qwq is picked for backup=velero/daily-dr-20220723101701, even though it belongs to a different backup storage location (default, not dr)!

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
[...]
  name: hourly-20220722100000-k5qwq
spec:
  backupStorageLocation: default
[...]
status:
  completionTimestamp: "2022-07-22T19:30:16Z"
  phase: Completed
  progress:
    bytesDone: xx
    totalBytes: xx
  snapshotID: df9ec978
  startTimestamp: "2022-07-22T19:29:38Z"
---
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
[...]
  name: daily-dr-20220723101701-fkdzm
[...]
spec:
  backupStorageLocation: dr
[...]
status:
  completionTimestamp: "2022-07-23T10:17:27Z"
  message: |-
    running Restic backup, stderr=Fatal: invalid id "df9ec978": no matching ID found for prefix "df9ec978"
    : exit status 1
  phase: Failed
  progress: {}
  startTimestamp: "2022-07-23T10:17:25Z"

Anything else you would like to add:

This issue occurs only for v1.9.0.
Downgrading to v1.8.1 fixed the issue.

Surprisingly, a check that should handle exactly this case appears to be present:
https://github.com/vmware-tanzu/velero/blob/v1.9.0/pkg/controller/pod_volume_backup_controller.go#L246

Environment:

Velero v1.9.0
Kubernetes v1.23.6

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@qiuming-best qiuming-best added the Restic Relates to the restic integration label Aug 1, 2022
@qiuming-best qiuming-best self-assigned this Aug 1, 2022
qiuming-best (Contributor) commented:

@fl42 I've tested by following your steps: I created two schedules that trigger every 5 minutes, and so far everything runs normally.
[screenshot: WX20220802-151343@2x]
In the meantime, I will check the related code in detail.

qiuming-best (Contributor) commented:

@fl42 I've found out what the problem is.
var mostRecentPVB *velerov1api.PodVolumeBackup is a pointer, and it gets set to the address of the loop variable while traversing the pvb items in pvbList.Items. Because that variable is reused on every iteration, mostRecentPVB ends up pointing at the last item in pvbList.Items, regardless of which item actually passed the location check.
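The pattern described above is the classic Go loop-variable aliasing bug: saving a pointer to a variable that is reused across iterations (as range loops did before Go 1.22). A minimal, self-contained sketch of how this produces exactly the symptom in the logs; the type is trimmed and the function names (findParentBuggy, findParentFixed) are hypothetical, not the actual Velero code:

```go
package main

import "fmt"

// PodVolumeBackup is a trimmed stand-in for the Velero API type,
// keeping only the fields relevant to this bug.
type PodVolumeBackup struct {
	Name                  string
	BackupStorageLocation string
	SnapshotID            string
}

// findParentBuggy mirrors the faulty pattern: one variable is reused
// across iterations and a pointer to it is saved. The location check
// itself is correct, but after the loop the saved pointer sees the
// *last* item traversed, not the item that passed the check.
func findParentBuggy(items []PodVolumeBackup, location string) *PodVolumeBackup {
	var mostRecentPVB *PodVolumeBackup
	var pvb PodVolumeBackup // shared across iterations
	for i := range items {
		pvb = items[i]
		if pvb.BackupStorageLocation != location {
			continue // check passes only for the right location...
		}
		mostRecentPVB = &pvb // ...but this aliases the shared variable
	}
	return mostRecentPVB
}

// findParentFixed takes the address of the slice element instead,
// so the saved pointer is stable.
func findParentFixed(items []PodVolumeBackup, location string) *PodVolumeBackup {
	var mostRecentPVB *PodVolumeBackup
	for i := range items {
		if items[i].BackupStorageLocation != location {
			continue
		}
		mostRecentPVB = &items[i]
	}
	return mostRecentPVB
}

func main() {
	items := []PodVolumeBackup{
		{Name: "daily-dr-old", BackupStorageLocation: "dr", SnapshotID: "aa11bb22"},
		{Name: "hourly-new", BackupStorageLocation: "default", SnapshotID: "df9ec978"},
	}
	// The buggy lookup returns a snapshot from the wrong location,
	// which restic then rejects: no matching ID found for that repo.
	fmt.Println(findParentBuggy(items, "dr").SnapshotID) // df9ec978
	fmt.Println(findParentFixed(items, "dr").SnapshotID) // aa11bb22
}
```

This matches the report: the "dr" backup is handed a --parent snapshot ID (df9ec978) that only exists in the "default" repository.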


fl42 commented Aug 4, 2022

Thanks for the quick fix!

@fl42 fl42 closed this as completed Aug 4, 2022
@qiuming-best qiuming-best added this to the v1.9.1 milestone Aug 9, 2022