
upgrade restic version to get multithreaded restore feature when it's released #916

Closed
sachinar opened this issue Oct 9, 2018 · 4 comments

@sachinar

sachinar commented Oct 9, 2018

@ncdc I tried the Ark tool. It worked for some of our deployments, but it is not working for restoring Kafka volumes.

I tried these steps.

For taking the backup:

kubectl annotate pod/broker-2 backup.ark.heptio.com/backup-volumes=config,broker-data
kubectl annotate pod/broker-1 backup.ark.heptio.com/backup-volumes=config,broker-data

kubectl annotate pod/broker-0 restores.ark.heptio.com/backup-volumes=config,broker-data
kubectl annotate pod/broker-2 restores.ark.heptio.com/backup-volumes=config,broker-data
kubectl annotate pod/broker-1 restores.ark.heptio.com/backup-volumes=config,broker-data

kubectl annotate statefulset/broker backup.ark.heptio.com/backup-volumes=config,broker-data
kubectl annotate statefulset/broker restores.ark.heptio.com/backup-volumes=config,broker-data

ark backup create broker-backup --selector app=broker --snapshot-volumes

This worked and the backup completed without any errors.

Then I deleted everything in order to restore the backup.

I ran this command:

ark restore create broker-restore --from-backup broker-backup

The restore completed, but ark restore logs <restore-name> shows these errors:

logSource="pkg/restore/restore.go:583"
time="2018-10-09T07:17:02Z" level=info msg="Getting client for apps/v1, Kind=ControllerRevision" logSource="pkg/restore/restore.go:643"
time="2018-10-09T07:17:02Z" level=info msg="Restoring ControllerRevision: broker-5b5c484896" logSource="pkg/restore/restore.go:768"
time="2018-10-09T07:17:02Z" level=info msg="Restoring resource 'endpoints' into namespace 'default' from: /tmp/364264448/resources/endpoints/namespaces/default" logSource="pkg/restore/restore.go:583"
time="2018-10-09T07:17:02Z" level=info msg="Getting client for /v1, Kind=Endpoints" logSource="pkg/restore/restore.go:643"
time="2018-10-09T07:17:02Z" level=info msg="Restoring Endpoints: broker" logSource="pkg/restore/restore.go:768"
time="2018-10-09T07:17:03Z" level=info msg="Restoring resource 'services' into namespace 'default' from: /tmp/364264448/resources/services/namespaces/default" logSource="pkg/restore/restore.go:583"
time="2018-10-09T07:17:03Z" level=info msg="Getting client for /v1, Kind=Service" logSource="pkg/restore/restore.go:643"
time="2018-10-09T07:17:03Z" level=info msg="Executing item action for services" logSource="pkg/restore/restore.go:728"
time="2018-10-09T07:17:03Z" level=info msg="Restoring Service: broker" logSource="pkg/restore/restore.go:768"
time="2018-10-09T07:17:03Z" level=info msg="Restoring resource 'statefulsets.apps' into namespace 'default' from: /tmp/364264448/resources/statefulsets.apps/namespaces/default" logSource="pkg/restore/restore.go:583"
time="2018-10-09T07:17:03Z" level=info msg="Getting client for apps/v1, Kind=StatefulSet" logSource="pkg/restore/restore.go:643"
time="2018-10-09T07:17:03Z" level=info msg="Restoring StatefulSet: broker" logSource="pkg/restore/restore.go:768"
time="2018-10-09T08:17:01Z" level=error msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" logSource="pkg/restore/restore.go:844"
time="2018-10-09T08:17:01Z" level=error msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" logSource="pkg/restore/restore.go:844"
time="2018-10-09T08:17:01Z" level=error msg="unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" logSource="pkg/restore/restore.go:844"

@nrb
Contributor

nrb commented Oct 9, 2018

Hi @sachinar,

Based on the logs, it looks like there's a timeout happening when restic is trying to restore your data - the restores begin at 07:17, and then fail at 08:17.

In v0.9.x, you can change this timeout with the podVolumeOperationTimeout value on your Config. You can set it to any valid Duration string representation. I've listed an example configuration below:

---
apiVersion: ark.heptio.com/v1
kind: Config
metadata:
  namespace: heptio-ark
  name: default
persistentVolumeProvider:
  name: aws
  config:
    region: <YOUR_REGION>
backupStorageProvider:
  name: aws
  bucket: <YOUR_BUCKET>
  resticLocation: <YOUR_RESTIC_LOCATION>
  config:
    region: <YOUR_REGION>
backupSyncPeriod: 30m
gcSyncPeriod: 30m
scheduleSyncPeriod: 1m
restoreOnlyMode: false

# This part is what was added
podVolumeOperationTimeout: 2h

This appears to be a limitation of restic currently. They have a PR (restic/restic#1719) that is meant to address it, which we're keeping an eye on.

Out of curiosity, roughly how much data is in the volumes being restored?

@sachinar
Author

@nrb We have roughly 6 GB of data in the volumes being restored.

@rosskukulinski
Contributor

@nrb FYI, the restic PR was merged to master, but it's not clear how or when the restic community decides to ship a release.

@nrb
Contributor

nrb commented Nov 27, 2018

@rosskukulinski Thanks for the update. I had seen it, but correct, we are waiting on a new release.

@skriss skriss added this to the v1.0.0 milestone Nov 28, 2018
@skriss skriss changed the title unable to successfully complete restic restores of pod's volumes" error="timed out waiting for all PodVolumeRestores to complete" upgrade restic version to get multithreaded restore feature when it's released Nov 28, 2018
@nrb nrb closed this as completed in #1156 Feb 6, 2019