Design doc for upload progress monitoring #3416

dsu-igeek
Contributor

Signed-off-by: Dave Smith-Uchida [email protected]

@nrb added the Area/Design and kind/release-blocker labels and removed the kind/release-blocker label Feb 8, 2021
@dsu-igeek dsu-igeek added this to the v1.7.0 milestone Feb 22, 2021
@dsu-igeek dsu-igeek force-pushed the snapshot-upload-progress-design-12-14-2020 branch from 3115591 to cae5d85 Compare March 10, 2021 02:11
Member

@ashish-amarnath ashish-amarnath left a comment

Here is an alternate proposal:
The goal here is to ensure the durability of Velero backups, and that backups, once completed, can be fully restored.
To accomplish this we propose:

  1. Introducing a new BackupProgressTracker gRPC service that would be implemented as a plugin. This service would have the familiar AppliesTo and Execute RPCs. The AppliesTo RPC would indicate which resources the plugin applies to. The Execute RPC (it could be called something else) would take as input an ExecuteRequest with the fields item and backup (similar to BackupItemAction's ExecuteRequest) and return an ExecuteResponse with fields similar to the UploadProgress in this proposal.

  2. The following changes in the Backup controller:
    i. Where the controller currently sets the backup phase to Completed, set the phase to Uploading instead.
    ii. Start a progress tracker goroutine that periodically, say every 1s, invokes the Execute RPC of every BackupProgressTracker plugin to obtain the status of the backup for each item and patches the backup object's status with this information. Once the UploadProgress from all the plugins has reached a state indicating durability of the backup, the goroutine returns, at which point the backup's phase is set to Completed/Uploaded.

  3. The following change to the backup API:

    • Introduce a map[string]UploadProgress field in the backup status, which will be patched by the goroutine described in 2.ii above. It maps each item to its latest UploadProgress.

Benefits of this approach:

  1. Introducing a new gRPC service is not a breaking change. (Neither is adding a new RPC to an existing service, but a new service is more intuitive in the go-plugin framework.)
  2. It is more extensible across resource types, as it enables tracking upload progress of all resource types, even those that don't have a corresponding BackupItemAction or VolumeSnapshotter interface.
  3. The UploadProgress of all items in the backup can be tracked in a consistent way and at a single point.

We may have previously considered this approach but this seems more intuitive to me now than modifying the existing BackupItemAction and VolumeSnapshotter plugins. WDYT?

@carlisia carlisia removed the request for review from nrb April 5, 2021 14:55
@carlisia carlisia requested review from nrb and removed request for jenting April 15, 2021 22:29
@dsu-igeek
Contributor Author

Uploads may take a long time. We want to be able to recover from restarts of the Velero server while uploads are in progress. Also, if the backup is deleted before the upload finishes, we want to terminate any polling on upload status and, if possible, abort the upload.

The design introduces a BackupItemProgress plug-in for BackupItemAction monitoring. In order to monitor the upload progress, we will need to know what to monitor, which is why the BackupItemAction needs to be modified to return a snapshot ID. It might be worthwhile to add a new BackupItemActionWithSnapshot plug-in rather than add the snapshot ID return to the existing plug-ins.

The VolumeSnapshotter plugins work at the volume level, so invoking a BackupItemProgress action for them requires that they map the PV to a volume. Adding the UploadProgress API should not be a breaking change once API versioning has been introduced.

@dsu-igeek dsu-igeek removed request for zubron and nrb April 28, 2021 16:47
Member

@ashish-amarnath ashish-amarnath left a comment

The approach LGTM.
Added a few comments that can be addressed/figured out during implementation.

Contributor

@carlisia carlisia left a comment

Couple suggestions, couple comments, couple questions.

Overall LGTM; it looks like someone could take it and implement it. Can't wait to have this split in the backup process!

@ashish-amarnath
Member

@dsu-igeek Looks like the latest commit is missing a sign-off.

@dsu-igeek dsu-igeek dismissed stale reviews from ashish-amarnath and carlisia via 3af5a60 April 30, 2021 05:28
@dsu-igeek dsu-igeek force-pushed the snapshot-upload-progress-design-12-14-2020 branch from 077c32c to 3af5a60 Compare April 30, 2021 05:28
carlisia previously approved these changes Apr 30, 2021
Contributor

@carlisia carlisia left a comment

👍

Change to add new plugin SnapshotItemAction, added started/updated fields to UploadProgress
Updated SnapshotItemAction, added additional tasks

Signed-off-by: Dave Smith-Uchida <[email protected]>
@dsu-igeek dsu-igeek dismissed stale reviews from ashish-amarnath and carlisia via f414746 April 30, 2021 21:08
@dsu-igeek dsu-igeek force-pushed the snapshot-upload-progress-design-12-14-2020 branch from 3af5a60 to f414746 Compare April 30, 2021 21:08
@carlisia carlisia merged commit 3b3d228 into vmware-tanzu:main Apr 30, 2021
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jun 29, 2021
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
Labels
Area/Design Design Documents kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes