-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design doc for upload progress monitoring #3416
Design doc for upload progress monitoring #3416
Conversation
3115591
to
cae5d85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an alternate proposal:
The goal here is to ensure the durability of Velero backups, and that backups, once completed, can be fully restored.
To accomplish this we propose:
-
Introducing a new
BackupProgressTracker
gRpc service that would be implemented as a plugin. This service would have the familiarAppliesTo
andExecute
rpcs. TheAppliesTo
rpc would indicate what resources this plugin applies to, and theExecute
(this can be called something else) would take as input ExecuteRequest with fieldsitem
and thebackup
(similar toBackupItemAction's ExecuteRequest), and return, as output, ExecuteResponse with fields similar to the
UploadProgress` that's in this proposal. -
Following changes in the Backup controller:
i. Here, instead of setting the backup phase asCompleted
would set the phase asUploading
.
ii. Here, start a progress tracker goroutine which would periodically, say every 1s, invoke all theBackupProgressTracker
plugins'Execute
rpc to obtain the status of the backup for each item and patch the backup object's status with this info. Once theUploadProgress
from all the plugins have reached a state indicating durability of the backup, the goroutine returns which is when the backup's phase will be set to completed/uploaded. -
Requires the following changes to the backup API:
- Introduce a
map[string]UploadProgress
as a field in the backup status which will be patched by the goroutine explained in 2.ii above. This will be mapping item to its latestUploadProgress
- Introduce a
Benefits of this approach:
- Introducing a new gRpc service is not a breaking change. (So is adding a new rpc to an existing service, but this is more intuitive in the
go-plugin
framework.) - It is more extendable across resource types as it enables tracking upload progress of all resource types, even those that don't have a corresponding
BackupItemAction
or aVolumeSnapshotter
interface. - UploadProgress of all items in the backup can be tracked in a consistent way and in one single point.
We may have previously considered this approach but this seems more intuitive to me now than modifying the existing BackupItemAction
and VolumeSnapshotter
plugins. WDYT?
Upload progress may take a long time. We want to be able to recover from restarts of the Velero server while uploads are in progress. Also, if the backup is deleted before the upload finishes we want to terminate any polling on upload status and, if possible, abort the upload. The design introduces a BackupItemProgress plug-in for BackupItemAction monitoring. In order to monitor the upload progress, we will need to know what to monitor, which is why the BackupItemAction needs to be modified to return a snapshot ID. It might be worthwhile to add a new BackupItemActionWithSnapshot plug-in rather than add the snapshot ID return to the existing plug-ins. The VolumeSnapshotter plugins work at the volume level, so invoking a BackupItemProgress action for them requires that they map the PV to a volume. Adding the UploadProgress API should not be a breaking change once API versioning has been introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach LGTM.
Added a few comments that can be addressed/figured out during implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple suggestions, couple comments, couple questions.
Overall lgtm, looks like someone could take it and implemented. Can't wait to have this split in the backup process!
@dsu-igeek Looks like the latest commit is missing a sign-off. |
3af5a60
077c32c
to
3af5a60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Change to add new plugin SnapshotItemAction, added started/updated fields to UploadProgress Updated SnapshotItemAction, added additional tasks Signed-off-by: Dave Smith-Uchida <[email protected]>
f414746
3af5a60
to
f414746
Compare
Change to add new plugin SnapshotItemAction, added started/updated fields to UploadProgress Updated SnapshotItemAction, added additional tasks Signed-off-by: Dave Smith-Uchida <[email protected]>
Change to add new plugin SnapshotItemAction, added started/updated fields to UploadProgress Updated SnapshotItemAction, added additional tasks Signed-off-by: Dave Smith-Uchida <[email protected]>
Signed-off-by: Dave Smith-Uchida [email protected]