Design proposal for Restore hooks #2465
Conversation
@jimbo459 @neil-hickey @aclevername you may be interested in this design proposal as well.
@marccampbell thanks for writing this up! I read through and added a few comments/questions. I'll continue to think about this and add any more thoughts I have.
## High-Level Design

The Restore spec will have a `spec.hooks` section matching the same section on the Backup spec except no `pre` hooks can be defined - only `post`.
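For context, a rough sketch of what such a Restore spec might look like if it mirrors the Backup hooks API; the field names are borrowed from the Backup spec, and the namespace, selector, container, and command values are purely illustrative:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: example-restore            # illustrative
  namespace: velero
spec:
  backupName: example-backup       # illustrative
  hooks:
    resources:
      - name: restore-db-data      # illustrative hook name
        includedNamespaces:
          - myapp                  # illustrative namespace
        labelSelector:
          matchLabels:
            app: postgres          # illustrative selector
        post:                      # only post hooks; no pre hooks on restore
          - exec:
              container: postgres
              command:
                - /bin/bash
                - -c
                - pg_restore -d myapp /backups/dump   # illustrative
              onError: Fail        # or Continue
              timeout: 3m
```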
I wonder if there are different parts of the pod lifecycle that we would want to model/enable hooks for here. For example, we could theoretically support both (1) running a restore hook as an init container, so guaranteed to run before the main containers in the pod; and (2) running a restore hook after the pod is running, so as an exec hook inside the main container. Does anyone have thoughts on if something like this would be useful?
If we don't model this, then I believe you're proposing that we only support (2) from above, i.e. run the hook inside one of the existing containers in the pod once it's up and running.
For our use case (working on TAS backup-and-restore), this use-case (2) looks like exactly what we need: we'd like to be able to re-hydrate a database after the pod running the database server is ready.
Use-case (1) looks potentially useful to us in future, but nowhere near as immediately useful as use-case (2). So we're totally in agreement with adding use-case (2) right now, and maybe opening a new design PR later if we end up with a real-world use for (1).
Considering that the primary purpose of these hooks is application data restoration, and that these hooks will run once the pod becomes ready, most applications would then need to be restarted to recognize the restored data. For this reason, the init container approach may be better suited.
As we are designing this, we should add both flavors of this restore hook, not just exec'd commands. The init container approach also has better error handling, in that the pod doesn't run if the data isn't restored correctly.
I think it would help with the design if we had a concrete example of when initContainers would be useful during restore. The closest thing I could come up with is a scenario like this:
You have an API deployment that uses Postgres as its DB. You have your migrations in an initContainer for that deployment. You have a pre-snapshot hook that executes against the API pod and runs pg_dump to a volume in the pod. Then you would want an initContainer to be injected before the migrations initContainer to run pg_restore.
I'm not sure the above scenario would appear in the field since most API deployments have multiple replicas and it would be weird to run pg_dump/pg_restore multiple times for snapshot and restore. Do you have another use case in mind for initContainers?
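To make that scenario concrete, the hypothetical injected container prepended to the deployment's existing initContainers might look roughly like this (all names, images, and paths are illustrative):

```yaml
initContainers:
  # hypothetical initContainer injected by Velero to run pg_restore against the
  # dump taken by the pre-snapshot hook
  - name: restore-hook-pg-restore
    image: postgres:12
    command: ["/bin/bash", "-c", "pg_restore --clean -d myapp /backups/dump"]
    volumeMounts:
      - name: backups              # the volume the pre-snapshot pg_dump wrote to
        mountPath: /backups
  # the deployment's existing migrations initContainer would run after the injected one
  - name: migrations
    image: myapp/migrations:latest
```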
@areed
The scenario I had in mind was restoring the stateful application itself; most stateful applications maintain state by writing files to disk.
In the case of postgres, if postgres were to start before pg_dump/pg_restore runs, it would begin writing data into the same files that pg_restore writes when the restore hooks run. The hooks would then overwrite the data that postgres initialized itself with. For postgres to recover, it may require a restart of the postgres container. When that happens, it is possible for the pod running the application to be relocated, and the new pod wouldn't be managed by Velero, so the restore hooks would never run against it to restore the data. This leaves room for data loss.
Furthermore, when restoring a stateful application such as Apache ZooKeeper, which uses transaction semantics to manage its state, overwriting the data that the application started with (which is what the restore hook does) will lead to data corruption, causing the application to crash and making the restore unsuccessful.
For this reason, the restore hooks should be run before the application can start.
design/restore-hooks.md
The Restore spec will have a `spec.hooks` section matching the same section on the Backup spec except no `pre` hooks can be defined - only `post`.
Annotations comparable to the annotations used during backup can also be set on pods.
For each restored pod, the Velero server will check if there are any hooks applicable to the pod.
If a restored pod has any applicable hooks, Velero will wait for the pod to reach status Ready and then execute the hooks in the pod.
I know that, for the restic integration, certain users have seen issues where the pod that velero restores (and attempts to run the restic restore for) ends up being replaced by a controlling ReplicaSet shortly after restore, which ultimately makes that restic restore fail. See #1981 (comment) and the subsequent discussion for a lot more detail on that.
I do think, though, that for most environments Velero's assumption that the pod it restores will be "adopted" by its controller seems to work fine. It sounds like, in order for the restore hooks to work as expected, we'd be making that same assumption?
(I know this is a somewhat vague comment, probably merits a live discussion at some point, but hopefully I wrote enough to point in the right direction for some context).
Yes, this design makes the same assumption.
An alternative to avoid this issue would be:
- Drop support for restore annotations
- Move restore hook execution stage to after the restic restore wait
- Then use the labelSelector in the Restore hook spec to list Pods and then execute the hook in all that match.
I suspect the annotations are included to mirror the current backup hooks design, but it certainly looks like there are some cases where they're likely to cause trouble.
Perhaps we don't include them?
I'm going to dig into this some more and see if I can get a better understanding of the scenarios where the pod adoption wouldn't work.
@skriss Have you been able to do this research? Any further thoughts here?
During the latest community meeting, the consensus was that since adoption behavior is already relied on for the restic integration, it's safe to rely on for restore hooks as well.
Failed hooks with `onError: Continue` will appear in the Restore log but will not affect the status of the parent Restore.
Failed hooks with `onError: Fail` will cause the parent Restore to have status Partially Failed.

## Alternatives Considered
A couple other related upstream efforts to consider:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20200120-generic-data-populators.md
I don't think either of those is in a state where we'd want to change our approach here for now, but they're worth looking at and keeping an eye on.
The execution-hook proposal is now obsolete and is being replaced by the Draft ContainerNotifier Proposal.
This issue also seems related from a quick skim: #1150
LGTM. We had a look at the design doc and added comments (via @totherme)
Put the restore hooks on the Backup spec as a third lifecycle event named `restore` along with `pre` and `post`.
That would be confusing since `pre` and `post` would appear in the Backup log but `restore` would only be in the Restore log.
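For clarity, this rejected alternative would have looked roughly like the sketch below on the Backup spec (hook names, selectors, and commands are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Backup
spec:
  hooks:
    resources:
      - name: db-hooks              # illustrative
        labelSelector:
          matchLabels:
            app: postgres           # illustrative
        pre:                        # runs before the backup; appears in the Backup log
          - exec:
              container: postgres
              command: ["/bin/sh", "-c", "pg_dump myapp > /backups/dump"]       # illustrative
        post:                       # runs after the backup; appears in the Backup log
          - exec:
              container: postgres
              command: ["/bin/sh", "-c", "rm /backups/dump"]                    # illustrative
        restore:                    # the rejected third lifecycle event; would only surface in the Restore log
          - exec:
              container: postgres
              command: ["/bin/sh", "-c", "pg_restore -d myapp /backups/dump"]   # illustrative
```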
I like calling this hook a "post" hook, since that leaves us open to add "pre" hooks in future if we want them (along the lines of skriss' comments above) 👍
An example of a pre hook might be:
- I have a corrupted database / invalid state of a DB I am restoring into. My pre-hook would delete the corrupted data prior to restore, or reconcile a good state of the world prior to restore. (This can be done via one all-encompassing hook too, but the separation may be cleaner.)
What will be the lifecycle phase of the pod for running a pre-restore-hook?
I don't think we have a need for pre-restore-hooks right now, but I can imagine it being useful to have one that is guaranteed to run:
- before the main containers in the restored pod start; and
- after the PVCs have been mounted.
This would allow us to do what neil-hickey suggests in the event of a corrupted DB: replace the bad data before starting a database process.
Having said all that, I don't think there's any need at all for pre-restore-hooks in an MVP. I just like the idea of leaving the option open for future work.
## Alternatives Considered

Wait for all restored Pods to report Ready, then execute the first hook in all applicable Pods simultaneously, then proceed to the next hook, etc.
That could introduce deadlock, e.g. if an API pod cannot be ready until the DB pod is restored.
We could have an option to enable simultaneous execution, since in some cases it may not be required; for example, a db repair or file system repair can be executed sequentially.
A few comments here. I think going over this again at a community meeting would be helpful.
design/restore-hooks.md
### Implementation

The types and functions in pkg/backup/item_hook_handler.go will be moved to a new package (pkg/util/hooks) and exported so they can be used for both backups and restores.
Somewhat minor, but I'd promote the package to be pkg/hooks vs being inside pkg/util.
design/restore-hooks.md
- post.hook.restore.velero.io/container
- post.hook.restore.velero.io/command
- post.hook.restore.velero.io/on-error
- post.hook.restore.velero.io/timeout
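For illustration, those annotations set directly on a pod might look roughly like this; the pod name, container, and command are illustrative, and the command format is assumed to mirror the existing backup hook annotations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-0                  # illustrative
  annotations:
    post.hook.restore.velero.io/container: postgres
    post.hook.restore.velero.io/command: '["/bin/bash", "-c", "pg_restore -d myapp /backups/dump"]'
    post.hook.restore.velero.io/on-error: Continue
    post.hook.restore.velero.io/timeout: 3m
```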
We might also need a timeout to configure how long Velero should wait for the pod to become running.
@marccampbell FYI, we've added this to our roadmap for v1.5. See https://github.com/vmware-tanzu/velero/blob/master/ROADMAP.md
I'm in favor of this design as stated.
@ashish-amarnath and @carlisia if you can get some time this week, please take another pass over this. Ideally, if there are no major objections we can get it merged this week and start breaking up the work for execution.
@jimbo459 @neil-hickey @aclevername Feel free to add approvals on this as well if you've read it and are ok with the design.
If these restore hooks are run after the application has started, this can potentially lead to data-loss for most stateful applications and data corruption for stateful applications using transaction semantics.
For that reason, IMO, we should be supporting running these hooks as init containers as well.
@ashish-amarnath with regard to running the restore hooks in init containers: what would be the additional work required to support this? And would we want to add this to the current design doc? (asked on velero slack as well)
@ashish-amarnath When you say "as well," are you advocating for the option to use them as init containers or as annotations as currently proposed? I think your point about potential data loss is a valid one, and we definitely need to address it in some fashion, even if that isn't necessarily by implementing init containers and documenting why we did it this way.
@stephbman I haven't had a chance to think about what the design for supporting init containers would look like, but this could follow how restic restores are performed. IMO, the current proposal caters to the proverbial 20% of the use cases, so implementing this may not benefit all of our users. Not sure if that is enough bang-for-buck to implement this. @nrb I agree that init containers may not be the only way to address this, but I think we agree that this needs to be addressed. Ideally, we would design both at once and implement it in two phases.
As an alternative to using init containers, we can also consider running the restore hooks in three steps:
@jimbo459 @neil-hickey @aclevername @marccampbell @areed Would you be able to attend the community meeting on June 23 2020 for discussion on this? Also, it's likely we'll schedule a separate, dedicated call for this too.
Yes. I can be there to discuss on Tuesday. The init container support actually makes sense to me. In our product, we've implemented restore hooks as custom scripts (until the work in this PR is done), and had this problem too. Interestingly, we had to solve this by running the custom restore hooks in an init container (https://github.com/replicatedhq/kots/blob/e5351f44c164efd0a904eda9bd728058b90b8016/pkg/kotsadm/kotsadm_objects.go#L258) to prevent services from writing to the database before/during restore.
Yep, lgtm :)
Hi @marccampbell @areed, outside of the community meeting (where we'll have a more limited amount of time to discuss this PR), we were also thinking of setting up a deep dive session for another day this week. Are there some days/times that you all are available to meet with us? We generally have 9-11 am CST available each week (one team member involved is located in London so the earlier time frame accommodates that).
Thanks @marccampbell and @areed for getting these updates in!
@ashish-amarnath and @carlisia Please take another pass on this when you can
@marccampbell Looks like the merge commit will need the DCO sign off added via
If initContainers are specified for a pod, Velero will inject the containers into the beginning of the pod's initContainers list.
If a restic initContainer is also being injected, the restore initContainers will be injected directly after the restic initContainer.
The restore will use a RestoreItemAction to inject the initContainers.
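A rough sketch of the resulting initContainer ordering on a restored pod; container names and images are illustrative, and the restic helper image depends on the Velero release:

```yaml
spec:
  initContainers:
    # injected by Velero's restic restore support, when the pod has restic-backed volumes
    - name: restic-wait
      image: velero/velero-restic-restore-helper    # tag omitted; illustrative
    # a restore hook initContainer, injected directly after restic-wait (hypothetical name)
    - name: restore-hook-init
      image: postgres:12                            # illustrative
      command: ["/bin/sh", "-c", "pg_restore -d myapp /backups/dump"]   # illustrative
    # the pod's original initContainers keep their relative order after the injected ones
    - name: migrations
      image: myapp/migrations:latest                # illustrative
```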
This is a RestoreItemAction that applies to pods?
Yes, only applies to pods
This LGTM.
We have everything in place to move this forward into implementation.
This should be good for merge once the commits are signed.
@marccampbell Looks like we may need a cherry-pick or something other than a merge, as the sign-off is still failing :(
Change post-restore exec hooks to wait for container running status instead of pod ready status. Add separate exec timeout and wait timeouts for post-restore exec hooks. Signed-off-by: Marc Campbell <[email protected]>
@nrb thanks. The DCO is passing now.
@marccampbell @areed Thanks so much for driving this forward! I've broken out the most visible work items that I could see on the ZenHub board and used the ZenHub dependencies to split that out. Please take a look at it: https://app.zenhub.com/workspaces/velero-5c59c15e39d47b774b5864e3/board?labels=restore%20hooks&milestones=v1.5%232020-08-31&repos=99143276,112385197,213946861,190224441,214524700,214524630 I think we have a few small issues that can be knocked out by pretty much anyone, and then the larger implementation of init container injection/restore hook execution, which may yet be broken into smaller chunks.
This file contains the MVP use cases and product requirements tied to the Restore Hooks Design Proposal #2465. This requirements document ties to Epic: #2116 And is related to issue: #2678 Signed-off-by: Stephanie Bauman <[email protected]> Co-authored-by: Ashish Amarnath <[email protected]>
* Add design proposal for restore hooks Signed-off-by: Marc Campbell <[email protected]> * Add details to restore hooks design Signed-off-by: Marc Campbell <[email protected]> * Restore initContainers and requested changes Change post-restore exec hooks to wait for container running status instead of pod ready status. Add separate exec timeout and wait timeouts for post-restore exec hooks. Signed-off-by: Marc Campbell <[email protected]> Co-authored-by: Andrew Reed <[email protected]>
See #2116
closes #2609