Design proposal for Restore hooks #2465

Merged · 3 commits · Jul 20, 2020
design/restore-hooks.md (new file, 143 additions)

# Restore Hooks

This document proposes a solution that allows a user to specify Restore Hooks, much like Backup Hooks, that can be executed during the restore process.

## Goals

- Enable custom commands to be run during a restore in order to mirror the commands that are available to the backup process.
- Provide observability into the result of commands run in restored pods.

## Non Goals

- Handling any application-specific scenarios (postgres, mongo, etc.)

## Background

Velero supports Backup Hooks to execute commands before and/or after a backup.
This enables a user to, among other things, prepare data to be backed up without having to freeze an in-use volume.
An example of this would be to attach an empty volume to a Postgres pod, use a backup hook to execute `pg_dump` from the data volume, and back up the volume containing the export.
The problem is that there is no easy or automated way to include a corresponding restore process.
After a restore with the example configuration above, the postgres pod will be empty, and a user must manually exec into the pod and run `pg_restore`.
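
For context, a minimal sketch of the backup-side half of that example, using the existing Backup hooks spec; the resource name, label selector, database name, and dump path are illustrative, not part of this proposal:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: postgres-backup
spec:
  hooks:
    resources:
    - name: dump-postgres
      includedResources:
      - pods
      labelSelector:
        matchLabels:
          app: postgres
      pre:
      - exec:
          container: postgres
          # Dump to the attached empty volume so the snapshot captures a
          # consistent export rather than in-use data files.
          command:
          - /bin/bash
          - -c
          - pg_dump -U postgres mydb > /backup/dump.sql
          onError: Fail
          timeout: 5m
```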

## High-Level Design

The Restore spec will have a `spec.hooks` section matching the same section on the Backup spec, except that no `pre` hooks can be defined, only `post` hooks.
**Review comment (Contributor):**

I wonder if there are different parts of the pod lifecycle that we would want to model/enable hooks for here. For example, we could theoretically support both (1) running a restore hook as an init container, so guaranteed to run before the main containers in the pod; and (2) running a restore hook after the pod is running, so as an exec hook inside the main container. Does anyone have thoughts on if something like this would be useful?

If we don't model this, then I believe you're proposing that we only support (2) from above, i.e. run the hook inside one of the existing containers in the pod once it's up and running.

**Review comment:**

For our use case (working on TAS backup-and-restore), this use-case (2) looks like exactly what we need: we'd like to be able to re-hydrate a database after the pod running the database server is ready.

Use-case (1) looks potentially useful to us in the future, but nowhere near as immediately useful as use-case (2). So we're totally in agreement with adding use-case (2) right now, and maybe opening a new design PR later if we end up with a real-world use for (1).

**Review comment (Member):**

Considering that the primary purpose of these hooks is application data restoration, and that these hooks will run once the pod becomes ready, most applications would then need to be restarted to recognize the restored data. For this reason, the init container approach may be better suited.
As we design this, we should add both flavors of this restore hook, not just running commands. The init container approach also offers better error handling, in that the pod doesn't run if the data isn't restored correctly.

**Review comment (Contributor):**

I think it would help with the design if we had a concrete example of when initContainers would be useful during restore. The closest thing I could come up with is a scenario like this:

You have an API deployment that uses Postgres as its DB. You have your migrations in an initContainer for that deployment. You have a pre-snapshot hook that executes against the API pod and runs pg_dump to a volume in the pod. Then you would want an initContainer to be injected before the migrations initContainer to run pg_restore.

I'm not sure the above scenario would appear in the field since most API deployments have multiple replicas and it would be weird to run pg_dump/pg_restore multiple times for snapshot and restore. Do you have another use case in mind for initContainers?

**Review comment (Member):**

@areed
The scenario I had in mind was restoring the stateful application itself; most stateful applications maintain state by writing files to disk.

In the case of postgres, if postgres were to start before pg_dump/pg_restore runs, it would begin by writing data into the same files that pg_dump/pg_restore write when the restore hooks are run. The hooks would then overwrite the data that postgres initialized itself with, and for postgres to heal, it may require a restart of postgres itself, which means a container restart. When this happens, it is possible for the pod running the application to be relocated, and the new pod wouldn't be controlled by Velero, so the restore hooks would not run to restore the data. This is scope for data loss.

Furthermore, when restoring a stateful application such as Apache Zookeeper, which uses transaction semantics to maintain/manage state, overwriting the data the application started with (as the restore hook would do) will lead to data corruption, causing the application to crash and making the restore unsuccessful.

For this reason, the restore hooks should be run before the application can start.

Annotations comparable to the annotations used during backup can also be set on pods.
For each restored pod, the Velero server will check if there are any hooks applicable to the pod.
If a restored pod has any applicable hooks, Velero will wait for the container where the hook is to be executed to reach status Running.
The Restore log will include the results of each hook, and the Restore object status will incorporate the results of hooks.

A new section at `spec.hooks.resources.initContainers` will allow for injecting initContainers into restored pods.
Annotations can be set as an alternative to defining the initContainers in the Restore object.

## Detailed Design

Post-restore hooks can be defined by annotation and/or by an array of resource hooks in the Restore spec.

The following annotations are supported (an example pod using them follows the list):
- `post.hook.restore.velero.io/container`
- `post.hook.restore.velero.io/command`
- `post.hook.restore.velero.io/on-error`
- `post.hook.restore.velero.io/exec-timeout`
- `post.hook.restore.velero.io/wait-timeout`
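
For illustration, a pod annotated with these post-restore hook annotations might look like the sketch below; the container, command, and timeout values are examples only, and the JSON-array encoding of the command is an assumption carried over from the existing backup hook annotations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    # Container in which to exec the hook.
    post.hook.restore.velero.io/container: postgres
    # Command to run, encoded as a JSON array (assumed to mirror the backup hook annotations).
    post.hook.restore.velero.io/command: '["/bin/bash", "-c", "rm /docker-entrypoint-initdb.d/dump.sql"]'
    # Continue or Fail the Restore if the hook fails.
    post.hook.restore.velero.io/on-error: Continue
    # Maximum time the command may run.
    post.hook.restore.velero.io/exec-timeout: 1m
    # Maximum time to wait for the container to reach status Running.
    post.hook.restore.velero.io/wait-timeout: 5m
spec:
  containers:
  - name: postgres
    image: postgres:12
```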

Init restore hooks can be defined by annotation and/or in the new `initContainers` section in the Restore spec.
The initContainers schema is `pod.spec.initContainers`.

The following annotations are supported (an example pod using them follows the list):
- `init.hook.restore.velero.io/timeout`
- `init.hook.restore.velero.io/initContainers`
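
A sketch of the annotation form follows. The design above does not pin down the encoding of the `initContainers` annotation value, so this example assumes a JSON-encoded list of container specs following the `pod.spec.initContainers` schema; the names, image, volume, and paths are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  annotations:
    # Overall timeout for the injected restore initContainers.
    init.hook.restore.velero.io/timeout: 120s
    # Assumed encoding: a JSON list of container specs (pod.spec.initContainers schema).
    init.hook.restore.velero.io/initContainers: >
      [{"name": "restore", "image": "postgres:12",
        "command": ["/bin/bash", "-c", "mv /backup/dump.sql /docker-entrypoint-initdb.d/"],
        "volumeMounts": [{"name": "backup", "mountPath": "/backup"}]}]
spec:
  containers:
  - name: postgres
    image: postgres:12
    volumeMounts:
    - name: backup
      mountPath: /backup
  volumes:
  - name: backup
    persistentVolumeClaim:
      claimName: backup  # illustrative
```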

This is an example of defining hooks in the Restore spec.

```yaml
apiVersion: velero.io/v1
kind: Restore
spec:
  ...
  hooks:
    resources:
    -
      name: my-hook
      includedNamespaces:
      - '*'
      excludedNamespaces:
      - some-namespace
      includedResources:
      - pods
      excludedResources: []
      labelSelector:
        matchLabels:
          app: velero
          component: server
      post:
      -
        exec:
          container: postgres
          command:
          - /bin/bash
          - -c
          - rm /docker-entrypoint-initdb.d/dump.sql
          onError: Fail
          timeout: 10s
          readyTimeout: 60s
      init:
        timeout: 120s
        initContainers:
        - name: restore
          image: postgres:12
          command: ["/bin/bash", "-c", "mv /backup/dump.sql /docker-entrypoint-initdb.d/"]
          volumeMounts:
          - name: backup
            mountPath: /backup
```

As with Backups, if an annotation is defined on a pod then no hooks from the Restore spec will be applied.

### Implementation

The types and functions in pkg/backup/item_hook_handler.go will be moved to a new package (pkg/hooks) and exported so they can be used for both backups and restores.

The post-restore hooks implementation will closely follow the design of restoring pod volumes with restic.
The pkg/restore.context type will have new fields `hooksWaitGroup` and `hooksErrs` comparable to `resticWaitGroup` and `resticErr`.
The pkg/restore.context.execute function will start a goroutine for each pod with applicable hooks and then continue with restoring other items.
Each hooks goroutine will create a pkg/util/hooks.ItemHookHandler for each pod and send any error on the context.hooksErrs channel.
The ItemHookHandler already includes stdout, stderr, and other metadata in the Backup log, so the same logs will automatically be added to the Restore log (passed as the first argument to the ItemHookHandler.HandleHooks method).

The pkg/restore.context.execute function will wait for the hooksWaitGroup before returning.
Any errors received on context.hooksErrs will be added to errs.Velero.

One difference compared to the restic restore design is that any error on the context.hooksErrs channel will cancel the context of all hooks, since errors are only reported on this channel if the hook specified `onError: Fail`.
However, canceling the hooks goroutines will not cancel the restic goroutines.
In practice the restic goroutines will complete before the hooks since the hooks do not run until a pod is ready, but it's possible a hook will be executed and fail while a different pod is still in the pod volume restore phase.

Failed hooks with `onError: Continue` will appear in the Restore log but will not affect the status of the parent Restore.
Failed hooks with `onError: Fail` will cause the parent Restore to have status Partially Failed.

If initContainers are specified for a pod, Velero will inject the containers into the beginning of the pod's initContainers list.
If a restic initContainer is also being injected, the restore initContainers will be injected directly after the restic initContainer.
The restore will use a RestoreItemAction to inject the initContainers.
**Review comment (Contributor):**

This is a RestoreItemAction that applies to pods?

**Review comment (Contributor):**

Yes, it only applies to pods.

Stdout and stderr of the restore initContainers will not be added to the Restore logs.
InitContainers that fail will not affect the parent Restore's status.
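
A rough sketch of the resulting ordering described above; the `restic-wait` and `migrations` container names are illustrative assumptions, and only `restore` comes from the example Restore spec earlier in this document:

```yaml
# Restored pod, after injection by the RestoreItemAction (illustrative)
spec:
  initContainers:
  - name: restic-wait   # restic initContainer, if pod volumes are being restored (name assumed)
    ...
  - name: restore       # injected restore hook initContainer, placed directly after it
    image: postgres:12
    ...
  - name: migrations    # the pod's original initContainers follow
    ...
```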

## Alternatives Considered
**Review comment (Contributor):**

A couple other related upstream efforts to consider:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20200120-generic-data-populators.md

I don't think either of those is in a state where we'd want to change our approach here for now, but they're worth looking at and keeping an eye on.

**Review comment (Member):**

The execution-hook proposal is now obsolete and is being replaced by Draft ContainerNotifier Proposal


Wait for all restored Pods to report Ready, then execute the first hook in all applicable Pods simultaneously, then proceed to the next hook, etc.
That could introduce deadlock, e.g. if an API pod cannot be ready until the DB pod is restored.
**Review comment (Contributor):**

We could have an option to enable simultaneous execution, since in some cases simultaneous execution may not be required; for example, a db repair or file system repair can be executed sequentially.


Put the restore hooks on the Backup spec as a third lifecycle event named `restore` along with `pre` and `post`.
That would be confusing since `pre` and `post` would appear in the Backup log but `restore` would only be in the Restore log.

**Review comment (@totherme, May 5, 2020):**

I like calling this hook a "post" hook, since that leaves us open to add "pre" hooks in future if we want them (along the lines of skriss' comments above) 👍

**Review comment (@neil-hickey, May 5, 2020):**

An example of a pre hook might be:

- I have a corrupted database / invalid state of a DB I am restoring into. My pre-hook will delete the corrupted data prior to restore, or reconcile a good state of the world prior to restore. (This can be done via one all-encompassing hook also, but the separation may be cleaner.)

**Review comment (Member):**

What will be the lifecycle phase of the pod for running a pre-restore-hook?

**Review comment:**

I don't think we have a need for pre-restore-hooks right now, but I can imagine it being useful to have one that is guaranteed to run:

- before the main containers in the restored pod start; and
- after the PVCs have been mounted.

This would allow us to do what neil-hickey suggests in the event of a corrupted DB: replace the bad data before starting a database process.

Having said all that, I don't think there's any need at all for pre-restore-hooks in an MVP. I just like the idea of leaving the option open for future work.

Execute restore hooks in parallel for each Pod.
That would not match the behavior of Backups.

Wait for PodStatus ready before executing the post-restore hooks in any container.
There are cases where the pod should not report itself ready until after the restore hook has run.

Include the logs from initContainers in the Restore log.
Unlike exec hooks where stdout and stderr are permanently lost if not added to the Restore log, the logs of the injected initContainers are available through the K8s API with kubectl or another client.

## Security Considerations

Stdout or stderr in the Restore log may contain sensitive information, but the same risk already exists for Backup hooks.