-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for Restoring Specific Backups #8824
Conversation
By default Vitess will only make practical use of the latest backup of a given shard. While this makes perfect sense for the common use cases there are times where you need to restore a specific backup. For example: 1. In order to extract a portion of the data that can then be merged with the current state. For example if you later realize that you accidentally deleted some records in a table that you shouldn't have last week, and you need to perform a restore so that you can copy those specific records back to the live data set. 2. To perform validation, forensics, analysis on the system state at that time. 3. A specific PITR for whatever reason ... This is a continuation of: vitessio#7998 This solves: vitessio#4905 Co-authored-by: Guido Iaquinti <[email protected]> Signed-off-by: Matt Lord <[email protected]>
597aa29
to
50938fd
Compare
Once tests are added LGTM. Thank you for moving this forward! |
1caad45
to
863df25
Compare
I've tested and verified the tablet flag as well as the vtctlclient flag: Backups
Updated help output
Invalid timestamp
Uses latest / 2021-09-17.034524
Uses the older one based on the timestamp
Now I have a good idea of what to test and how in the test suite. 🙂 |
40a66cf
to
0d9f2c8
Compare
Signed-off-by: Matt Lord <[email protected]>
0d9f2c8
to
bf78488
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, except protobuf definition is missing.
e61794f
to
4b80776
Compare
And add missing protobuf change Signed-off-by: Matt Lord <[email protected]>
9db70ac
to
25fa195
Compare
dff915a
to
f8ec301
Compare
mm == minutes (two digits) MM == month (two digits) Signed-off-by: Matt Lord <[email protected]>
ee31291
to
a7a951b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Let's get another set of eyes before merging.
Can you please create a website PR to document the new usage (vttablet flag & vtctl command)?
err = localCluster.VtctlclientProcess.ExecuteCommand("DeleteTablet", "-allow_primary=true", replica2.Alias) | ||
require.Nil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clever way to simulate the test scenario 👍
This documents the work done in: vitessio/vitess#8824 Signed-off-by: Matt Lord <[email protected]>
vitessio/website#834 (noted in the description now and linked in comments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to bring up one comment about the types we're using, but if you don't feel it's worth it to change then I'm happy to merge as-is.
// Check if we should use the latest (default) or a specified backup timestamp for the restore | ||
if restoreFromBackupTs != "" { | ||
startTime, err = time.Parse(mysqlctl.BackupTimestampFormat, restoreFromBackupTs) | ||
if err != nil { | ||
return vterrors.New(vtrpcpb.Code_INVALID_ARGUMENT, fmt.Sprintf("unable to parse the backup timestamp value provided of '%s'", restoreFromBackupTs)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed the rest of the PR and it doesn't seem like anyone brought this up. Another way to do this would be to push the parsing further up to the cli, and enforce stricter types closer to this function. In this way we could:
commandRestoreBackup
ingo/vt/vtctl/vtctl.go
parses itsbackupTimestampStr
into atime.Time
, usingmysqlctl.BackupTimestampFormat
- we pass
time.Time
to all the tmc functions - when making the gRPC request, we change that protobuf definition to be a
vttime.Time
, and callprotoutil.TimeToProto
to convert thetime.Time
to avttime.Time
, and convert back on the tabletmanager side.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I briefly considered this but wasn't sure of the benefits (avoiding the RPC call with an invalid vtctl CLI parameter is one). I don't mind making this change if you prefer it? Sounds like you do or you wouldn't have mentioned it 😄 ... so I'll work on that now. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, after trying it again I'm remembering WHY I didn't go this route last time. 🙂
What makes it feel awkward/clunky to use time.Time
coming from the vtctl
side is that there's also a corresponding vttablet
flag that's used directly in the tmc:
restoreFromBackupTs = flag.String("restore_from_backup_ts", "", "(init restore parameter) if set, restore the latest backup taken at or before this timestamp. Example: '2021-04-29.133050'") |
As things are now, it's uniform handling (always a string) on the tmc side -- whether the value came from a vtctl call and RPC or from the vttablet flag. Maybe I'm missing something though?
I understand the pull to use a more precise type... but it also feels like it makes the code more complex with little practical benefit. If you feel strongly about it though I can continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I just switched to this method. Thank you for the nudge! I feel better about it too and It wasn't really that awkward in the end. :-)
0920d09
to
67db03f
Compare
Signed-off-by: Matt Lord <[email protected]>
67db03f
to
f9c6938
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
@@ -636,9 +636,21 @@ func (tm *TabletManager) handleRestore(ctx context.Context) (bool, error) { | |||
// Open the state manager after restore is done. | |||
defer tm.tmState.Open() | |||
|
|||
// Zero date will cause us to use the latest, which is the default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Follow-up to: vitessio#8824 I noticed this oversight after it was merged. :-) Signed-off-by: Matt Lord <[email protected]>
Description
By default Vitess will only make practical use of the latest backup of a given shard. While this makes perfect sense for the common use cases there are times where you may want to restore a specific backup. For example:
This work supports that in two different ways:
-restore_from_backup -restore_from_backup_ts 2021-04-29.133050
vtctlclient -server=<vtctld-server>:<vtctld-port> RestoreFromBackup -backup_timestamp=2021-04-29.133050 <tablet-alias>
This compared with PITR
While Vitess supports a method to accomplish PITR, it's a fairly involved process with the details left up to the user (e.g. binlog servers). This work offers another method that can potentially be used in various ways/circumstances as an alternative. For example, let's say we realize that 1-2 days ago we made a mistake and want to look at the previous state of the data on one tablet in the shard... we could do that with:
This will stop MySQL replication on the tablet ASAP after the restore completes and prevent it from being automatically re-started (via tablet repair). This allows you to do whatever you like on that tablet before deciding to later:
vtctlclient StartReplication
)Co-authored-by: Guido Iaquinti [email protected]
Signed-off-by: Matt Lord [email protected]
Related Issue(s)
This is a continuation of: #7998
This solves: #4905
Checklist