Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental logical backup and point in time recovery #11097

Merged
merged 111 commits into from
Nov 29, 2022

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Aug 25, 2022

Description

This PR introduces incremental backups, towards point in time recovery (which may or may not happen in this PR).

Incremental backups are only available for:

  • builtin backup method
  • Running MySQL GTID

As reminder, builtin backup is a full backup taken by shutting down the server and copying over files.
The incremental backup is done by creating a backup of binary log files. The MySQL server is not stopped nor interrupted by this operation.

An incremental backup is taken like so:

vtctlclient -- Backup --incremental_from_pos "MySQL56/16b1039f-22b6-11ed-b765-0a43f95f28a3:1-615" zone1-0000000102 

In the above we ask Vitess to create an incremental backup that covers MySQL56/16b1039f-22b6-11ed-b765-0a43f95f28a3:1-615. Since the backup method is to copy complete binlog files, it's possible that the backup actually starts at an earlier position. Vitess does not impose that the position is at a binary log rotation.

The backup fails if:

  • The first available binlog file does not cover required position (ie binlog files have been rotated and purged, and the incremental GTID do not exist anymore)
  • No binary log file contains the requested position
  • There is nothing to backup (there are no new GTID entries in the binary logs on top of requested position, ie the requested position is either at the end of existing binary logs or even more futuristic).

The backup's manifest has been updated with new fields. Here's a new manifest:

{
  "BackupMethod": "builtin",
  "Position": "MySQL56/16b1039f-22b6-11ed-b765-0a43f95f28a3:1-883",
  "FromPosition": "MySQL56/16b1039f-22b6-11ed-b765-0a43f95f28a3:1-867",
  "Incremental": true,
  "BackupTime": "2022-08-25T12:55:05Z",
  "FinishedTime": "2022-08-25T12:55:05Z",
  "ServerUUID": "1ea0631b-22b6-11ed-933f-0a43f95f28a3",
  "TabletAlias": "zone1-0000000102",
  "CompressionEngine": "pargzip",
  "FileEntries": [
     ..
  ]
}
  • The above is an incremental backup's manifest. Clearly indicated by "Incremental": true,
  • "FileEntries" will only list binary logs
  • "FromPosition" indicates the first position covered by the backup. It is smaller than or equal to the requested --incremental_from_pos. This value is empty for full backup.
  • ServerUUID is new and self explanatory, added for convenience
  • TabletAlias is new and self explanatory, added for convenience

Added some unit tests. WIP to add endtoend tests.

Related Issue(s)

#11227

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…mPosition in the manifest. Add 'Incremental' (bool) to the manifest

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Backup and Restore labels Aug 25, 2022
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Aug 25, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

Comment on lines 340 to 343
params.Logger.Infof("Restore: %v", restorePath.String())
if params.DryRun {
return nil, vterrors.Errorf(vtrpc.Code_CANCELED, "Restore: dry run, aborting operation")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the dry run case, do we need to return an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a way to break through all layers up, without anyone attempting to actually use the backup result. Do you think we should return a nil for error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is dry run is used by the user to see what steps will be taken and does all the pre-condition checks pass or not, before actually running it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct. Well, let me look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change was made. No longer returning an error, and now exiting gracefully. The log looks something like this:

restore.go:202] Restore: original tablet type=RDONLY
backup.go:289] Restore: looking for a suitable backup to restore
backup.go:340] Restore: RestorePath: [full:2022-11-28.063331.zone1-0000000102, incremental:2022-11-28.063340.zone1-0000000102, incremental:2022-11-28.063343.zone1-0000000102, incremental:2022-11-28.063348.zone1-0000000102]
restore.go:230] Restore: got a restore manifest: <nil>, err=<nil>, waitForBackupInterval=0s
restore.go:289] Dry run. No changes made
restore.go:311] Restore: changing tablet type to RDONLY

@shlomi-noach
Copy link
Contributor Author

I still need to write release notes, as well as website docs.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

Release notes added.

@shlomi-noach shlomi-noach removed the release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) label Nov 29, 2022
@shlomi-noach shlomi-noach merged commit bd0a5b8 into vitessio:main Nov 29, 2022
@shlomi-noach shlomi-noach deleted the backup-pitr branch November 29, 2022 09:06
@shlomi-noach
Copy link
Contributor Author

bwahahaha! It is merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Backup and Restore NeedsWebsiteDocsUpdate What it says Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants