restore: point in time recovery #16161

Closed
flibustenet opened this issue May 26, 2017 · 8 comments

@flibustenet

Point-in-time recovery is a blocker for me. Will it be possible with CockroachDB's architecture?
I couldn't find any documentation about it; I think it would be nice to include it in the comparison table.

@dt
Member

dt commented May 26, 2017

@flibustenet While @danhhz can probably give you a more detailed answer (probably after the weekend), the short answer is that our MVCC storage model keeps timestamped revisions of every write (for a default of 24h), so a near-instantaneous rollback to a previous state within that window is theoretically easy to add, and is on our near-term roadmap.

Beyond that 24h GC window, those historical revisions have been pruned from the running DB, and thus a restore from a backup is required -- a policy of frequent, small incremental backups would ensure the most flexibility there.
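
To make the idea of timestamped revisions and a GC window concrete, here is a minimal toy sketch in Python. It is not CockroachDB's storage code: the `MVCCStore` class, its method names, and the use of plain integer seconds as timestamps are all assumptions made for illustration. Only the 24h default comes from the comment above.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

GC_WINDOW_SECONDS = 24 * 60 * 60  # the 24h default mentioned above


@dataclass
class MVCCStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    # key -> list of (timestamp, value) revisions, kept sorted by timestamp
    revisions: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        """Record a new revision instead of overwriting the old value."""
        versions = self.revisions.setdefault(key, [])
        versions.append((ts, value))
        versions.sort(key=lambda rev: rev[0])

    def get_as_of(self, key, ts):
        """Return the newest value written at or before ts, or None."""
        versions = self.revisions.get(key, [])
        idx = bisect_right([t for t, _ in versions], ts)
        return versions[idx - 1][1] if idx else None

    def gc(self, now):
        """Prune revisions that fell out of the GC window.

        The newest revision at or before the threshold is kept, so reads
        at any time inside the window still see a value.
        """
        threshold = now - GC_WINDOW_SECONDS
        for key, versions in self.revisions.items():
            idx = bisect_right([t for t, _ in versions], threshold)
            self.revisions[key] = versions[max(idx - 1, 0):]
```

In this model a read "as of" an earlier time works only while the older revisions are still present. Once `gc` has pruned them, the earlier state can only come from a backup, which is the distinction drawn above.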

@danhhz
Contributor

danhhz commented Jun 1, 2017

@dt is exactly right here. The only thing I have to add is that this is currently targeted for the 1.1 release.

@petermattis petermattis added this to the 1.1 milestone Jun 1, 2017
@petermattis petermattis changed the title from "PITR Point in time recovery" to "restore: point in time recovery" Jun 1, 2017
@petermattis
Collaborator

Point in time recovery is getting pushed to 1.2 due to engineering constraints.

@petermattis petermattis modified the milestones: 1.2, 1.1 Jun 30, 2017
@danhhz danhhz removed their assignment Aug 2, 2017
@dianasaur323
Contributor

@flibustenet Thanks for the feature request. We are actively working on spec-ing out this feature. You can follow our progress through RFCs. It would be great if you could share with us your specific requirements - what is your expected recovery time for a given set of data? How large do you expect that data set to be? How often do you use point-in-time recovery?

@dianasaur323
Contributor

dianasaur323 commented Oct 15, 2017

Acceptance Testing

  • Does point-in-time restore work in a 24 hour time period?
    • Run with ~1GB of data
    • Add table, perform a schema change, insert some rows -> try to go back in time again and see if that works
    • Run incremental backup on an existing full backup and observe if that works
    • These will all be run on the PM cluster
  • Does point-in-time recovery work in a >24 hour time period?
    • Take a full mvcc backup
    • Add table, drop table, perform schema change, insert some rows -> try an incremental backup
    • Try a restore of full mvcc backup
    • Try a restore of incremental + full mvcc backup

@maddyblue maddyblue removed their assignment Oct 16, 2017
@dt
Member

dt commented Mar 1, 2018

@dianasaur323 can you close this if the above checklist is good?

@dt dt removed their assignment Mar 1, 2018
@dianasaur323
Contributor

Yes, can do! I'm just going to run the tests one last time after the fixes went in, and then close out. Probably will do so early next week.

@dianasaur323
Contributor

dianasaur323 commented Mar 12, 2018

Results of my actual testing:

  • I was testing on a ~1GB TPC-C dataset.
  • I ran a full backup that completed at 1:00 ET
  • I dropped the history table at 1:02 ET
  • I ran create table history_2 (test_id INT, test_value INT); at 1:05 ET
  • I ran an incremental backup at 1:09 ET
  • I ran a full MVCC backup (i.e. with revision_history) at 1:12 ET
  • I inserted three rows into history_2 at 1:12 ET, after that backup
  • I inserted another three rows into history_2 at 1:16 ET
  • I ran another incremental backup at 1:20 ET
  • I ran an incremental MVCC backup at 1:23 ET

Results

  • Restored from FULL + INC -> expected an empty history_2 table -> check!
  • Restored from FULL + INC + INC -> expected six rows in the history_2 table -> check!
  • Tried to restore FULL + INC + INC (none taken with revision_history) using AS OF SYSTEM TIME -> shouldn't work -> hm... restoring with AS OF SYSTEM TIME at a time after all the backups worked and showed 6 rows. This is odd because I was otherwise able to get the error message I wanted (pq: incompatible RESTORE timestamp (BACKUP needs option 'revision_history')). On further investigation, it looks like the check for revision_history doesn't work if you have more than one incremental backup, since putting in a time of 1:20 triggered the behavior I don't want.
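
The check I expected can be sketched roughly as follows. This is a simplified model in Python, not the actual restore code: `BackupLayer`, `validate_restore_time`, the minute-based timestamps (minutes after 1:00 ET), and the use of negative infinity as the start of a full backup are all assumptions for illustration. The point is that every layer in the chain has to be examined, so a second incremental without revision history still produces the error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupLayer:
    """One backup in a chain (hypothetical model)."""

    start: float            # exclusive lower bound; -inf for a full backup (simplification)
    end: float              # inclusive upper bound: the time the backup captured
    revision_history: bool  # True if all revisions in (start, end] were kept


def validate_restore_time(chain, restore_time):
    """Return the layer covering restore_time, or raise ValueError.

    Assumed rule: a restore AS OF SYSTEM TIME is only valid if the layer
    covering that time was taken with revision history.
    """
    for layer in chain:  # look at every layer, not just the first incremental
        if layer.start < restore_time <= layer.end:
            if not layer.revision_history:
                raise ValueError(
                    "incompatible RESTORE timestamp "
                    "(BACKUP needs option 'revision_history')"
                )
            return layer
    raise ValueError("restore time is not covered by this backup chain")


# The FULL + INC + INC chain from the test above, in minutes after 1:00 ET.
PLAIN_CHAIN = [
    BackupLayer(float("-inf"), 0, False),  # full backup, 1:00
    BackupLayer(0, 9, False),              # incremental, 1:09
    BackupLayer(9, 20, False),             # incremental, 1:20
]
```

Under this model, `validate_restore_time(PLAIN_CHAIN, 20)` raises the revision_history error, which is the behavior I was looking for at 1:20.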

Onto point-in-time restore

  • Restoring after 1:12 on the first mvcc did show the history_2 table, which was empty -> check!
  • Restoring before 1:02 did show the history table -> check!
  • Restoring after 1:12 did show the three rows on the incremental mvcc -> check!
  • Restoring after 1:16 did show the six rows on the incremental mvcc -> check!
  • Restored an incremental mvcc on top of a full backup without mvcc, and that didn't work. yay! -> check!
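
For reference, the expected outcomes above can be derived by replaying the timeline up to the restore time. The following is a small Python sketch under stated assumptions: `state_as_of` and the event tuples are hypothetical, times are minutes after 1:00 ET, the original row count of the history table is unknown and recorded as `None`, and the first insert is placed at 12.5 only to model that it happened after the 1:12 full MVCC backup.

```python
def state_as_of(events, t):
    """Replay events with timestamp <= t and return {table: row_count}.

    A row count of None means the table exists but its size is unknown.
    """
    tables = {}
    for ts, action, table, rows in sorted(events, key=lambda e: e[0]):
        if ts > t:
            break
        if action == "create":
            tables[table] = rows          # initial row count
        elif action == "drop":
            tables.pop(table, None)
        elif action == "insert":
            tables[table] += rows
    return tables


# Timeline from the test above, in minutes after 1:00 ET.
EVENTS = [
    (-1, "create", "history", None),   # pre-existing TPC-C table, size not modelled
    (2, "drop", "history", 0),
    (5, "create", "history_2", 0),
    (12.5, "insert", "history_2", 3),  # 12.5 is an assumption: just after the 1:12 backup
    (16, "insert", "history_2", 3),
]
```

Calling `state_as_of(EVENTS, t)` for a restore time `t` gives the table set and row counts a point-in-time restore should show, which is what I was otherwise tracking by hand.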

Note: it's becoming clear to me that we need some way to see the revision history of a row, and perhaps also be able to see the revision history of table descriptors. Otherwise, I'm manually writing these times down, and what happens if I didn't? Also, how do I figure out what time periods the backup covers? Is there a way to query this?

Overall, everything looks good. Opening that one final issue about the error message here: #23715


6 participants