[table info][4/4] add db snapshot restore logic #11795
Conversation
},
};

let backup_restore_operator: Arc<GcsBackupRestoreOperator> = Arc::new(
why does it need to be an Arc?
// after reading db metadata info and deciding to restore, drop the db so that we could re-open it later
close_db(db);

sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS));
what do you see if you don't sleep?
use async sleep
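A minimal sketch of the change the reviewer is asking for, assuming the surrounding function is async and tokio is the runtime (both are assumptions here); the constant's value is made up for illustration:

```rust
use std::time::Duration;

// Hypothetical value mirroring the constant used in the PR.
const DB_OPERATION_INTERVAL_MS: u64 = 500;

async fn wait_between_db_operations() {
    // Non-blocking: yields the task back to the tokio runtime instead of
    // stalling the worker thread the way std::thread::sleep does.
    tokio::time::sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS)).await;
}
```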
.await;

// a different path to restore backup db snapshot to, to avoid db corruption
let restore_db_path = node_config
probably you wanna clean up the dest folder (deleting it)?
rename_db_folders_and_cleanup(&db_path, &tmp_db_path, &restore_db_path)
    .expect("Failed to operate atomic restore in file system.");

sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS));
use async sleep
    .expect("Failed to restore snapshot");

// Restore to a different folder and replace the target folder atomically
let tmp_db_path = db_root_path.join("tmp");
probably want to clean up target folder
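A hedged sketch of what the reviewer is suggesting: remove any leftover destination folders from a previous failed attempt before restoring into them, then swap the restored copy into place with renames. The function and its exact steps are illustrative, not the PR's actual rename_db_folders_and_cleanup implementation; note that fs::rename is only atomic when both paths live on the same filesystem.

```rust
use std::{fs, io, path::Path};

// Hypothetical helper: clean stale folders, restore, then swap directories.
fn prepare_and_swap(db_path: &Path, tmp_db_path: &Path, restore_db_path: &Path) -> io::Result<()> {
    // Clean up stale destination folders so the restore starts from scratch.
    for dir in [tmp_db_path, restore_db_path] {
        if dir.exists() {
            fs::remove_dir_all(dir)?;
        }
    }
    // ... restore the snapshot into restore_db_path here ...

    // Move the live db out of the way, then move the restored copy in.
    fs::rename(db_path, tmp_db_path)?;
    fs::rename(restore_db_path, db_path)?;
    // Finally drop the old copy.
    fs::remove_dir_all(tmp_db_path)?;
    Ok(())
}
```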
}
}

pub fn last_restored_timestamp(self) -> u64 {
naming: expect_timestamp() sounds more right
pub mod table_info;

use aptos_schemadb::ColumnFamilyName;

pub const DEFAULT_COLUMN_FAMILY_NAME: ColumnFamilyName = "default";
/// TODO(jill): to be deleted once INDEXER_METADATA_V2_CF_NAME is deployed
note: you can't remove a column family from code unless you redo all the DB instances because RocksDB insists all existing CFs be mentioned in the open call. (but you can truncate the CF and rename the variable DEPRECATED_x)
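A small illustration of the constraint the reviewer describes, written against the plain rocksdb crate rather than aptos_schemadb; the CF names here are assumptions, only the "every existing CF must be listed at open time" rule is the point:

```rust
use rocksdb::{Options, DB};

// No longer written to, but it still has to be passed to open_cf because
// RocksDB refuses to open a DB that has column families not mentioned in the call.
const DEPRECATED_INDEXER_METADATA_CF_NAME: &str = "indexer_metadata";
const INDEXER_METADATA_V2_CF_NAME: &str = "indexer_metadata_v2";

fn open_indexer_db(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);
    // Every CF that exists on disk must appear here, even deprecated ones.
    DB::open_cf(
        &opts,
        path,
        ["default", DEPRECATED_INDEXER_METADATA_CF_NAME, INDEXER_METADATA_V2_CF_NAME],
    )
}
```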
This PR
Overview
[1/4] separate indexer async v2 db from aptosdb: #11799
[2/4] add gcs operational utils to set up for backup and restore to gcs: #11793
[3/4] add epoch based backup logic: #11794
[4/4] add db snapshot restore logic: #11795
Goal
https://www.notion.so/aptoslabs/Internal-Indexer-Deprecation-Next-Steps-824d5af7f16a4ff3aeccc4a4edee2763?pvs=4
Context
This effort is broken down into two parts:
Detailed Changes
Tradeoffs
backup based on epoch or transaction version? what frequency?
Pros of backup using version:
Cons of backup using version:
Pros of backup using epoch:
Cons of backup using epoch:
Decided to use epoch: on testnet we have a little over 10k epochs, and dividing the total transaction count by that gives roughly 70k txns per epoch on average, which is about the frequency at which we'd like to back up.
when to restore
Decided to restore when both conditions are met:
- the version_diff is large enough
- at least RESTORE_TIME_DIFF_SECS has passed since the last restore
This is to prevent the fullnode from crashlooping and constantly retrying the restore without luck; and when the version difference is not that big, we don't need to spam the GCS service and can instead state sync directly from that close-to-head version.
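A hedged sketch of that gating logic; only the two-condition shape comes from the PR, while the threshold values and field names are assumptions:

```rust
const RESTORE_TIME_DIFF_SECS: u64 = 600; // hypothetical value
const VERSION_DIFF_THRESHOLD: u64 = 1_000_000; // hypothetical value

fn should_restore_from_snapshot(
    local_version: u64,
    ledger_version: u64,
    last_restored_timestamp_secs: u64,
    now_secs: u64,
) -> bool {
    let version_diff = ledger_version.saturating_sub(local_version);
    // Only restore when we are far behind AND enough time has passed since the
    // last restore attempt, so a crashlooping node doesn't hammer GCS.
    version_diff > VERSION_DIFF_THRESHOLD
        && now_secs.saturating_sub(last_restored_timestamp_secs) > RESTORE_TIME_DIFF_SECS
}
```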
structure of the gcs bucket
I followed a similar structure to the indexer's filestore, where we keep a metadata.json file in the bucket to track the chain id and the newest backed-up epoch, and then a files folder to keep all the epoch-based backups. Each db snapshot is first compressed into a tar file from its folder, and then gzipped to get the best size possible. Based on larry's point, alternative compression like bzip2 is less performant.
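For illustration, a sketch of that tar-then-gzip packaging step using the tar and flate2 crates; whether the PR uses these exact crates is an assumption:

```rust
use std::{fs::File, io, path::Path};

use flate2::{write::GzEncoder, Compression};
use tar::Builder;

// Pack the db snapshot folder into a tar archive and gzip it in one pass.
fn compress_snapshot(snapshot_dir: &Path, output_tar_gz: &Path) -> io::Result<()> {
    let out_file = File::create(output_tar_gz)?;
    let gz = GzEncoder::new(out_file, Compression::best());
    let mut tar = Builder::new(gz);
    // Put the snapshot folder's contents at the archive root.
    tar.append_dir_all(".", snapshot_dir)?;
    // Flush the tar footer, then the gzip trailer.
    tar.into_inner()?.finish()?;
    Ok(())
}
```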
threads
Backup runs on its own separate thread, because based on past experience a GCS upload can be as slow as minutes.
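A minimal sketch of that setup, assuming a channel-fed background worker; the names and API shape here are hypothetical, not the PR's actual code:

```rust
use std::{path::PathBuf, sync::mpsc, thread};

// The node hands finished snapshot paths to a channel, and a dedicated thread
// uploads them, so a minutes-long GCS upload never blocks the indexing path.
fn spawn_backup_thread(upload: impl Fn(PathBuf) + Send + 'static) -> mpsc::Sender<PathBuf> {
    let (tx, rx) = mpsc::channel::<PathBuf>();
    thread::spawn(move || {
        for snapshot_path in rx {
            upload(snapshot_path);
        }
    });
    tx
}
```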
gcs pruning
There are a couple of options we could pursue, since each backup file is a full db backup.
Decided to go with GCS's own policy, with the proper configuration set up in the GCS deployment. The reasoning: deploying and maintaining another service is overhead and costs more money, especially since this service's responsibility would be very narrow; writing our own code for GCS object deletion is not ideal, since we'd be both writing and deleting and would need to handle a multitude of edge cases; and constantly writing to the same file definitely won't work, since GCS has a strict write limit on a single object of once per second.
Test Plan
Concerns
There's a bottleneck on the size of the db snapshot. Currently this db on testnet is around 250 MB; given the nature of the db, compression could get it down to 50-150 MB per snapshot. That is still too big to upload to GCS, as it's too slow.
TODO
E2E test
from rustie:
5.1. Spins up a fullnode based on the latest build. This would be the latest main nightly for instance
3.2. We can quit immediately after verifying that the restore was successful
3.3. The cost should be manageable, assuming that the restore process is quick.
integration test
load test
A couple of things I want to verify with load testing: