
[table info][4/4] add db snapshot restore logic #11795

Closed
wants to merge 4 commits

Conversation

@jillxuu (Contributor) commented Jan 26, 2024

This PR

  1. add db snapshot restore logic

Overview

[1/4] separate indexer async v2 db from aptosdb : #11799
[2/4] add gcs operational utils to set up for backup and restore to gcs: #11793
[3/4] add epoch based backup logic: #11794
[4/4] add db snapshot restore logic: #11795

Goal

https://www.notion.so/aptoslabs/Internal-Indexer-Deprecation-Next-Steps-824d5af7f16a4ff3aeccc4a4edee2763?pvs=4

  1. Migrate the internal indexer off the AptosDB critical path and move it into its own standalone runtime service.
  2. Still provide the table info mapping to both API and indexer services.
  3. Improve table info parsing performance to unblock the indexer perf bottleneck.

Context

This effort is broken down into two parts:

  1. Part 1: [indexer-grpc-table-info] add table info parsing logic to indexer grpc fn #10783. Moves the table info service out of the critical path and converts it to multithreaded processing so requests are handled concurrently.
  2. Part 2 is this PR. Now that we have the table info service, only a handful of FNs will enable it, so when a new FN wants to join the network without syncing from genesis, it should be able to restore the db from a cloud service. To provide such a cloud service for others to download the db snapshot, this PR focuses on two things: backup and restore. Backup is optional in the config, while the restore logic is always built into the code.
(Screenshots: architecture before vs. after.)

Detailed Changes

(Screenshot: diagram of the detailed changes.)

Tradeoffs

Backup based on epoch or transaction version, and at what frequency?

Pros of backup by version:

  1. More control over backup frequency, since the frequency is directly tunable.

Cons of backup by version:

  1. Overhead of managing and comparing backed-up versions against the currently processed version.

Pros of backup by epoch:

  1. The when-to-backup logic is much cleaner and less error prone.

Cons of backup by epoch:

  1. Not as configurable, though still tunable by setting how many epochs behind to tolerate.

Decided to use epoch: on testnet we have a little over 10k epochs, and dividing total txns by that gives roughly 70k txns per epoch on average, which is about the backup frequency we want (see the sketch below).
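
A minimal sketch of what an epoch-based trigger could look like; the function and parameter names are illustrative, not the PR's actual API:

```rust
/// Illustrative epoch-based backup gate: back up once the chain has advanced
/// `backup_epoch_frequency` epochs past the last backed-up epoch.
fn should_backup(current_epoch: u64, last_backed_up_epoch: u64, backup_epoch_frequency: u64) -> bool {
    current_epoch >= last_backed_up_epoch.saturating_add(backup_epoch_frequency)
}

fn main() {
    assert!(should_backup(105, 100, 5)); // 5 epochs since last backup: back up
    assert!(!should_backup(104, 100, 5)); // not yet
}
```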

when to restore

Decided to restore when both conditions are met (a minimal sketch follows this list):

  1. the difference between the next version to be processed and the current ledger version is greater than a version_diff threshold, and
  2. the time difference between the last restored timestamp in the db and the current timestamp is greater than RESTORE_TIME_DIFF_SECS.

This prevents a fullnode from crashlooping and constantly retrying the restore without luck, and when the version difference is not that big we don't need to spam the GCS service; the node can state sync directly from that close-to-head version.
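
The sketch below shows the shape of that double gate; the threshold values and field names are placeholders, not the constants actually used in the PR:

```rust
// Placeholder thresholds; the real values live in the node config / code.
const VERSION_DIFF: u64 = 1_000_000;
const RESTORE_TIME_DIFF_SECS: u64 = 600;

/// Restore only when the node is far behind AND we haven't just restored.
fn should_restore(
    next_version_to_process: u64,
    ledger_version: u64,
    last_restored_timestamp_secs: u64,
    now_secs: u64,
) -> bool {
    let version_gap = ledger_version.saturating_sub(next_version_to_process);
    let time_gap = now_secs.saturating_sub(last_restored_timestamp_secs);
    version_gap > VERSION_DIFF && time_gap > RESTORE_TIME_DIFF_SECS
}

fn main() {
    // Far behind and last restore long ago -> restore.
    assert!(should_restore(0, 2_000_000, 0, 10_000));
    // Close to head -> just state sync instead of hitting GCS.
    assert!(!should_restore(1_999_000, 2_000_000, 0, 10_000));
}
```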

structure of the gcs bucket

I followed a structure similar to the indexer's filestore: a metadata.json file in the bucket tracks the chain id and the newest backed-up epoch, and a files folder holds all the epoch-based backups. Each db snapshot folder is first packed into a tar file and then gzipped to get the best size possible. Per Larry's point, alternative compression like bzip2 is less performant.
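
A minimal sketch of the tar-then-gzip step, assuming the `tar` and `flate2` crates; the actual helper in the PR may be named and wired differently:

```rust
use flate2::{write::GzEncoder, Compression};
use std::fs::File;

/// Pack the snapshot directory into a single .tar.gz file ready for upload.
fn compress_db_snapshot(db_dir: &str, output_path: &str) -> std::io::Result<()> {
    let tar_gz = File::create(output_path)?;
    let encoder = GzEncoder::new(tar_gz, Compression::best());
    let mut builder = tar::Builder::new(encoder);
    // Archive the whole snapshot folder under a stable top-level name.
    builder.append_dir_all("snapshot", db_dir)?;
    // Finish the tar archive, then flush the gzip stream.
    builder.into_inner()?.finish()?;
    Ok(())
}
```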

threads

Backup runs on its own separate thread, since based on past experience a GCS upload can be as slow as minutes.
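
A sketch of keeping the slow upload off the processing path; `upload_snapshot` here is a stand-in for the real GCS upload call:

```rust
use std::thread;
use std::time::Duration;

// Stand-in for the real (potentially minutes-long) GCS upload.
fn upload_snapshot(epoch: u64) {
    thread::sleep(Duration::from_secs(1));
    println!("uploaded backup for epoch {epoch}");
}

fn main() {
    // Running the upload on its own thread keeps table info processing moving
    // while the backup trickles up to GCS in the background.
    let backup_thread = thread::spawn(|| upload_snapshot(42));

    // ... continue processing transactions on the main path here ...

    backup_thread.join().unwrap();
}
```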

gcs pruning

Since each backup file is a full db backup, there are a couple of options we could pursue:

  1. create another service that constantly cleans up old backup files
  2. use GCS's own lifecycle policy to delete files based on age and other conditions
  3. programmatically delete old files while uploading
  4. constantly write to the same file

Decided to go with GCS's own lifecycle policy, configured properly in the GCS deployment. The reasoning: deploying and maintaining another service is overhead and costs more money, especially when that service's responsibility is so narrow; writing our own GCS object deletion code is not ideal, since mixing writes and deletes means handling many edge cases; and constantly writing to the same file definitely won't work, since GCS has a strict write limit of once per second on a single object.

Test Plan

  1. All written unit tests on file system operations pass.
  2. Locally tested and verified backup and restore, as well as table info reads.

Concerns

There's a bottleneck on the size of the db snapshot. On testnet this db is currently around 250 MB; given the nature of the db, compression gets it down to roughly 50-150 MB per snapshot. That is still big enough that uploading to GCS is slow.

TODO

E2E test

from rustie:

  1. Set up a long-running fullnode in a new k8s project. We can use the same data-staging-us-central1 cluster, but use a new namespace, like indexer-fullnode-testnet-test or something. This isolates it from everything else while still using the same cluster for simplicity.
  2. Set up a job in the same namespace that does the backup to GCS.
  3. Set up a continuous job in aptos-core CI that:
    3.1. Spins up a fullnode based on the latest build. This would be the latest main nightly, for instance.
    3.2. Quits immediately after verifying that the restore was successful.
    3.3. The cost should be manageable, assuming that the restore process is quick.

integration test

load test

A couple of things I want to verify with load testing:

  1. With 10 FNs bootstrapping, does restore work for all of them?
  2. If FNs keep crashlooping, is GCS spammed in terms of egress & ingress?
  3. As the file gets bigger, does backup still work, and how long does it take?


trunk-io bot commented Jan 26, 2024

⏱️ 1h 21m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| windows-build | 32m | 🟥🟥 |
| rust-unit-tests | 13m | 🟥🟥 |
| rust-lints | 10m | 🟥🟥 |
| check | 8m | 🟩🟩 |
| run-tests-main-branch | 8m | 🟥🟥 |
| general-lints | 5m | 🟩🟩 |
| check-dynamic-deps | 4m | 🟩🟩 |
| semgrep/ci | 37s | 🟩🟩 |
| file_change_determinator | 26s | 🟩🟩 |
| file_change_determinator | 20s | 🟩🟩 |
| permission-check | 8s | 🟩🟩 |
| permission-check | 6s | 🟩🟩 |
| permission-check | 6s | 🟩🟩 |
| permission-check | 4s | 🟩🟩 |


@jillxuu jillxuu requested a review from a team January 26, 2024 06:04
@jillxuu jillxuu marked this pull request as ready for review January 26, 2024 06:04
},
};

let backup_restore_operator: Arc<GcsBackupRestoreOperator> = Arc::new(

Reviewer comment (Contributor):

why does it need to be an Arc?

// after reading db metadata info and deciding to restore, drop the db so that we could re-open it later
close_db(db);

sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS));

Reviewer comment (Contributor):

what do you see if you don't sleep?

use async sleep
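
For reference, a minimal illustration of the blocking vs. async sleep distinction the reviewer is pointing at, assuming a tokio runtime (the constant value is a placeholder, not the PR's real setting):

```rust
use std::time::Duration;

// Placeholder value; the real constant is defined in the PR.
const DB_OPERATION_INTERVAL_MS: u64 = 500;

#[tokio::main]
async fn main() {
    // std::thread::sleep would block the executor thread for the whole
    // interval; tokio::time::sleep yields back to the runtime instead.
    tokio::time::sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS)).await;
    println!("continued after a non-blocking sleep");
}
```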

.await;

// a different path to restore backup db snapshot to, to avoid db corruption
let restore_db_path = node_config

Reviewer comment (Contributor):

probably you wanna clean up the dest folder (deleting it)?

rename_db_folders_and_cleanup(&db_path, &tmp_db_path, &restore_db_path)
.expect("Failed to operate atomic restore in file system.");

sleep(Duration::from_millis(DB_OPERATION_INTERVAL_MS));

Reviewer comment (Contributor):

use async sleep

.expect("Failed to restore snapshot");

// Restore to a different folder and replace the target folder atomically
let tmp_db_path = db_root_path.join("tmp");

Reviewer comment (Contributor):

probably want to clean up target folder
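
For context, a sketch of the restore-to-temp-then-swap pattern that rename_db_folders_and_cleanup is doing, with illustrative names; the PR's actual helper may order the cleanup differently (e.g. also wiping the target folder up front, as suggested):

```rust
use std::{fs, io, path::Path};

/// Move the freshly restored snapshot into the live DB location, keeping the
/// old DB around until the swap has succeeded.
fn swap_in_restored_db(db_path: &Path, tmp_db_path: &Path, restored_db_path: &Path) -> io::Result<()> {
    // Park the live DB in a temporary location first, so a crash mid-swap
    // still leaves a recoverable copy on disk.
    fs::rename(db_path, tmp_db_path)?;
    // Promote the restored snapshot into the live path.
    fs::rename(restored_db_path, db_path)?;
    // Only now delete the old DB contents.
    fs::remove_dir_all(tmp_db_path)?;
    Ok(())
}
```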

}
}

pub fn last_restored_timestamp(self) -> u64 {

Reviewer comment (Contributor):

naming: expect_timestamp() sounds more right

pub mod table_info;

use aptos_schemadb::ColumnFamilyName;

pub const DEFAULT_COLUMN_FAMILY_NAME: ColumnFamilyName = "default";
/// TODO(jill): to be deleted once INDEXER_METADATA_V2_CF_NAME is deployed

Reviewer comment (Contributor):

note: you can't remove a column family from code unless you redo all the DB instances because RocksDB insists all existing CFs be mentioned in the open call. (but you can truncate the CF and rename the variable DEPRECATED_x)
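
To illustrate the constraint, a sketch using the rocksdb crate directly (aptos-core goes through its own schemadb wrapper, and the CF names here are illustrative): every column family that exists on disk must still be listed when opening the DB, even if it is deprecated.

```rust
use rocksdb::{Options, DB};

fn open_indexer_db(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);
    // RocksDB requires every existing CF to be passed at open time, so the
    // deprecated CF stays in this list (and can be truncated / renamed in
    // code to DEPRECATED_x) rather than being dropped.
    let cfs = [
        "default",
        "indexer_metadata",    // deprecated, but still present on disk
        "indexer_metadata_v2",
        "table_info",
    ];
    DB::open_cf(&opts, path, cfs)
}
```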


This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Mar 30, 2024
@github-actions github-actions bot closed this Apr 14, 2024
@grao1991 grao1991 reopened this Aug 16, 2024
@github-actions github-actions bot removed the Stale label Aug 17, 2024

github-actions bot commented Oct 2, 2024

This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Oct 2, 2024
@github-actions github-actions bot closed this Oct 17, 2024