[sui-tool] Introduce formal snapshot restore #13794
Merged
williampsmith force-pushed the `formal-snapshot-dl` branch:

- September 15, 2023 05:18: d91f759 to 78595ee
- September 18, 2023 18:00: 78595ee to 607a9b9
- September 18, 2023 18:21: 607a9b9 to 525537e
- September 19, 2023 13:56: 22fb8aa to 221a33c
- September 19, 2023 13:59: 221a33c to d7ee4e0
- September 21, 2023 00:11: d7ee4e0 to fb63b91
- September 21, 2023 23:56: 21468d1 to 6aaf139
- September 22, 2023 00:24: 6aaf139 to 59768ba
- September 22, 2023 01:09: 59768ba to 6831f76
- September 22, 2023 01:17: 6831f76 to 84cc493
- September 22, 2023 01:36: 84cc493 to ec9dc76
- October 13, 2023 19:37 (2 times, most recently): 8e59d41 to 636a069
- October 13, 2023 19:38: 636a069 to d0c911b
- October 13, 2023 19:42: d0c911b to 19c3a71
- October 13, 2023 19:51: 19c3a71 to 3d8a753
- October 13, 2023 20:00: 3d8a753 to 08eba2d
sadhansood approved these changes on Oct 26, 2023.
williampsmith force-pushed the `formal-snapshot-dl` branch on October 26, 2023 17:55: d59916f to 7b3e6b8
## Description

- Optimize checkpoint summary sync + verification
  - Rather than blocking on verification during summary sync, which is slow because it requires syncing in order, sync all checkpoint summaries first and then verify them locally.
  - This optimization reduces checkpoint summary sync and verification from 3.5 hours to ~20 minutes, as measured against the testnet `epoch_125` snapshot.
- Optimize state accumulation
  - Compute partial accumulators in parallel with divide and conquer (one per file partition), then union them.
  - This speeds up accumulation from 3.2 hours to 20 minutes (on the same benchmark as above).
- Introduce early termination on snapshot verification failure
- Introduce a `verbose` flag which, when not set, sets the log level to `off` for cleaner status output
- Factor out snapshot accumulation and object download/bulk-load for readability

## Test Plan

Ran formal snapshot restore from `sui-tool` and verified the improvements.

---

If your changes are not user-facing and not a breaking change, you can skip the following section. Otherwise, please indicate what changed, and then add to the Release Notes section as highlighted during the release process.

### Type of Change (Check all that apply)

- [ ] protocol change
- [ ] user-visible impact
- [ ] breaking change for a client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes
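The first optimization above replaces in-order, verify-as-you-go syncing with two phases: download every checkpoint summary in any order, then verify the stored summaries locally. A minimal sketch of that idea in Rust, under loud assumptions: `Summary`, `digest`, `sync_unordered`, `verify_chain`, and `make_chain` are hypothetical names, not `sui-tool` APIs; the digest is std's non-cryptographic `DefaultHasher` standing in for a real checkpoint digest; and verification only checks that each summary links to its predecessor's digest, starting from a trusted first digest. A real verifier also checks committee signatures, which is what would catch tampering with the final summary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified checkpoint summary.
#[derive(Clone, Debug, Hash)]
pub struct Summary {
    pub seq: u64,
    pub previous_digest: Option<u64>,
    pub content: String,
}

/// Stand-in digest. DefaultHasher is NOT cryptographic; it only
/// illustrates the chaining logic.
pub fn digest(s: &Summary) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Phase 1 (sync): summaries may arrive in any order (e.g. from parallel
/// downloads). Store them keyed by sequence number without verifying.
pub fn sync_unordered(downloaded: Vec<Summary>) -> BTreeMap<u64, Summary> {
    downloaded.into_iter().map(|s| (s.seq, s)).collect()
}

/// Phase 2 (verify): walk the stored summaries in sequence order and check
/// that each links to its predecessor. Returns the number of verified
/// summaries, or the first sequence number that is missing or fails.
pub fn verify_chain(store: &BTreeMap<u64, Summary>, trusted_first: u64) -> Result<u64, u64> {
    let mut prev: Option<u64> = None; // digest of the previous summary
    let mut expected_seq = 0u64;
    for (seq, s) in store {
        if *seq != expected_seq {
            return Err(expected_seq); // gap in the synced range
        }
        let linked = match prev {
            None => digest(s) == trusted_first,
            Some(p) => s.previous_digest == Some(p),
        };
        if !linked {
            return Err(*seq);
        }
        prev = Some(digest(s));
        expected_seq += 1;
    }
    Ok(expected_seq)
}

/// Illustration helper: build a well-formed chain of `n` summaries.
pub fn make_chain(n: u64) -> Vec<Summary> {
    let mut out = Vec::new();
    let mut prev = None;
    for seq in 0..n {
        let s = Summary { seq, previous_digest: prev, content: format!("cp-{}", seq) };
        prev = Some(digest(&s));
        out.push(s);
    }
    out
}
```

Because phase 1 performs no verification, downloads can run fully in parallel; phase 2 is a cheap sequential loop over data that is already stored locally.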
williampsmith force-pushed the `formal-snapshot-dl` branch on October 26, 2023 18:18: 7b3e6b8 to 69ed5a8
jonas-lj pushed a commit to jonas-lj/sui that referenced this pull request on Nov 2, 2023:
## Description

Extend the `sui-tool` snapshot downloader to download, verify, and restore from a formal snapshot. Note that if `--verify` is set to true, only protocol versions where `commit_root_state_digest` is true are eligible, as verification relies on the root state hash committed at the end of the epoch.

The following tasks are orchestrated:

* Performing checkpoint summary sync (with verification) to the end of the target epoch via the archival store
* Downloading all snapshot object refs
* Checksumming all object refs to verify there is no discrepancy between the object store manifest and the contents
* Accumulating all object refs and comparing the result against the consensus checkpoint commitment (root state hash). This protects against restoring from a compromised snapshot and ensures that the state after restore is consistent with the network
* Downloading the end-of-epoch live object set contents from the snapshot and loading them into the perpetual store
* Setting other critical state necessary for the node to start up and join the network (create committee store, create epoch start configuration, set checkpoint watermarks, etc.)

## Test Plan

1. Run the following to perform snapshot restore:

```
GCS_SNAPSHOT_SERVICE_ACCOUNT_FILE_PATH=<path> \
AWS_ARCHIVE_ACCESS_KEY_ID=<key> \
AWS_ARCHIVE_SECRET_ACCESS_KEY=<key> \
AWS_ARCHIVE_REGION=us-west-2 \
sui-tool download-db-snapshot --epoch 125 \
  --genesis /opt/sui/config/genesis.blob \
  --formal \
  --network testnet \
  --path /opt/sui/db/authorities_db/full_node_db \
  --num-parallel-downloads 50
```

2. Start up `sui-node` and observe that the node is able to execute checkpoints successfully and ultimately reconfigure to the next epoch.
```
[00:07:01] ████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 977 out of 977 .ref files done
[03:34:42] ████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 10670050/10670050(Checkpoint summary download is complete)
[00:02:36] ██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 977 out of 977 ref files checksummed (Checksumming complete)
[02:20:05] ██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 977 out of 977 ref files accumulated (Accumulation complete)
[02:20:05] ████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 977 out of 977 .obj files done (Objects download complete)
2023-10-11T02:35:15.060244Z INFO sui_archival::reader: Terminating the manifest sync loop
2023-10-11T02:35:15.060334Z INFO sui_tool: Formal snapshot state verification complete!
2023-10-11T02:35:15.468872Z INFO sui_storage::mutex_table: Stopping mutex table cleanup!
2023-10-11T02:35:15.510039Z INFO sui_storage::mutex_table: Stopping mutex table cleanup!
2023-10-11T02:35:19.644477Z INFO sui_tool: Successfully restored state from snapshot at end of epoch 125
ubuntu@fullnode-compat-test-03:/opt/sui/db/authorities_db/full_node_db$ systemctl status sui-node
● sui-node.service - Sui Node
     Loaded: loaded (/etc/systemd/system/sui-node.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-10-11 02:39:26 UTC; 6s ago
   Main PID: 344206 (sui-node)
      Tasks: 110 (limit: 308692)
     Memory: 1.1G (high: 246.0G max: 251.0G swap max: 0B available: 244.8G)
        CPU: 7.651s
     CGroup: /system.slice/sui-node.service
             └─344206 /opt/sui/bin/sui-node --config-path /opt/sui/config/sui-node.yaml
ubuntu@fullnode-compat-test-03:/opt/sui/db/authorities_db/full_node_db$ curl -s http://localhost:9184/metrics | grep 'current_epoch '
current_epoch 126
ubuntu@fullnode-compat-test-03:/opt/sui/db/authorities_db/full_node_db$ curl -s http://localhost:9184/metrics | grep 'last_executed_checkpoint '
last_executed_checkpoint 10678488
ubuntu@fullnode-compat-test-03:/opt/sui/db/authorities_db/full_node_db$ curl -s http://localhost:9184/metrics | grep 'last_executed_checkpoint '
last_executed_checkpoint 10679365

# after some time
ubuntu@fullnode-compat-test-03:/opt/sui/db/authorities_db/full_node_db$ curl -s http://localhost:9184/metrics | grep 'current_epoch '
current_epoch 129
```

### Type of Change (Check all that apply)

- [ ] protocol change
- [ ] user-visible impact
- [ ] breaking change for a client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes
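The accumulation step, and the per-partition parallel accumulation described in the optimization commit earlier on this page, rely on the accumulator being order-independent: partial results computed in parallel can be unioned into the same root as a sequential pass. A minimal sketch under loud assumptions: `ToyAccumulator` sums 64-bit `DefaultHasher` hashes with wrapping addition, which is commutative but NOT secure (our understanding is that Sui's real accumulator is an elliptic-curve multiset hash); `ObjRef` as a tuple of `u64`s, `accumulate_parallel`, and `verify_root` are hypothetical names, not `sui-tool` APIs. Each inner `Vec` stands in for one `.ref` file partition, and `verify_root` returns an error as soon as the computed root differs from the committed one, so a caller can stop before downloading object contents.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical object reference: (object id, version, digest).
pub type ObjRef = (u64, u64, u64);

/// Toy multiset hash: sum (mod 2^64) of per-element hashes. Addition is
/// commutative and associative, so partial accumulators can be built in any
/// order and unioned. NOT cryptographically secure.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ToyAccumulator(u64);

impl ToyAccumulator {
    pub fn insert(&mut self, r: &ObjRef) {
        let mut h = DefaultHasher::new();
        r.hash(&mut h);
        self.0 = self.0.wrapping_add(h.finish());
    }

    pub fn union(&mut self, other: &ToyAccumulator) {
        self.0 = self.0.wrapping_add(other.0);
    }
}

/// Accumulate each partition on its own scoped thread, then union the
/// partial accumulators into a single root.
pub fn accumulate_parallel(partitions: &[Vec<ObjRef>]) -> ToyAccumulator {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = partitions
            .iter()
            .map(|part| {
                s.spawn(move || {
                    let mut acc = ToyAccumulator::default();
                    for r in part {
                        acc.insert(r);
                    }
                    acc
                })
            })
            .collect();
        // Union the partial results; order does not matter.
        let mut root = ToyAccumulator::default();
        for h in handles {
            root.union(&h.join().expect("accumulator thread panicked"));
        }
        root
    })
}

/// Compare the computed root against the committed root state hash and
/// fail early on mismatch.
pub fn verify_root(partitions: &[Vec<ObjRef>], committed: ToyAccumulator) -> Result<(), String> {
    let computed = accumulate_parallel(partitions);
    if computed != committed {
        return Err(format!(
            "root state hash mismatch: computed {:?}, committed {:?}",
            computed, committed
        ));
    }
    Ok(())
}
```

Scoped threads let each worker borrow its partition without cloning. With hundreds of partitions (977 files in the test run above), a bounded thread pool is a more practical choice than one thread per partition.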