-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable local statesync/snapshot restore #13521
Conversation
hey looks like there was an issue with the branch you have locally. any way you could rebase to main? |
8b44455
to
82f708f
Compare
Ok, with that frame, I will re-review. I also would encourage we name this "feature" (I think it's a feature?) something like "Lazy State Sync". WDYT? |
server/start.go
Outdated
@@ -244,6 +247,53 @@ func startStandAlone(ctx *Context, appCreator types.AppCreator) error { | |||
return WaitForQuitSignals() | |||
} | |||
|
|||
func restoreLocalSnapshot(ctx *Context, app types.Application, ssRestoreHeight uint64) error { | |||
ctx.Logger.Info(fmt.Sprintf("Local Statesync Restore Snapshot With Height: %d", ssRestoreHeight)) | |||
ctx.Config.StateSync.RestoreHeight = ssRestoreHeight // Should be a one-shot command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to update the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove (maybe can't, see next comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Used here in the tendermint layer to detect a statesync, but there might be a way to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing with the tendermint maintainers if there's a more preferred way to send a signal that cosmos-sdk performed a "Lazy State Sync".
They are asking to consider a slightly different code path which might require a variable in a different place (or no variable at all).
tendermint/tendermint#9541 (comment)
I'm having some trouble implementing the idea they provided due to some objects not being in scope to call so I'll post back here as soon as I get some assistance.
Thank you for the review. Will clean up the code as recommended in a day or two. |
"Lazy State Sync" works. Earlier thought of "Local State Sync", but perhaps the word "Local" is a bit overused. |
85a2277
to
49866ff
Compare
Regarding the usage of ctx.Logger.Error, do you have any recommendations on how to return an error after Logging it to console? Latest changes based on your feedback are added to this branch. |
49866ff
to
d104aa7
Compare
IIRC, error should not cause the process to halt. You can return the error after you log it. |
Appreciate all the reviews and recommendations. Please hold off merging until we finalize the tendermint side of this effort. :) |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hey who's calling us stale?! |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@chillyvee this feature is really awesome and I personally would love to see this land. I know this is contingent upon TM work being delivered first. Any progress on that? Or is there anyway we can define or merge this w/o waiting on TM s.t. when TM finally does include the necessary changes, this will just automatically work? Also, I'm more than happy to push this over the finish line myself too if you're too busy. Either way, just lmk 👍 |
On TM - Do you know if we're switching the official tendermint repo? If so we may need to issue a new PR. There was a bit of talk about how to structure the code in TM (how to take the implementation, restructure where various lines of code go). In the process, a lot of intermediate changes were requested, which raised questions, but not all were answered because the maintainer there had other priorities. Also TM wants to merge against main/head and backport, so it will take a few more iterations before we can reference from cosmos-sdk. Do you think it's possible to work together on both sides of cosmos-sdk/tendermint repo? |
Will check again :) There are a few more follow in features that might be worth adding in, but trying to keep code scope small so we can merge "quickly" without additional delays |
it could be useful even without the tm part, for example transform an archive node to a non-archive one fast, by restore to a previous snapshot and replay the blocks from tm. |
So the TM repo is moved to Informal's org and will be renamed in the coming days/weeks. Work will continue there, so yes, be sure to make sure all relevant issues and PRs are there now @chillyvee |
hey, really love this work, I will close it until we have the change in comet. We dont have a timeline for that, so dont want to leave this pr open for a unknown amount of time. Once we have a timeline i will contact you. Sorry for the situation. The comet team is working hard so i can see this happening soon. |
Seeking guidance for commit:
82f708f (rebased on main for ease of viewing, not tested, but good for comment)
Branch which works with Juno Uni Testnet https://github.com/chillyvee/cosmos-sdk/tree/cv_local_statesync_juno_uni
This PR enables cosmos-sdk to initiate local snapshot restore with tendermint finishing up the stateStore and blockStore data restoration.
This development is based on juno's uni testnet (cosmos-sdk v0.45.6, tendermint v0.34.20) and pairs with tendermint PR enhancement tendermint/tendermint#9541
This PR enables a validator quickly restore without needing to search for remote snapshot servers. In basis, if a validator already takes periodic snapshots, it is wasteful in time and bandwidth to search over P2P in hopes of finding a snapshot.
However, an RPC server is still required to fetch the last, current, and next blocks to fully restore stateStore and blockStore.
statesync/stateprovider.go
The use case is to quickly reduce disk usage which normally grows over time to an arguably large size without the need to slowly prune IAVL trees.
Before further development, we'd like to ask:
Do you prefer such work to be done on the main branch and then backport?
If testing is required, how much additional verification is preferred? For example, juno/uni was chosen because this proves that cosmwasm stores are properly restored. However it would not be possible to verify with just cosmos-sdk tests.
Please take a look at the structure of code and indicate whether any rules regarding ABCI or API/Responsibilities have been broken.
The current code design is meant for ease of review/comment before adding multiple additional layers of abstraction to put each bit of code in the right place.
What is an efficient way to check if local stores already have existing height data prior to restoring the snapshot?
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change