EPIC: non-breaking but big UX improvements on IAVL #647
Comments
Now that the rollback speedup is merged (thanks @marko), deleting orphan records doesn't need any code changes to the current version.
What is the trade-off of making lazy the default vs. not? I saw you have a PR to make it optional.
Yeah, no need to make it the default, just configurable, so the user can choose. A pruning node operator probably won't bother to try it, but an archive node operator who spends half an hour just starting the node would desperately want it.
Makes sense. If it's optional, let's support it and document it. We should also make sure we remove it in the future.
Once this lands, I can test the release branch on a mainnet.
@yihuang hey, I think that there's an underlying issue that you're hitting with the half-hour startup times -- the limit on the number of keys that can be handled efficiently in goleveldb.
all items have been merged so we can close this |
There are several huge UX improvements that can be made in a non-breaking way, and they are also pretty easy to implement.
Faster Startup
Currently node startup is very slow because all roots need to be loaded; it takes half an hour on our production network just to start an archive node.
There is a lazy mode in the current code base that can reduce the startup time to seconds. The code comments warn against using it for write operations, but we have tried it on production nodes and haven't found any issues yet, and reading the code doesn't show any obvious issues either, so we think it's worth giving it a try.
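To make the cost difference concrete, here is a toy sketch, not the iavl implementation: `rootStore`, `eagerLoad` and `lazyLoad` are hypothetical names, and the "database" is just a map that counts lookups. It only illustrates why loading every root scales with the number of saved versions while a lazy load does not.

```go
package main

import "fmt"

// rootStore is a toy stand-in for the database: it maps each saved version
// to an identifier for that version's root node and counts lookups.
type rootStore struct {
	roots map[int64]string
	reads int
}

// getRoot simulates one database read.
func (s *rootStore) getRoot(version int64) (string, bool) {
	s.reads++
	r, ok := s.roots[version]
	return r, ok
}

// eagerLoad reads the root of every saved version before returning the
// target one, so its cost grows with the number of versions kept on disk.
func eagerLoad(s *rootStore, target int64) (string, error) {
	loaded := make(map[int64]string, len(s.roots))
	for v := range s.roots {
		r, _ := s.getRoot(v)
		loaded[v] = r
	}
	r, ok := loaded[target]
	if !ok {
		return "", fmt.Errorf("version %d not found", target)
	}
	return r, nil
}

// lazyLoad reads only the root of the requested version.
func lazyLoad(s *rootStore, target int64) (string, error) {
	r, ok := s.getRoot(target)
	if !ok {
		return "", fmt.Errorf("version %d not found", target)
	}
	return r, nil
}

// newStore builds a store holding versions 1..n.
func newStore(n int64) *rootStore {
	s := &rootStore{roots: make(map[int64]string, n)}
	for v := int64(1); v <= n; v++ {
		s.roots[v] = fmt.Sprintf("root-%d", v)
	}
	return s
}

func main() {
	eager := newStore(100000)
	if _, err := eagerLoad(eager, 100000); err != nil {
		panic(err)
	}
	lazy := newStore(100000)
	if _, err := lazyLoad(lazy, 100000); err != nil {
		panic(err)
	}
	fmt.Println("eager reads:", eager.reads)
	fmt.Println("lazy reads: ", lazy.reads)
}
```

An archive node keeps every version, which is why it is hit hardest by the eager path in this model.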
We propose making some small adjustments to LazyLoadVersion so that it does the same thing as LoadVersion [1], and adding an optional flag [2] in cosmos-sdk to enable the lazy mode for all operations. It really solves a big UX issue.
Faster Rollback
There are several trivial but significant performance improvement opportunities [3] in the LoadVersionForOverwriting method. These changes are backward compatible; they just remove some wasted work.
With the above improvements, together with LazyLoadVersion and with the fast node index disabled, rollback can finish in 1 second, which is a huge UX improvement when recovering from an app-hash mismatch. When the fast node index is enabled, rollback still has to spend time rebuilding the index; that's unavoidable.
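As a reminder of what a rollback has to do at minimum, here is a toy sketch, not the iavl code: `versionedStore` and `rollbackTo` are hypothetical names, and nodes, orphans and the fast node index are all left out. It only models the core step of dropping every version newer than the target.

```go
package main

import (
	"fmt"
	"sort"
)

// versionedStore is a toy model that keeps one root identifier per version.
type versionedStore struct {
	roots map[int64]string
}

// newVersionedStore builds a store holding versions 1..n.
func newVersionedStore(n int64) *versionedStore {
	s := &versionedStore{roots: make(map[int64]string, n)}
	for v := int64(1); v <= n; v++ {
		s.roots[v] = fmt.Sprintf("root-%d", v)
	}
	return s
}

// latest returns the highest version currently stored, or 0 if empty.
func (s *versionedStore) latest() int64 {
	var max int64
	for v := range s.roots {
		if v > max {
			max = v
		}
	}
	return max
}

// rollbackTo drops every version newer than target and returns the removed
// versions in ascending order. The target itself must exist.
func (s *versionedStore) rollbackTo(target int64) ([]int64, error) {
	if _, ok := s.roots[target]; !ok {
		return nil, fmt.Errorf("version %d not found", target)
	}
	var removed []int64
	for v := range s.roots {
		if v > target {
			removed = append(removed, v)
		}
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	for _, v := range removed {
		delete(s.roots, v)
	}
	return removed, nil
}

func main() {
	s := newVersionedStore(10)
	removed, err := s.rollbackTo(7)
	if err != nil {
		panic(err)
	}
	fmt.Println("removed versions:", removed)
	fmt.Println("latest version:  ", s.latest())
}
```

In this model the work is proportional to the number of versions being discarded, which for an app-hash mismatch is typically just the last one.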
Delete Orphan Records To Save Space
Currently the iavl tree stores the orphaned nodes for each version, and this information is used to prune versions. [4] and [5] show that it's possible to prune without it. The new approach can reduce the db size and improve the main operations, but it slows down pruning, so I don't recommend doing this refactoring in v0.19.x; it'll work better with the new node key format. (TODO: we can wait for some benchmark numbers and decide later.)
Existing archive node operators can already delete the orphan records now to save space. Our testing on a testnet archive node shows it can save 24% in application.db; here's an example that uses a compaction filter to do it for the rocksdb backend [6].
We can possibly provide the new pruning approach as a separate CLI tool for users who need it, though.
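The heart of the compaction-filter idea is just a predicate over keys. Below is a minimal sketch, not the code from the gist: it assumes (this is not stated in the issue) that each store's keys carry an `s/k:<store>/` prefix and that the byte right after it is `o` for orphan records. `isOrphanKey` and `dropOrphans` are hypothetical names, and a map stands in for the database. Check the key layout of your own version before deleting anything.

```go
package main

import (
	"bytes"
	"fmt"
)

// ASSUMPTION: store keys look like "s/k:<store>/<tag>..." where <tag> is a
// single byte identifying the record type, and 'o' marks an orphan record.
// Store names containing '/' are not handled by this sketch.
var storePrefix = []byte("s/k:")

const orphanTag = 'o'

// isOrphanKey reports whether key looks like an orphan record under the
// assumed layout. A real compaction filter would return "drop" for these.
func isOrphanKey(key []byte) bool {
	if !bytes.HasPrefix(key, storePrefix) {
		return false
	}
	rest := key[len(storePrefix):]
	i := bytes.IndexByte(rest, '/')
	if i < 0 || i+1 >= len(rest) {
		return false
	}
	return rest[i+1] == orphanTag
}

// dropOrphans deletes orphan records from db in place and returns the number
// of key and value bytes that were freed.
func dropOrphans(db map[string][]byte) int {
	freed := 0
	for k, v := range db {
		if isOrphanKey([]byte(k)) {
			freed += len(k) + len(v)
			delete(db, k)
		}
	}
	return freed
}

func main() {
	db := map[string][]byte{
		"s/k:bank/n\x01":    []byte("node"),
		"s/k:bank/o\x01":    []byte("orphan"),
		"s/k:staking/o\x02": []byte("orphan"),
		"s/k:staking/r\x02": []byte("root"),
		"s/latest":          []byte("2"),
	}
	freed := dropOrphans(db)
	fmt.Println("bytes freed:", freed)
	fmt.Println("keys left:  ", len(db))
}
```

With a real rocksdb backend the same predicate would be applied during compaction, so no separate full scan is needed.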
Footnotes
[1] https://github.com/cosmos/iavl/pull/638
[2] https://github.com/cosmos/cosmos-sdk/pull/14189
[3] https://github.com/cosmos/iavl/pull/636
[4] https://github.com/cosmos/iavl/pull/646
[5] https://github.com/cosmos/iavl/pull/641
[6] https://gist.github.com/yihuang/7031416e592f0a2f85201a775269f6a3