Merge block databases #2829
Had a bootstrap in progress, stopped and upgraded the database with this PR (~21M block count, 30M unchecked) on Ryzen 3600, 16GB RAM and NVME SSD:
Though painful, the benefits are significant. I agree with vacuuming pre-upgrade, possibly even a rebuild, as we've seen that it speeds up upgrades considerably and reduces the maximum size reached during the upgrade. @zhyatt Should this have the removal label instead of semantic?
@guilhermelawless Yes, just swapped out the labels.
Windows 10 - SSD (NVME), Ryzen 3700X, 64GB RAM (11 minutes)
I did some experimenting with a rebuild vacuum before the upgrade and afterwards. After first rebuild
LGTM pending doc updates.
Ubuntu 20.04 - NVMe SSD Samsung 970 EVO Plus 512GB, Ryzen 3900X, 64GB RAM (5 mins)
Ubuntu 20.04 - NVMe Optane, Ryzen 3900X, 64GB RAM (2 mins)
`send`/`change`/`open`/`receive` & `state` block databases have been removed. They are merged into a `blocks` database, and are serialized as follows: `block_type -> block -> sideband`
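
To make the layout concrete, here is a minimal Python sketch of a value laid out as `block_type -> block -> sideband`, where the leading byte tells the reader how many bytes of block follow. The tag values and block sizes are illustrative assumptions and are not taken from this PR, and the sideband is treated as an opaque byte string; the node's real C++ code will differ.

```python
from enum import IntEnum
from typing import Tuple


class BlockType(IntEnum):
    # Tag values are illustrative assumptions, not taken from the PR.
    SEND = 2
    RECEIVE = 3
    OPEN = 4
    CHANGE = 5
    STATE = 6


# Assumed fixed block body sizes in bytes, for illustration only.
BLOCK_SIZE = {
    BlockType.SEND: 152,
    BlockType.RECEIVE: 136,
    BlockType.OPEN: 168,
    BlockType.CHANGE: 136,
    BlockType.STATE: 216,
}


def serialize(block_type: BlockType, block: bytes, sideband: bytes) -> bytes:
    """Pack a value as block_type (1 byte) -> block -> sideband."""
    if len(block) != BLOCK_SIZE[block_type]:
        raise ValueError("block length does not match its type")
    return bytes([block_type]) + block + sideband


def deserialize(value: bytes) -> Tuple[BlockType, bytes, bytes]:
    """Split a stored value back into (block_type, block, sideband)."""
    block_type = BlockType(value[0])  # the leading byte selects the layout
    end = 1 + BLOCK_SIZE[block_type]
    if len(value) < end:
        raise ValueError("value is too short for its block type")
    return block_type, value[1:end], value[end:]
```

Without the leading byte, a reader of the merged database could not tell where the block ends and the sideband begins, since block types differ in length.
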
The `block_type` is a new addition which allows the block and sideband to be correctly deserialized. There is `block_details` in `sideband` which could be used, but it would require shrinking the number of bits for the `epoch` and a lot of other interface changes, which didn't seem worth it, so an extra byte per block is used for the `block_type` instead. We already follow this approach for serializing blocks in unchecked, for instance, so it allowed code re-use there too.
Removes the `block_count_type` RPC, as it's pretty useless now: no distinction between block types is possible in LMDB without extra IO to store the counts. This is possible with RocksDB as we are storing the count anyway, but I think having a consistent RPC interface is more important.
The upgrade path is interesting because we do not have a way to merge 2 databases with different value types. For this I used in-memory sorting of the smallish databases (legacy open/receive/change) first, then created temporary databases for those and for the send/state blocks, which added the new value type (extra `block_type`). The smallest databases were then merged before the larger ones to reduce iteration as much as possible.
With this change, RocksDB does not need to worry about stale memtables anymore. Checking if a block exists sometimes took up to 5 database reads; now only 1 is required. It has also reduced complexity in various areas.
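
As a rough illustration of the upgrade path, the Python sketch below tags each legacy table's values with a type byte, sorts them by key, and then folds the sorted tables together from smallest to largest. It uses plain in-memory lists in place of the temporary databases, and the names `tag_and_sort`, `merge_smallest_first` and `entries_iterated` are hypothetical; it shows the idea only and is not the PR's C++ code.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[bytes, bytes]  # (block hash, serialized value)


def tag_and_sort(table: Dict[bytes, bytes], block_type: int) -> List[Entry]:
    """Sort one legacy table by key in memory and prepend the type byte."""
    prefix = bytes([block_type])
    return sorted((key, prefix + value) for key, value in table.items())


def merge_smallest_first(tables: Iterable[List[Entry]]) -> List[Entry]:
    """Fold key-sorted tables into one, starting with the smallest table."""
    merged: List[Entry] = []
    for table in sorted(tables, key=len):
        # heapq.merge walks both sorted inputs once and keeps key order.
        merged = list(heapq.merge(merged, table))
    return merged


def entries_iterated(sizes: Iterable[int]) -> int:
    """Count entries touched when folding tables of these sizes in order."""
    total = accumulated = 0
    for size in sizes:
        accumulated += size
        total += accumulated  # each merge walks everything merged so far
    return total
```

In this toy cost model, `entries_iterated([1, 2, 10])` is 17 while `entries_iterated([10, 2, 1])` is 35, which is the intuition behind merging the small legacy tables before the large send/state ones.
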
The database upgrade was tested on a few systems, logging info shown below:
Windows 10 - SSD, Ryzen 3700X, 64GB RAM (20 minutes)
Windows 10 - SSD (NVME), Ryzen 3700X, 64GB RAM (11 minutes)
Ubuntu - SSD, Ryzen 2600, 16GB RAM (19 mins)
The starting ledger was 36GB; unvacuumed after the upgrade it becomes 61GB, and vacuumed it is 22GB. Currently it is set up to automatically vacuum after the upgrade; however, this might be difficult for some users with storage constraints. Perhaps we should make this step optional, or also vacuum the pre-upgraded ledger first?
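
To put rough numbers on the storage concern, here is a small Python sketch of the arithmetic. It assumes the vacuum writes a compacted copy next to the existing file before the old one is removed, so both exist on disk at once; that is an assumption and is not stated in this PR. The function names are hypothetical.

```python
def peak_disk_gb(unvacuumed_gb: float, vacuumed_gb: float) -> float:
    """Peak usage, assuming the old file and the compacted copy coexist briefly."""
    return unvacuumed_gb + vacuumed_gb


def extra_space_gb(start_gb: float, unvacuumed_gb: float, vacuumed_gb: float) -> float:
    """Free space needed beyond what the starting ledger already occupies."""
    return peak_disk_gb(unvacuumed_gb, vacuumed_gb) - start_gb
```

With the figures above, `peak_disk_gb(61, 22)` gives 83 and `extra_space_gb(36, 61, 22)` gives 47, so under this assumption a user would need about 47GB free on top of the 36GB ledger.
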
Also did some benchmarking of LMDB/RocksDB performance & ledger size when using fixed and variable sized keys:
Didn't see any real difference between fixed and variable sized values from LMDB or RocksDB, or a difference in ledger size. All `gets` were recorded after a computer restart to prevent any OS caching from affecting the results.