Adds RocksDB support #2197
LGTM, tested with rocksdb 6.1.2 (brew)
LGTM, tested with RocksDB 6.1.2 (static, pulled via vcpkg) with msvc-14.1 x64
LGTM, tested with rocksdb 6.2.2 static git build, Debian 10
…tabase to explicitly flush them
This PR allows the node to use RocksDB instead of LMDB for the ledger by setting the CMake flag `-DNANO_ROCKSDB=ON` (config/flag options will be added later). There is no upgrade path, so bootstrapping will have to be done from scratch.
RocksDB does not allow getting counts (only estimates). Due to the way LSM trees work, multiple `put`s to the same key will result in duplications before the compaction phase. In google/leveldb#119 it is recommended to store counts separately, which I've done in a separate `cached_counts` table.
The write transactions used by RocksDB will fail on `commit ()` if there is a conflicting key being modified. It is now a requirement to state which tables are modified during a `tx_begin_write ()` call.
Each column family has a lock. If there can be concurrent modifications of the same keys then the table must be added to the "requires lock" collection, otherwise it can be added to the "no lock required" collection. This now requires some thought, and it is also more visible what transactions are modifying. Confirmation height is one such table which does not require a lock: even though the table is modified concurrently, new entries are only added in one thread and existing entries are only modified in another. There may be other ones like this, but they have not been investigated yet (a pessimistic approach was taken).
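To make the "requires lock" / "no lock required" split more concrete, here is a minimal sketch of a write transaction that declares its tables up front. It is not the code from this PR: `table`, `table_locks`, `write_transaction`, `holds_lock` and `can_write` are hypothetical names, and plain `std::mutex` objects stand in for the per-column-family locks.

```cpp
#include <algorithm>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical table identifiers, for illustration only
enum class table
{
	accounts,
	blocks,
	confirmation_height
};

// Owns one mutex per table (standing in for one lock per column family)
class table_locks
{
public:
	std::mutex & get (table table_a)
	{
		std::lock_guard<std::mutex> guard (map_mutex);
		return mutexes[table_a]; // Created on first use; references into std::map stay valid
	}

private:
	std::mutex map_mutex;
	std::map<table, std::mutex> mutexes;
};

// Hypothetical write transaction which declares up front what it will modify
class write_transaction
{
public:
	write_transaction (table_locks & locks_a, std::vector<table> requires_lock_a, std::vector<table> no_lock_required_a) :
	locked (std::move (requires_lock_a)),
	unlocked (std::move (no_lock_required_a))
	{
		// Sort and deduplicate so all transactions lock in the same order, avoiding deadlock
		std::sort (locked.begin (), locked.end ());
		locked.erase (std::unique (locked.begin (), locked.end ()), locked.end ());
		for (auto table_l : locked)
		{
			held.emplace_back (locks_a.get (table_l));
		}
	}
	bool holds_lock (table table_a) const
	{
		return std::find (locked.begin (), locked.end (), table_a) != locked.end ();
	}
	// Only tables declared in either collection may be written to
	bool can_write (table table_a) const
	{
		return holds_lock (table_a) || std::find (unlocked.begin (), unlocked.end (), table_a) != unlocked.end ();
	}

private:
	std::vector<table> locked;
	std::vector<table> unlocked;
	std::vector<std::unique_lock<std::mutex>> held; // Released when the transaction is destroyed
};
```

With this shape, a caller would construct something like `write_transaction tx (locks, { table::accounts, table::blocks }, { table::confirmation_height });`, which makes it visible at the call site which tables a transaction touches and which of them are assumed safe without a lock.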
Notes: When backing up using the `--snapshot` CLI option, it is currently set up to do incremental backups, which reduces the need to copy the whole database. However, if the original files are deleted then the backup directory should also be deleted, otherwise there can be inconsistencies (put in documentation?).
RocksDB is an external dependency as there was no clear way to incorporate it in our build.
Windows:
Recommended way: add `set (VCPKG_LIBRARY_LINKAGE static)` to the top of `%VCPKG_DIR%\ports\rocksdb\portfile.cmake`, then run:
vcpkg install rocksdb:x64-windows
Ubuntu:
apt-get install librocksdb-dev
apt-get install zlib1g-dev (may be required if the system does not have `zlib` installed already)
Or
Note: If running into:
/usr/bin/ld: ../node/libnode.a(rocksdb.cpp.o):(.data.rel.ro._ZTIN7rocksdb11StackableDBE[_ZTIN7rocksdb11StackableDBE]+0x10): undefined reference to 'typeinfo for rocksdb::DB'
Then it may be necessary to build RocksDB with `USE_RTTI=1`: facebook/rocksdb#4329
Mac:
brew install rocksdb
For more instructions:
https://github.com/facebook/rocksdb/blob/master/INSTALL.md
The following CMake options can be used to specify where the RocksDB and zlib (another dependency) libraries are:
I've noted some things that may/should be done on top of this:
TSAN/ASAN/Valgrind and if appropriate use suppressions (similarly to LMDB). TSAN/ASAN has been run on Clang/Ubuntu and there is 1 recurring TSAN error; however, the same is reported in ethereum/aleth#4740 ("ThreadSanitizer reports data race in LevelDB during sync"), so I don't think there is necessarily anything to worry about. I haven't been able to suppress it, but I am leaving that for another task.
`cached_counts` table more efficiently without requiring separate get/put operations.
`Delete` has to do a `get` currently to make sure it exists; this may not be necessary in some places.
`get` always makes a copy
`get`s are one offs so they can be turned off for block caching.
`rocksdb_iterator.hpp`
`ongoing_peer_store`
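As a rough illustration of the separate count bookkeeping, including why a delete currently needs a lookup first, here is a minimal sketch. It is not the store code from this PR and does not use the RocksDB API: `counted_table` is a hypothetical name, and a `std::map` stands in for a column family so that the example is self-contained.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for a key-value table with its entry count stored separately.
// Illustration only: std::map replaces the column family, the counter replaces cached_counts.
class counted_table
{
public:
	// Insert or overwrite; the cached count only grows when the key is new
	void put (std::string const & key_a, std::string const & value_a)
	{
		auto result (entries.insert ({ key_a, value_a }));
		if (result.second)
		{
			++cached_count;
		}
		else
		{
			result.first->second = value_a; // Overwrite, count unchanged
		}
	}
	// Looks the key up first so the count is only decremented for keys that exist
	bool del (std::string const & key_a)
	{
		auto existing (entries.find (key_a));
		if (existing == entries.end ())
		{
			return false;
		}
		entries.erase (existing);
		--cached_count;
		return true;
	}
	std::uint64_t count () const
	{
		return cached_count;
	}

private:
	std::map<std::string, std::string> entries;
	std::uint64_t cached_count{ 0 };
};
```

The existence check in `del` is what keeps the counter accurate. Where the caller already knows the key exists, that lookup could in principle be skipped, which is the optimisation the note above hints at.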
12 threads, 3.4 GHz CPU, 16 GB RAM, SSD. Time taken (only taking into account bootstrapping):
1 vCPU, 2 GB RAM, SSD. (Note: there were 6 tps going on during the later half of the LMDB sync)
Comparison
Some test notes, try:
Run the tests with both LMDB and RocksDB
Test snapshot/vacuum with RocksDB compaction
RPC calls with RocksDB
CLI write commands, e.g. `peer_clear`, should give an appropriate error message if the node is already running.