Healing Mechanism for Flat Database in Besu #5319
Signed-off-by: Karim TAAM <[email protected]>
...ger/besu/ethereum/eth/sync/snapsync/request/heal/AccountFlatDatabaseHealingRangeRequest.java
...ger/besu/ethereum/eth/sync/snapsync/request/heal/StorageFlatDatabaseHealingRangeRequest.java
    return false;
  }

  public abstract Stream<SnapDataRequest> getChildRequests(
      final SnapWorldDownloadState downloadState,
      final WorldStateStorage worldStateStorage,
-     final SnapSyncState snapSyncState);
+     final SnapSyncProcessState snapSyncState);
Check notice (Code scanning / CodeQL): Useless parameter
    }
  }

  public HashSet<Bytes> getAccountsToBeRepaired() {
Check failure (Code scanning / CodeQL): Inconsistent synchronization of getter and setter
Lets 🚢
Signed-off-by: garyschulte <[email protected]>
This reverts commit 180c751. Signed-off-by: Stefan <[email protected]>
This reverts commit d8bb218.
PR description
Sync time
I ran several tests to measure the time needed to sync on an m6a.2xlarge instance.
For non-developers:
This feature improves the performance of your node by speeding up block processing. It does so by reducing the number of disk accesses during block processing. The cost is a slightly larger database and a slightly longer sync time (an additional 2 hours on m6.2xlarge).
Regarding storage, the increase varies with your database. With the current size of the world state, we anticipate a maximum increase of 55 GB.
Regarding sync time, we are also working on other optimizations to reduce the overall sync time, so the extra time should be offset quickly.
Since block processing will be faster, the chances of missing attestations should drop further, improving your validator's performance.
You need to run a Bonsai Besu node with this flag:
--Xsnapsync-synchronizer-flat-db-healing-enabled=true
and resync the world state. To do that, you can either delete your database and resync from scratch, or call this RPC endpoint on an already synced Besu node. The RPC call triggers a resync of the world state only, without downloading all the blocks again, so the sync is faster (you sometimes need to restart Besu after the call for the resync to actually start).
For developers:
Why?
Besu uses a RocksDB-based database to store the Ethereum state both as a Merkle tree and as a flat database. The flat database contains the leaf nodes of the Merkle tree, allowing direct access by accountHash or slotHash without traversing the tree.
During the snapsync process, Besu switches pivot blocks multiple times, resulting in a mix of several blocks in the state. While there exists a healing step to correct the tree (similar to fastsync), there was previously no mechanism to heal the flat database. Consequently, it was necessary to clear and rebuild the flat database post-sync, which significantly impacted SLOAD performance.
The proposed pull request introduces a feature that allows healing of the flat database by streaming the flat database data and validating it by generating a proof from the trie structure. If the proof is found to be invalid, the code traverses the trie to fix the invalid range. To optimize the process and avoid checking the entire flat database, the PR includes enhancements such as tracking the accounts that need to be repaired during SnapSync. By implementing these optimizations, the PR aims to significantly reduce the time and resources required for repairing the flat database.
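The tracking of accounts to repair can be pictured with a minimal sketch. Only the getter name `getAccountsToBeRepaired` appears in this PR's review snippets; the class name `RepairTracker`, the `markForRepair` method, the use of `String` hashes, and the concurrent set are assumptions made for illustration, not the PR's actual code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: records the hashes of accounts whose flat data may be
// stale, so that healing can revisit those accounts instead of the whole database.
final class RepairTracker {
  // Concurrent set (an assumption): several sync tasks may report accounts at once.
  private final Set<String> accountsToBeRepaired = ConcurrentHashMap.newKeySet();

  void markForRepair(final String accountHash) {
    accountsToBeRepaired.add(accountHash);
  }

  // Returns an immutable snapshot so callers cannot mutate the internal set.
  Set<String> getAccountsToBeRepaired() {
    return Set.copyOf(accountsToBeRepaired);
  }

  public static void main(final String[] args) {
    final RepairTracker tracker = new RepairTracker();
    tracker.markForRepair("0xaa");
    tracker.markForRepair("0xbb");
    tracker.markForRepair("0xaa"); // duplicate, ignored by the set
    System.out.println(tracker.getAccountsToBeRepaired().size()); // prints 2
  }
}
```

Using a concurrent set also sidesteps the getter/setter synchronization mismatch that static analysis tends to flag on a plain `HashSet` field.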
Healing of the flat database runs at the end of SnapSync or Checkpoint Sync. It is expected to take approximately 2 to 3 hours on mainnet and is designed to speed up block processing by reducing the number of database reads for SLOAD. This repair is expected to make block processing more efficient and to improve responsiveness for some RPC calls.
You should see something like this during the healing step:
Tested
Healing Mechanism for Flat Database in Besu
The purpose of the healing mechanism is to ensure a complete flat database after the sync process and eliminate the need for fallbacks. This documentation outlines the steps involved in healing the flat database and improving SLOAD performance.
Healing Process
The healing process for the flat database involves the following steps:
1. Tree Healing: Before healing the flat database, the Merkle tree is healed using the existing process, ensuring the tree accurately represents the state of Ethereum.
2. Flat Database Verification: After the tree healing process, the flat database is verified by traversing it in ranges and comparing the data with the Merkle tree. The goal is to identify any inconsistencies between the flat database and the tree.
3. Healing the Flat Database: If any inconsistencies are found during verification, the flat database needs to be corrected. To do so, the trie is traversed over the invalid range and the flat database entries in that range are fixed from it.
4. Completing the Healing Process: Once all identified inconsistent ranges have been corrected, the healing process for the flat database is complete. The flat database now accurately reflects the state of Ethereum after the sync process.
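The verify-then-repair loop described above can be sketched as follows. This is a simplified model under stated assumptions: both the healed trie and the flat database are represented as sorted in-memory maps, and a direct comparison of each range stands in for the range-proof check. `FlatDbHealer`, `heal`, and the boundary list are hypothetical names, not Besu APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class FlatDbHealer {
  // Walks the key space range by range and rewrites every flat range that
  // disagrees with the (already healed) trie. Returns the number of repaired ranges.
  static int heal(
      final NavigableMap<String, String> trie,
      final NavigableMap<String, String> flat,
      final List<String> boundaries) {
    int repaired = 0;
    for (int i = 0; i + 1 < boundaries.size(); i++) {
      final String from = boundaries.get(i);
      final String to = boundaries.get(i + 1);
      final NavigableMap<String, String> expected = trie.subMap(from, true, to, false);
      final NavigableMap<String, String> actual = flat.subMap(from, true, to, false);
      // Assumption: plain equality replaces the range-proof verification.
      if (!actual.equals(expected)) {
        actual.clear(); // subMap is a view, so this removes the range from flat
        actual.putAll(expected); // rewrite the range from the trie
        repaired++;
      }
    }
    return repaired;
  }

  public static void main(final String[] args) {
    final NavigableMap<String, String> trie =
        new TreeMap<>(Map.of("1a", "x", "3c", "y", "7f", "z"));
    final NavigableMap<String, String> flat =
        new TreeMap<>(Map.of("1a", "x", "3c", "y", "7f", "stale", "9e", "orphan"));
    final int repaired = heal(trie, flat, List.of("0", "4", "8", "g"));
    System.out.println(repaired); // prints 2
    System.out.println(flat.equals(trie)); // prints true
  }
}
```

In the example, the first range is already consistent and is left untouched; the stale value and the orphaned entry each cause one range to be rewritten.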
Performance Improvements
The healing mechanism for the flat database provides significant performance improvements, particularly for SLOAD operations and READ ZERO operations. With a complete and accurate flat database, the need for fallbacks to the Merkle tree is eliminated.
Previously, when data was not present in the flat database, Besu had to fall back to the Merkle tree for each SLOAD operation, resulting in multiple database accesses. This fallback was unnecessary and had a significant impact on performance. Similarly, READ ZERO operations incurred redundant fallbacks.
By healing the flat database and ensuring its completeness, the number of fallbacks to the Merkle tree is reduced to 0, resulting in improved SLOAD performance. READ ZERO operations also benefit from the elimination of unnecessary fallbacks.
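The effect on reads can be illustrated with a small sketch. It assumes a reader that knows whether the flat database has been healed: when it has, a miss is treated as an empty slot and the trie is not consulted. `SlotReader`, its fields, and the map-based stores are hypothetical stand-ins, not Besu's storage classes.

```java
import java.util.Map;
import java.util.Optional;

final class SlotReader {
  private final Map<String, String> flatDb;
  private final Map<String, String> trie; // stands in for a multi-read trie traversal
  private final boolean flatDbHealed;
  private int trieFallbacks = 0;

  SlotReader(
      final Map<String, String> flatDb,
      final Map<String, String> trie,
      final boolean flatDbHealed) {
    this.flatDb = flatDb;
    this.trie = trie;
    this.flatDbHealed = flatDbHealed;
  }

  Optional<String> read(final String slotHash) {
    final String value = flatDb.get(slotHash);
    // A healed flat database is complete, so a miss means the slot is empty.
    if (value != null || flatDbHealed) {
      return Optional.ofNullable(value);
    }
    // Unhealed: the miss might be a gap, so the trie has to be consulted.
    trieFallbacks++;
    return Optional.ofNullable(trie.get(slotHash));
  }

  int getTrieFallbacks() {
    return trieFallbacks;
  }

  public static void main(final String[] args) {
    final Map<String, String> state = Map.of("slot-a", "0x01");

    final SlotReader unhealed = new SlotReader(state, state, false);
    unhealed.read("slot-a");
    unhealed.read("slot-missing"); // empty slot still costs a trie lookup
    System.out.println(unhealed.getTrieFallbacks()); // prints 1

    final SlotReader healed = new SlotReader(state, state, true);
    healed.read("slot-a");
    healed.read("slot-missing"); // answered from the flat database alone
    System.out.println(healed.getTrieFallbacks()); // prints 0
  }
}
```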
Range Proof Explanation:
A range proof is a cryptographic proof that allows verification of a range of leaf nodes in a Merkle trie. It demonstrates that a specific range of leaf nodes is part of the trie and that their hashes contribute to the root hash.
With a range proof, a verifier can reconstruct the path from the root to the specified leaf nodes and calculate the root hash. If the calculated root hash matches the actual root hash, the range is considered valid, indicating that the corresponding data in the flat database is correct.
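To make the "recompute the root and compare" idea concrete, here is a deliberately simplified sketch. It is not Besu's range proof: it verifies a single leaf in a binary SHA-256 Merkle tree with a power-of-two number of leaves, whereas the PR works on ranges of leaves in a Merkle Patricia trie. All names (`MerkleProofSketch`, `buildLevels`, `prove`, `verify`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MerkleProofSketch {
  static byte[] sha256(final byte[]... parts) {
    try {
      final MessageDigest digest = MessageDigest.getInstance("SHA-256");
      for (final byte[] part : parts) {
        digest.update(part);
      }
      return digest.digest();
    } catch (final NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Level 0 holds the leaf hashes; the last level holds the root.
  // Assumes the number of leaves is a power of two.
  static List<byte[][]> buildLevels(final List<String> leaves) {
    byte[][] level = new byte[leaves.size()][];
    for (int i = 0; i < level.length; i++) {
      level[i] = sha256(leaves.get(i).getBytes(StandardCharsets.UTF_8));
    }
    final List<byte[][]> levels = new ArrayList<>();
    levels.add(level);
    while (level.length > 1) {
      final byte[][] next = new byte[level.length / 2][];
      for (int i = 0; i < next.length; i++) {
        next[i] = sha256(level[2 * i], level[2 * i + 1]);
      }
      levels.add(next);
      level = next;
    }
    return levels;
  }

  // Collects the sibling hash at every level on the path from leaf to root.
  static List<byte[]> prove(final List<byte[][]> levels, int index) {
    final List<byte[]> proof = new ArrayList<>();
    for (int l = 0; l < levels.size() - 1; l++) {
      proof.add(levels.get(l)[index ^ 1]);
      index >>= 1;
    }
    return proof;
  }

  // Recomputes the root from the leaf and the proof, then compares it.
  static boolean verify(
      final String leaf, int index, final List<byte[]> proof, final byte[] expectedRoot) {
    byte[] hash = sha256(leaf.getBytes(StandardCharsets.UTF_8));
    for (final byte[] sibling : proof) {
      hash = (index & 1) == 0 ? sha256(hash, sibling) : sha256(sibling, hash);
      index >>= 1;
    }
    return Arrays.equals(hash, expectedRoot);
  }

  public static void main(final String[] args) {
    final List<byte[][]> levels = buildLevels(List.of("a", "b", "c", "d"));
    final byte[] root = levels.get(levels.size() - 1)[0];
    final List<byte[]> proof = prove(levels, 2);
    System.out.println(verify("c", 2, proof, root)); // prints true
    System.out.println(verify("stale", 2, proof, root)); // prints false
  }
}
```

A value read from the flat database plays the role of the leaf here: if it differs from what the trie commits to, the recomputed root no longer matches and the entry is flagged for repair.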
Block processing performance:
We observe a 20% improvement (274 ms instead of 344 ms at the 50th percentile) on the node running the flat database feature compared to the node running the current main branch.
There is also a significant improvement in outliers (99th and 100th percentiles), which should improve the performance of attestations affected by such outliers.
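As a quick check, the 20% figure follows from the two 50th-percentile values quoted above:

$$
\frac{344\ \text{ms} - 274\ \text{ms}}{344\ \text{ms}} = \frac{70}{344} \approx 0.203 \approx 20\%
$$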
CPU profiling:
The improvement is pretty clear when checking the profiling of both nodes, especially on the SLOAD operation
Without this PR (current main)
With this PR
Fixed Issue(s)