Add experimental `--Xsnapsync-bft-enabled` which enables snap sync for BFT chains #7140
This LGTM. It is probably worth having @matkt have a look also, but this behavior seems well contained for *bft. We might want to push the "snap-sync minimum distance" up into PivotSelectorFromPeers. That is a worthwhile distinction to make on any consensus mechanism imo
// pivot distance
if (bestPeer.get().chainState().getEstimatedHeight()
    <= syncConfig.getFastSyncPivotDistance()) {
  throw new NoSyncRequiredException();
🥇 This alone could solve a majority of the snap at genesis problems. I think griefing could be an issue, but for a private chain, a bootstrap node should help solve issues of only finding peers with no chain history.
I'm just not sure why we need snap sync for a new chain. A full sync will work well.
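For readers following this thread, the pivot-distance guard quoted in the diff can be sketched in isolation. This is a minimal illustration, not Besu's pivot selector: `PivotDistanceCheck`, `selectPivot`, and the local `NoSyncRequiredException` class are hypothetical stand-ins, and computing the pivot as "height minus distance" is an assumption made for the example.

```java
import java.util.Optional;

/** Hypothetical stand-in for the exception thrown in the diff. */
class NoSyncRequiredException extends RuntimeException {}

/** Illustrative sketch only; names and pivot arithmetic are assumptions. */
class PivotDistanceCheck {

  /**
   * Returns a pivot block number, or throws NoSyncRequiredException when the
   * best peer's chain is not longer than the pivot distance.
   */
  static long selectPivot(final Optional<Long> bestPeerEstimatedHeight, final long pivotDistance) {
    final long height =
        bestPeerEstimatedHeight.orElseThrow(() -> new IllegalStateException("no peers available"));
    if (height <= pivotDistance) {
      // The chain is too short to contain a pivot block, so pivot-based sync is skipped.
      throw new NoSyncRequiredException();
    }
    // Assumed for illustration: the pivot sits pivotDistance blocks behind the peer's head.
    return height - pivotDistance;
  }
}
```

With a pivot distance of 50, a peer at height 1000 would yield pivot 950 in this sketch, while a peer at height 50 or lower would take the "no sync required" path, which is the new-chain case the comments above are discussing.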
&& validatorProvider
    .getValidatorsAtHead()
    .contains(Util.publicKeyToAddress(nodeKey.getPublicKey()))) {
  throw new NoSyncRequiredException();
👍
@@ -80,7 +80,7 @@ public CompletableFuture<FastSyncState> start() {
   if (!running.compareAndSet(false, true)) {
     throw new IllegalStateException("SyncDownloader already running");
   }
-  LOG.info("Starting sync");
+  LOG.info("Starting fast sync");
It's not clean, but this class is also used by snap sync. I think that's why we removed the "fast".
Please test these changes with a Mainnet or Holesky / Sepolia node to look for unintended side effects, as called out by Karim, or for excessive logging or strange behavior. A Holesky snap sync should complete in a few hours and can be considered an acceptance test for this PR.
Getting back into this PR now. I have some refactoring planned, but I'm pleased to see that the general direction seems to be OK. @matkt I think snap sync benefits permissioned chains in the same way as public chains: adding a new node (validator or otherwise) to a permissioned chain will take a long time to sync if the chain has been running for a while. I'll look at your comments on the log changes I made. I found it clearer to see what was going on, and to match that to the code, when the log entries were more specific to the class they were coming from.
syncState.markInitialSyncPhaseAsDone();

if (result instanceof NoSyncRequiredState) {
  LOG.info("Sync completed (no sync required)");
I just wonder about readability if we say that the sync is finished and then a backward sync or a full sync starts. I don't know if we could slightly modify the log, but honestly I haven't yet found a word that distinguishes snap/fast sync, which download the state directly, from the other syncs that execute the blocks.
Yeah I completely agree - I've been struggling to come up with a term that refers to all of the non-full sync methods. It would be really useful if we could come up with a consistent term that we can use. "Checkpoint" feels like a good common term for it, but since there's a specific sync method called "checkpoint" we can't really use that. Maybe "pivot-based sync" is a reasonable description of them all? That would make the above log message "Sync completed (no pivot-based sync required)" or similar. Perhaps to make it clear that no sync took place I should change it to "Sync ended (no pivot-based sync required)". I think "ended" is clearer that nothing happened, whereas "completed" suggests something did happen.
Is checkpoint sync appropriate for BFT networks? It should work based on a specified checkpoint but users would probably use a full snap sync in this case? Maybe "no snap sync required" is still valuable. If we are attempting to deprecate fast, we can likely avoid that language.
This page lays out some language we could try to mirror.
Completed a Holesky snap-sync with no unintended consequences (with the flag set). Some logging noise for cleanup.
@garyschulte @matkt think I've tidied up remaining loose ends:
I won't die on the Would mark as 🚢 with that minor change.
Yeah that's reasonable. I'll refactor the
I think it would be better to also test a mainnet node just to be sure (we have more pivot block switches, more interesting use cases, etc.). I don't think the modification will add a regression, but I would prefer to test (checkpoint is enough).
@@ -143,13 +143,6 @@ public static DataStorageOptions create() {
    */
   public void validate(final CommandLine commandLine, final SyncMode syncMode) {
     if (DataStorageFormat.BONSAI == dataStorageFormat && bonsaiLimitTrieLogsEnabled) {
-      if (SyncMode.FULL == syncMode) {
Why are we removing this one?
This is going to get refactored into a separate PR (see Gary's comment #7140 (comment)).
But the reason for removing it is I've changed the logic to set `--bonsai-limit-trie-logs-enabled=false` if `--sync-mode=FULL`, rather than force the user to set `--bonsai-limit-trie-logs-enabled=false`. See
if (syncMode == SyncMode.FULL
It's a bit like how we automatically set `--tx-pool-price-bump=0` if `--min-gas-price=0`
Ok thanks for the info 🙏
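The difference between the two behaviours discussed in this thread (failing validation versus overriding the option) can be sketched as follows. This is a simplified illustration, not Besu's `DataStorageOptions`: the `TrieLogOptions` class, both method names, and the error message are hypothetical, and the storage-format check from the diff is left out for brevity.

```java
/** Hypothetical, simplified stand-in for Besu's SyncMode. */
enum SyncMode {
  FULL,
  SNAP,
  CHECKPOINT
}

/** Illustrative sketch only; class, methods, and message are assumptions. */
class TrieLogOptions {
  private boolean bonsaiLimitTrieLogsEnabled;

  TrieLogOptions(final boolean bonsaiLimitTrieLogsEnabled) {
    this.bonsaiLimitTrieLogsEnabled = bonsaiLimitTrieLogsEnabled;
  }

  /** Strict behaviour: reject the combination and make the user change the flag. */
  void validateStrict(final SyncMode syncMode) {
    if (bonsaiLimitTrieLogsEnabled && syncMode == SyncMode.FULL) {
      throw new IllegalArgumentException(
          "bonsai-limit-trie-logs-enabled cannot be used with sync-mode=FULL");
    }
  }

  /** Lenient behaviour: turn the option off when FULL sync is selected. */
  void applyOverrides(final SyncMode syncMode) {
    if (syncMode == SyncMode.FULL) {
      bonsaiLimitTrieLogsEnabled = false;
    }
  }

  boolean isBonsaiLimitTrieLogsEnabled() {
    return bonsaiLimitTrieLogsEnabled;
  }
}
```

In this sketch, calling `applyOverrides` before `validateStrict` means a FULL-sync configuration never reaches the failure branch, which is the design trade-off the comment describes: fewer startup failures, at the cost of silently changing a user-visible option.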
@@ -99,7 +99,7 @@ private int preloadQueue() {
     }
   }

-  public void addToPruneQueue(final long blockNumber, final Hash blockHash) {
+  public synchronized void addToPruneQueue(final long blockNumber, final Hash blockHash) {
Could you explain why this modification is needed? If it was not needed on mainnet, I think it should be the same on QBFT?
I think it is an existing bug that hasn't been hit on mainnet because the rate of receiving and handling new trie-logs from peers isn't fast enough to hit the threading issue.
I was getting `ConcurrentModificationException`s on the `trieLogBlocksAndForksByDescendingBlockNumber` map when a QBFT node was syncing from several peers, due to `addToPruneQueue` modifying the map while `pruneFromQueue` was iterating over it.
Since it hasn't been hit on mainnet before, I'm assuming that making the methods `synchronized` isn't going to affect mainnet syncing performance at all.
If it's useful I can try recreating it to show the full `ConcurrentModificationException` stack, but it was definitely an intermittent (but reliable) exception once we had got the basic syncing logic working.
Sounds good. I just pinged @siladu, who coded this part and can verify that we are good.
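The threading issue described in this thread (one method adding to a sorted map while another iterates over it) and the `synchronized` fix can be sketched with a small stand-alone class. This is an illustration under assumptions, not the Besu trie-log pruner: `PruneQueue` is hypothetical, a plain `TreeMap` stands in for the real map, and a `String` stands in for the block hash type.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; a TreeMap stands in for the real prune-queue map. */
class PruneQueue {
  // Hypothetical layout: block number -> block hash.
  private final TreeMap<Long, String> queue = new TreeMap<>();

  /** Synchronized so it cannot modify the map while pruneFromQueue is iterating. */
  synchronized void addToPruneQueue(final long blockNumber, final String blockHash) {
    queue.put(blockNumber, blockHash);
  }

  /** Removes every entry at or below the given block number and returns how many were removed. */
  synchronized int pruneFromQueue(final long upToBlockNumber) {
    int pruned = 0;
    final Iterator<Map.Entry<Long, String>> it = queue.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getKey() <= upToBlockNumber) {
        it.remove(); // removal through the iterator is safe during iteration
        pruned++;
      }
    }
    return pruned;
  }

  synchronized int size() {
    return queue.size();
  }
}
```

Because both methods lock the same object, an add from one peer-handling thread has to wait until an in-progress prune finishes. Without the `synchronized` keyword, `TreeMap`'s fail-fast iterator can throw `ConcurrentModificationException` when a `put` lands mid-iteration, which matches the intermittent failure reported above.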
@@ -94,7 +94,7 @@ protected CompletableFuture<FastSyncState> start(final FastSyncState fastSyncSta
           onBonsai.clearFlatDatabase();
           onBonsai.clearTrieLog();
         });
-    LOG.debug("Start sync with initial sync state {}", fastSyncState);
+    LOG.debug("Start fast sync with initial sync state {}", fastSyncState);
maybe we can do the same as here https://github.com/hyperledger/besu/pull/7140/files#diff-c971b57a1d994b1c08fd5f43fd3cf282a41ccd524cafa8421c6eb1bad9087b08R83
In this case it genuinely is `fast` sync. The `start()` method is common to this class and the `SnapSyncDownloader` subclass.
The `start(final FastSyncState fastSyncState)` method is actually doing `fast` sync in this class, and is doing `snap` sync in `SnapSyncDownloader.java`, so in `SnapSyncDownloader` the log says "Start snap sync..."
Ok missed that thanks 🙏
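The class relationship explained in this thread (a shared `start()` entry point plus a `start(state)` step that the snap-sync subclass overrides) can be sketched as below. This is a simplified illustration with hypothetical names: the `...Sketch` classes, the in-memory `log` list, and the `String` state are assumptions, and only the "Start snap sync" prefix of the subclass message is quoted in the thread, so the rest of that message is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative base class; stands in for the fast sync downloader. */
class FastSyncDownloaderSketch {
  final List<String> log = new ArrayList<>();

  /** Shared by both sync types, so a generic message fits here. */
  final void start() {
    log.add("Starting sync");
    start("initial-state");
  }

  /** In the base class this step really is fast sync, so the message can say so. */
  protected void start(final String syncState) {
    log.add("Start fast sync with initial sync state " + syncState);
  }
}

/** Illustrative subclass; stands in for the snap sync downloader. */
class SnapSyncDownloaderSketch extends FastSyncDownloaderSketch {
  @Override
  protected void start(final String syncState) {
    // Only the "Start snap sync" prefix comes from the thread; the suffix is assumed.
    log.add("Start snap sync with initial sync state " + syncState);
  }
}
```

Calling `start()` on either class logs the generic line first and then the type-specific line, which is why a "fast" label is accurate in the overridable method but potentially misleading in the shared one.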
The latest commit removes the trie-log change and I'll raise a separate PR for that shortly where any additional discussion can take place.
GTG by me, 🚢
…r BFT chains (hyperledger#7140)
* Create a BFT-specific pivot block handler Signed-off-by: Matthew Whitehead <[email protected]>
* Change visibility Signed-off-by: Matthew Whitehead <[email protected]>
* Refactor sync-peer-count internal variable to match name, add experimental flag to enabled snap + BFT Signed-off-by: Matthew Whitehead <[email protected]>
* Merge with main Signed-off-by: Matthew Whitehead <[email protected]>
* Fix uppercase Signed-off-by: Matthew Whitehead <[email protected]>
* Address synchronization issue with trie pruning. Create BFT-specific account range handler. Add pipeline name and logs Signed-off-by: Matthew Whitehead <[email protected]>
* Remove debug log Signed-off-by: Matthew Whitehead <[email protected]>
* fixing snapsync for empty state Signed-off-by: Karim Taam <[email protected]>
* Don't queue up events we can't handle Signed-off-by: Matthew Whitehead <[email protected]>
* Fix timing window where a validator with an empty data dir sometimes falls back to full sync if peer status isn't received quickly enough Signed-off-by: Matthew Whitehead <[email protected]>
* Remove BFT-specific account request class. Not needed Signed-off-by: Matthew Whitehead <[email protected]>
* Refactor some more 'fast' sync variables that are common to all pivot-based sync types Signed-off-by: Matthew Whitehead <[email protected]>
* In FULL sync mode, disable bonsai-limit-trie-logs-enabled instead of failing to start Signed-off-by: Matthew Whitehead <[email protected]>
* Add javadoc comments, clarify overriding bonsai-limit-trie-logs-enabled Signed-off-by: Matthew Whitehead <[email protected]>
* Add BFT pivot block selector tests Signed-off-by: Matthew Whitehead <[email protected]>
* Fix failure error message Signed-off-by: Matthew Whitehead <[email protected]>
* Remove the unnamed Pipe constructor and update tests to set a pipe name Signed-off-by: Matthew Whitehead <[email protected]>
* Revert some info logs back to debug given the feedback on noise in the logs syncing with holesky Signed-off-by: Matthew Whitehead <[email protected]>
* Refactor fastSyncPivotDistance to syncPivotDistance Signed-off-by: Matthew Whitehead <[email protected]>
* Incomplete refactoring Signed-off-by: Matthew Whitehead <[email protected]>
* Update BFT event queueing tests Signed-off-by: Matthew Whitehead <[email protected]>
* Event queue test fixes Signed-off-by: Matthew Whitehead <[email protected]>
* Remove automatic setting of bonsai-limit-trie-logs-enabled to false if sync-mode = FULL (moving to another PR) Signed-off-by: Matthew Whitehead <[email protected]>
---------
Signed-off-by: Matthew Whitehead <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Matt Whitehead <[email protected]>
Co-authored-by: Karim Taam <[email protected]>
- Be tolerant of more nulls in json - Support ContainerKind in reference tests - re-order EXTCALL oeprands - correct return value for REVERT in EXT*CALL - re-order EOFCREATE code validation Signed-off-by: Danno Ferrin <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * test template refactor, bump besu-native to 0.9.2 (#7315) Signed-off-by: garyschulte <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * Feature/use gnark-crypto for eip-2537 (#7316) * use gnark-crypto for bls precompiles Signed-off-by: garyschulte <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * 6612 update changelog with removed syncmodes (#7320) * 6612: Update changelog with removal of deprecated sync modes Signed-off-by: Matilda Clerke <[email protected]> * 6612: Update changelog with removal of deprecated sync modes Signed-off-by: Matilda Clerke <[email protected]> * 6612: Update changelog with removal of deprecated sync modes Signed-off-by: Matilda Clerke <[email protected]> --------- Signed-off-by: Matilda Clerke <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * Update datacopy (#7319) Check for OOG earlier in DataCopy. Add unit tests to cover operation branches. 
Signed-off-by: Danno Ferrin <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * disable flaky test (#7308) * disable flaky test Signed-off-by: Sally MacFarlane <[email protected]> * disable flaky test Signed-off-by: Sally MacFarlane <[email protected]> --------- Signed-off-by: Sally MacFarlane <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * Update unit test (#7317) * Update parameterized unit tests so the enumerate with --dry-run * Update the prague-withdrawal.json unit test to handle current code Signed-off-by: Danno Ferrin <[email protected]> Co-authored-by: Sally MacFarlane <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * persist accounts that have storage updates, but no nonce, balance nor code Signed-off-by: Daniel Lehrner <[email protected]> * Revert "persist accounts that have storage updates, but no nonce, balance nor code" This reverts commit 9c9121a. Signed-off-by: Daniel Lehrner <[email protected]> * removed PKI backed QBFT (#7310) * removed PKI backed QBFT Signed-off-by: Sally MacFarlane <[email protected]> * changelog Signed-off-by: Sally MacFarlane <[email protected]> --------- Signed-off-by: Sally MacFarlane <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * EIP-7251 add consolidation request type (#7266) * add request type for consolidations, encoder, decoder and tests * added raw tx for consolidation * add consolidation reqs to EngineGetPayloadResultV4 * set storage slot value to 0 initially and value for tx * updates plugin api Signed-off-by: Justin Florentine <[email protected]> Signed-off-by: Sally MacFarlane <[email protected]> --------- Signed-off-by: Sally MacFarlane <[email protected]> Signed-off-by: Justin Florentine <[email protected]> Co-authored-by: Justin Florentine <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * fix: `eth_call` deserialization to correctly ignore unknown fields in the transaction object (#7323) * fix: Use 
Builder for JsonCallParameter * changelog * add additional unit tests * fix: Update builder to withGas to match the json eth_call --------- Signed-off-by: Usman Saleem <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * Stop transaction selection on TX_EVALUATION_TOO_LONG (#7330) Signed-off-by: Fabio Di Fabio <[email protected]> Signed-off-by: Daniel Lehrner <[email protected]> * message frame buider will create AuthorizedCodeService by itsef if it isn't set Signed-off-by: Daniel Lehrner <[email protected]> * get correct nonce for authorization Signed-off-by: Daniel Lehrner <[email protected]> * nonce only returns a vaid nonce, new method nonceList returns all the nonces Signed-off-by: Daniel Lehrner <[email protected]> * plugs leaky abstraction Signed-off-by: Justin Florentine <[email protected]> * some renaming, acceptance tests checks for exact balance of tx sponsor at the end. Signed-off-by: Daniel Lehrner <[email protected]> * inject the 7702 code into DiffBasedWorldStateUpdateAccumulator.createAccount Signed-off-by: Daniel Lehrner <[email protected]> * spotless Signed-off-by: Justin Florentine <[email protected]> * spotless fix, removed todos Signed-off-by: Daniel Lehrner <[email protected]> * make AuthorityProcessor & chain id for it optional Signed-off-by: Daniel Lehrner <[email protected]> --------- Signed-off-by: Daniel Lehrner <[email protected]> Signed-off-by: Chaminda Divitotawela <[email protected]> Signed-off-by: [email protected] <[email protected]> Signed-off-by: Stefan Pingel <[email protected]> Signed-off-by: Danno Ferrin <[email protected]> Signed-off-by: Gabriel-Trintinalia <[email protected]> Signed-off-by: Matthew Whitehead <[email protected]> Signed-off-by: Jason Frame <[email protected]> Signed-off-by: Ties <[email protected]> Signed-off-by: Wolmin <[email protected]> Signed-off-by: Wolmin <[email protected]> Signed-off-by: Fabio Di Fabio <[email protected]> Signed-off-by: Leni <[email protected]> Signed-off-by: Justin 
Florentine <[email protected]> Signed-off-by: Usman Saleem <[email protected]> Signed-off-by: Sally MacFarlane <[email protected]> Signed-off-by: Simon Dudley <[email protected]> Signed-off-by: Matt Whitehead <[email protected]> Signed-off-by: garyschulte <[email protected]> Signed-off-by: Karim Taam <[email protected]> Signed-off-by: Snazzy <[email protected]> Signed-off-by: George Tebrean <[email protected]> Signed-off-by: Matilda-Clerke <[email protected]> Signed-off-by: Matilda Clerke <[email protected]> Co-authored-by: Chaminda Divitotawela <[email protected]> Co-authored-by: Stefan Pingel <[email protected]> Co-authored-by: Sally MacFarlane <[email protected]> Co-authored-by: Danno Ferrin <[email protected]> Co-authored-by: Gabriel-Trintinalia <[email protected]> Co-authored-by: Matt Whitehead <[email protected]> Co-authored-by: Jason Frame <[email protected]> Co-authored-by: Ties <[email protected]> Co-authored-by: Matt Nelson <[email protected]> Co-authored-by: Wolmin <[email protected]> Co-authored-by: Fabio Di Fabio <[email protected]> Co-authored-by: leniram159 <[email protected]> Co-authored-by: Justin Florentine <[email protected]> Co-authored-by: Usman Saleem <[email protected]> Co-authored-by: Simon Dudley <[email protected]> Co-authored-by: garyschulte <[email protected]> Co-authored-by: Karim Taam <[email protected]> Co-authored-by: gringsam <[email protected]> Co-authored-by: George Tebrean <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Matilda-Clerke <[email protected]>
PR description
This PR adds various changes and fixes that allow snap sync and BFT to be used together. Since this is a relatively new combination of configuration, I have added an experimental flag, --Xsnapsync-bft-enabled, which must be set for Besu to start with this combination.

My general opinion for enterprise chains is that it is probably reasonable for a BFT chain to have sync-min-peers set to 1. The trust requirements of a public chain are not as applicable to an enterprise chain, so the need to have e.g. 5 peers before you are willing to consider their pivot-block data is too restrictive. At the very least it makes snap sync impossible on a chain of 4 validators, or a chain of 6 validators with one unavailable, since a node on such a chain has at most 3 or 4 validator peers respectively. I've broadly outlined below the snap sync logic that this PR uses, some of which assumes that an enterprise user will set --sync-min-peers=1 if their chain only has a small number of nodes. I think there's an argument for the --profile=enterprise defaults to set --sync-min-peers=1, but I'm going to save that for a separate PR.

The basic logic used for sync is as follows:
* If the best peer's estimated chain height is no greater than --Xsynchronizer-fast-sync-pivot-distance (50 by default), the node skips any attempt at snap sync. Quitting the snap sync process allows the node to start contributing to BFT voting, which ensures the chain isn't stalled waiting for this node. (The state for the most recent 50 blocks is obtained using full sync, which is existing behaviour.)
* If the node starts snap sync but believes itself to be the only validator, it exits snap sync. As above, this ensures that the node can then start its BFT mining coordinator and continue producing new blocks, rather than get stuck in a loop of failed snap sync attempts which it will probably never get out of. This is particularly useful for the case where a single BFT validator node is started as a new chain, to which other validators will be added later on. Without this logic it would wait for at least 1 other peer, which may never come along.
* If the node finds sync-min-peers peers who are validators, but none of them has any usable pivot data, it quits the snap sync process, again to allow its BFT mining coordinator to start producing new blocks. This is necessary in a case where a number of new validators have been created for a brand new chain, but because they are all new and have no state to share they would get stuck in a loop of failed snap sync attempts.

(It's also worth noting that I've continued the process of renaming the fast-sync internal variables to be more generic, as they also apply to snap and checkpoint sync.)

(Note: see related PR #7204, which fixes a follow-on issue.)
Documentation
The broad topic of sync-ing is covered in the public section of the docs here: https://besu.hyperledger.org/development/public-networks/get-started/connect/sync-node
I think the private section of the docs could do with its own sync-ing topic, which refers to the article above for the basics of how the different sync modes work but explains the permissioned-chain-specific behaviours that I've implemented in this PR. I think the explanation in the description above is probably a reasonable starting point for writing those docs, and I'm happy to review and edit the new topic once an initial draft is there.