
Snap server rebase #6640

Merged
merged 16 commits into hyperledger:main from snap-server-rebase on Mar 30, 2024

Conversation

garyschulte (Contributor) commented on Feb 28, 2024

PR description

Initial implementation of the snap server using the streaming flat database. Obsoletes #5887.

This PR adds snap server functionality behind a feature flag, --Xsnapsync-server-enabled, which enables Besu to fully serve the snap/1 protocol.

Snap server has a specific set of storage requirements:

  • --data-storage-format=BONSAI: the snap server only works with Bonsai storage
  • --Xbonsai-code-using-code-hash-enabled: snap server nodes need to store code by code hash in order to serve snap code requests
  • --Xsnapsync-synchronizer-flat-db-healing-enabled: snap server nodes must be snap synced with full flat db enabled (or the node must have been full synced so that it already has a fully flattened db)

Given two of these requirements (code-by-code-hash and flat-db-healing), most nodes will have to resync in order to serve snap data, with these parameters in the config file:

data-storage-format="BONSAI"
Xsnapsync-server-enabled=true
Xbonsai-code-using-code-hash-enabled=true
Xsnapsync-synchronizer-flat-db-healing-enabled=true

Performance Impact

Testing of the initial implementation has shown that serving snap data has a slight impact on execution performance in normal network conditions, on the order of 10%.

[screenshot: block execution performance while serving snap data under normal network conditions]

Load Test

To get an idea of the cost to execution performance in less than ideal conditions, Besu was tested with the load of 2 static syncing peers (Besu serving all of the sync load to both clients simultaneously). In this case Besu showed roughly 2x the block execution times of Besu nodes that were not serving snap data.

Normally Besu would serve only a fraction of the load of syncing clients. We could push this further with more static syncing clients, but this was sufficient to get a signal about the performance impact.

Once the snap clients completed their worldstate sync, Besu performance returned to the mean immediately.

[chart: 2 client sync perf]

Read vs write during this time:

[screenshot: read vs write metrics during the load test]

Testing Clients

Besu 23.4.x
Geth 13.14
Nethermind 1.25.4

What's Next

  • Optimizations. This is an initial implementation of the snap server; now that we are satisfied with its correctness, we will turn to optimizing it to minimize its impact on block execution times.

  • A follow-on PR will add a storage subcommand to convert existing databases to fully flattened and code-by-code-hash storage, without the need for a resync.

Fixed Issue(s)

fixes #3165

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Most advanced CI tests are deferred until PR approval, but you could:

  • locally run all unit tests via: ./gradlew build
  • locally run all acceptance tests via: ./gradlew acceptanceTest
  • locally run all integration tests via: ./gradlew integrationTest
  • locally run all reference tests via: ./gradlew ethereum:referenceTests:referenceTests

@garyschulte force-pushed the snap-server-rebase branch 5 times, most recently from 6fb37f2 to 80eb1ac on March 16, 2024 00:10
@garyschulte marked this pull request as ready for review on March 16, 2024 00:27
@garyschulte requested review from matkt and jflo on March 16, 2024 00:27
@garyschulte force-pushed the snap-server-rebase branch 5 times, most recently from b4fc969 to a0ec424 on March 16, 2024 03:35
@garyschulte force-pushed the snap-server-rebase branch 5 times, most recently from f571c45 to b51f3c6 on March 22, 2024 23:20
@matkt (Contributor) left a comment:

Added some questions but in general I like the implementation 💯

worldStateStorageCoordinator.worldStateKeyValueStorage())
.getFlatDbStrategy();
if (!flatDbStrategy.isCodeByCodeHash()) {
LOGGER.warn("SnapServer requires code stored by codehash, but it is not enabled");
Contributor:

What do we do in this case? I'm not sure I understand: are we stopping the snap server?

garyschulte (Contributor Author):

For now, the snap server will operate as normal with just a warning. Essentially we will just serve empty code responses if the storage hasn't been migrated. If we think it is necessary, we can refuse to start the snap server when we are not in code-by-code-hash storage mode. I was thinking that we would address this corner case via the storage migration subcommand PR.
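
For reference, a minimal sketch of the stricter alternative mentioned above (failing fast rather than warning). It reuses only the isCodeByCodeHash() value visible in the snippet; the method name and exception choice are illustrative, not this PR's behavior:

// Hypothetical stricter variant: refuse to start the snap server when code is
// not stored by code hash, instead of warning and serving empty code responses.
static void requireCodeByCodeHash(final boolean isCodeByCodeHash) {
  if (!isCodeByCodeHash) {
    throw new IllegalStateException(
        "SnapServer requires code stored by code hash; enable --Xbonsai-code-using-code-hash-enabled and resync");
  }
}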

}
}

MessageData constructGetTrieNodesResponse(final MessageData message) {
Contributor:

I think we need to limit the number of trie nodes served per request, like Geth does, because it seems to be a possible DoS vector.

garyschulte (Contributor Author):

I will 👀 what Geth is doing here. Just to clarify, what about trie node queries makes you think they are disproportionately expensive? I would think range requests represent more resource consumption, especially considering we are not directly using an iterator but rather collecting in memory to serve the requests.

@matkt (Contributor), Mar 29, 2024:

Suppose I ask you for 30,000 trie nodes that do not exist. You still have to read the database to find out whether each trie node exists, so I can spam you as much as I want. Ranges are not the same: I ask you for a range, and the serving peer controls what it returns within that range according to the number of leaves. This needs testing; it is an intuition I have and it could be wrong.
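
To illustrate the kind of mitigation being suggested, a minimal sketch that caps the number of database reads per GetTrieNodes request. The class, constant, and in-memory store below are hypothetical stand-ins, not Besu's actual types:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: bound the number of trie-node lookups performed per request,
// so a request for many nonexistent nodes cannot force an unbounded number of reads.
final class TrieNodeRequestLimiter {
  private static final int MAX_LOOKUPS_PER_REQUEST = 1024; // illustrative cap

  private final Map<String, byte[]> nodeStore; // stand-in for the flat database

  TrieNodeRequestLimiter(final Map<String, byte[]> nodeStore) {
    this.nodeStore = nodeStore;
  }

  List<byte[]> resolve(final List<String> requestedPaths) {
    final List<byte[]> found = new ArrayList<>();
    int lookups = 0;
    for (final String path : requestedPaths) {
      if (++lookups > MAX_LOOKUPS_PER_REQUEST) {
        break; // stop reading once the cap is hit, whether or not the nodes exist
      }
      final byte[] node = nodeStore.get(path); // each miss still costs a read
      if (node != null) {
        found.add(node);
      }
    }
    return found;
  }
}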

@garyschulte force-pushed the snap-server-rebase branch 3 times, most recently from 82a5de6 to 3560866 on March 29, 2024 19:41
@macfarla (Contributor) left a comment:

few nits mostly for logging

.map(DefaultSynchronizer.class::cast)
.ifPresentOrElse(
z -> this.listenerId.set(z.subscribeInitialSync(this)),
() -> LOGGER.warn("SnapServer created without reference to sync status"));
Contributor:

nit: it would be nice for the logs to be consistent between "snap server" and "SnapServer"


var resp = StorageRangeMessage.create(collectedStorages, proofNodes);
LOGGER.debug(
"returned in {} storage {} to {} range {} to {} with {} storages and {} proofs, resp size {} of max {}",
Contributor:

nit: this log message is duplicated, could be refactored to a method

garyschulte (Contributor Author):

I am going to skip this one for now since all of the return messages are at least subtly different.


/** See https://github.com/ethereum/devp2p/blob/master/caps/snap.md */
@SuppressWarnings("unused")
Contributor:

Can this annotation be removed now?

garyschulte (Contributor Author):

apparently yes 👍

garyschulte and others added 15 commits March 29, 2024 17:22
Signed-off-by: garyschulte <[email protected]>
…nap request takes too long

Signed-off-by: garyschulte <[email protected]>
Signed-off-by: garyschulte <[email protected]>
Co-authored-by: Sally MacFarlane <[email protected]>
Signed-off-by: garyschulte <[email protected]>
@garyschulte merged commit 34fc5ee into hyperledger:main on Mar 30, 2024
42 checks passed
@garyschulte deleted the snap-server-rebase branch on April 3, 2024 19:24
amsmota pushed a commit to Citi/besu that referenced this pull request Apr 16, 2024
* initial snap server implementation

Signed-off-by: garyschulte <[email protected]>
Co-authored-by: Sally MacFarlane <[email protected]>
Signed-off-by: amsmota <[email protected]>
matthew1001 pushed a commit to kaleido-io/besu that referenced this pull request Jun 7, 2024
* initial snap server implementation

Signed-off-by: garyschulte <[email protected]>
Co-authored-by: Sally MacFarlane <[email protected]>
jflo pushed a commit to jflo/besu that referenced this pull request Jun 7, 2024
* initial snap server implementation

Signed-off-by: garyschulte <[email protected]>
Co-authored-by: Sally MacFarlane <[email protected]>
Development

Successfully merging this pull request may close these issues.

Implement snapsync as a server
3 participants