Peering - disconnects refactor #6968

macfarla · 2024-04-19T02:51:26Z

PR description

Before this PR, once we reach max peers we refuse ALL incoming connections (with reason UNKNOWN), regardless of any properties of the incoming peer. There were also a few spots where we disconnected peers for various reasons, including repeated timeouts and "useless" responses. This PR aims to consolidate those disconnects, and to disconnect an established peer only when we have a better peer to replace it with.

- Move the disconnection decision from PeerReputation/Peer to EthPeers so the totality of current peers can be considered.
- Use reputation score to sort peers within EthPeers (bestPeerComparator).
  - note in a few places in tests I've explicitly set the comparator to what was used before (uses chain height estimate) to avoid having to update a heap of tests that are dependent on the decision made by the comparator
- Only disconnect the "worst" peer if we have max peers (think this will help on holesky for "useless" disconnects but not for socket closed etc, where the connection is already gone).
- On incoming connection, compare the incoming peer to the current collection of peers and (if at max peers) disconnect whichever compares least favourably. In effect this will be the incoming peer if all our current peers are giving us good responses (reputation score), or it will be an existing peer if any are not giving good responses.
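
The incoming-connection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PeerLimitSketch`, its nested `Peer` class, and `onIncomingConnection` are hypothetical names, the comparator only looks at a single integer reputation score, and favouring the existing peer on a tie is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified stand-in for the peer collection; not Besu's EthPeers. */
final class PeerLimitSketch {

  /** Minimal peer model: an id and a reputation score (higher is better). */
  static final class Peer {
    final String id;
    final int reputationScore;

    Peer(final String id, final int reputationScore) {
      this.id = id;
      this.reputationScore = reputationScore;
    }
  }

  // Assumed ordering: a lower reputation score compares as "worse".
  static final Comparator<Peer> BEST_PEER_COMPARATOR =
      Comparator.comparingInt(p -> p.reputationScore);

  private final int maxPeers;
  private final List<Peer> peers = new ArrayList<>();

  PeerLimitSketch(final int maxPeers) {
    this.maxPeers = maxPeers;
  }

  /**
   * Handles an incoming peer and returns the peer that should be disconnected, if any.
   * Below the limit nobody is disconnected. At the limit, the incoming peer is compared
   * with the worst current peer and whichever compares less favourably is returned.
   */
  Optional<Peer> onIncomingConnection(final Peer incoming) {
    if (peers.size() < maxPeers) {
      peers.add(incoming);
      return Optional.empty();
    }
    final Optional<Peer> worstCurrent = peers.stream().min(BEST_PEER_COMPARATOR);
    // Assumption: on a tie the established peer is kept and the incoming peer is refused.
    if (worstCurrent.isPresent()
        && BEST_PEER_COMPARATOR.compare(incoming, worstCurrent.get()) > 0) {
      peers.remove(worstCurrent.get());
      peers.add(incoming);
      return worstCurrent;
    }
    return Optional.of(incoming);
  }

  int peerCount() {
    return peers.size();
  }

  public static void main(final String[] args) {
    final PeerLimitSketch sketch = new PeerLimitSketch(2);
    sketch.onIncomingConnection(new Peer("a", 100));
    sketch.onIncomingConnection(new Peer("b", 87));
    // At the limit: the score-87 peer compares worst, so it is the one to disconnect.
    sketch
        .onIncomingConnection(new Peer("c", 100))
        .ifPresent(p -> System.out.println("disconnect " + p.id));
  }
}
```

With scores like those in the debug log below (87 for the worst current peer, 100 for the connecting peer), this sketch would drop the existing peer and keep the new one.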

example debug log (yes there is a TODO to remove this log)
{"@timestamp":"2024-04-19T00:53:31,297","level":"DEBUG","thread":"nioEventLoopGroup-3-9","class":"EthPeers","message":"comparing worstCurrentPeer PeerId: 0x024e106a70572288... PeerReputation score: 87, timeouts: {3=5}, useless: 0, validated? true, disconnected? false, client: erigon/v2.58.2-125509e4/linux-amd64/go1.21.8, [Connection with hashCode 1276176835 inboundInitiated? true initAt 1713481827797], enode://024e106a70572288701e97724610d120a682f69f24881eeee0ab1f6379646780d93daf11d9226593faf71deb699f24020dbecf801eb3d0da779ef2be641590fa@3.38.172.157:30304 with connectingPeer PeerId: 0x1ec3a5e247e616a3... PeerReputation score: 100, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2, [Connection with hashCode 1951644955 inboundInitiated? false initAt 1713488011205], enode://1ec3a5e247e616a347038b9c35a1328529bdf354408fe3c968433df542eb1b2a7c7d1b7b7d481a5b6465baac64877ee53e1396ddf43a3ad0f5f0adbaed659145@188.40.67.160:30303","throwable":""}

Have seen decent results on holesky and mainnet. See screenshots

Fixed Issue(s)

Refs #6805 and #6842

Thanks for sending a pull request! Have you done the following?

Checked out our contribution guidelines?
Considered documentation and added the doc-change-required label to this PR if updates are required.
Considered the changelog and included an update if required.
For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

unit tests: ./gradlew build
acceptance tests: ./gradlew acceptanceTest
integration tests: ./gradlew integrationTest
reference tests: ./gradlew ethereum:referenceTests:referenceTests

macfarla · 2024-04-19T02:54:50Z

3 x mainnet nodes
1 is a bit flat

macfarla · 2024-04-19T02:56:12Z

3 x holesky

Beanow · 2024-05-05T17:50:34Z

As mentioned in #6945 do see a lot less UNKNOWN disconnects once hitting our peer limit.

Holesky here.

macfarla · 2024-05-22T00:49:27Z

burn-in of 10 bonsai/checkpoint mainnet nodes (started in 2 batches)

macfarla · 2024-05-22T00:53:09Z

peering graph from the first batch of mainnet nodes

and the second

pinges · 2024-06-05T05:54:59Z

ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/manager/EthPeer.java

@@ -215,7 +215,7 @@ public void recordRequestTimeout(final int requestCode) {
        .addArgument(this::getLoggableId)
        .log();
    LOG.trace("Timed out while waiting for response from peer {}", this);
-    reputation.recordRequestTimeout(requestCode, this).ifPresent(this::disconnect);
+    reputation.recordRequestTimeout(requestCode, this);


I think we should still disconnect these peers; otherwise we might waste time sending them requests that are likely to time out as well.
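
To make the trade-off in this comment concrete, here is a minimal sketch of a reputation tracker whose timeout recording returns an optional disconnect reason, which a caller can either act on immediately (as the removed line did) or ignore (as the added line does). `TimeoutReputationSketch`, the threshold of 5, and the `"TIMEOUT"` reason string are illustrative assumptions, not Besu's PeerReputation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a peer reputation tracker; not Besu's PeerReputation. */
final class TimeoutReputationSketch {

  // Assumed threshold, chosen for illustration only.
  static final int TIMEOUT_THRESHOLD = 5;

  // Timeout count per request code.
  private final Map<Integer, Integer> timeoutsByRequestCode = new HashMap<>();

  /**
   * Counts one timeout for the given request code. Returns a reason once the count for
   * that code reaches the threshold, otherwise an empty Optional.
   */
  Optional<String> recordRequestTimeout(final int requestCode) {
    final int count = timeoutsByRequestCode.merge(requestCode, 1, Integer::sum);
    return count >= TIMEOUT_THRESHOLD ? Optional.of("TIMEOUT") : Optional.empty();
  }

  int timeoutCount(final int requestCode) {
    return timeoutsByRequestCode.getOrDefault(requestCode, 0);
  }

  public static void main(final String[] args) {
    final TimeoutReputationSketch reputation = new TimeoutReputationSketch();
    for (int i = 0; i < TIMEOUT_THRESHOLD; i++) {
      // Immediate-disconnect style: act on the reason as soon as it appears.
      reputation
          .recordRequestTimeout(3)
          .ifPresent(reason -> System.out.println("disconnect: " + reason));
    }
    // Record-only style: the count is still tracked, but the returned reason is ignored
    // and the disconnect decision is left to whoever manages the whole peer set.
    reputation.recordRequestTimeout(3);
  }
}
```

In the record-only style the peer stays connected until something else acts on its reputation, which is the window in which requests could still be sent to a peer that keeps timing out.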

pinges · 2024-06-05T05:55:34Z

ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/manager/EthPeer.java

@@ -224,7 +224,7 @@ public void recordUselessResponse(final String requestType) {
        .addArgument(requestType)
        .addArgument(this::getLoggableId)
        .log();
-    reputation.recordUselessResponse(System.currentTimeMillis(), this).ifPresent(this::disconnect);
+    reputation.recordUselessResponse(System.currentTimeMillis(), this);


pinges · 2024-06-05T06:23:06Z

...n/java/org/hyperledger/besu/ethereum/eth/manager/task/AbstractRetryingSwitchingPeerTask.java

@@ -132,6 +132,7 @@ private Stream<EthPeer> remainingPeersToTry() {
  }

  private void refreshPeers() {
+    // TODO this duplicates EthPeers.disconnectWorst


Looking at line 141, I think we could just not filter on !is.disconnected?

macfarla · 2024-06-27T01:44:26Z

around 1h for all mainnet nodes to get to 100% peers

macfarla · 2024-06-27T01:45:14Z

syncing progress

macfarla · 2024-08-19T23:39:33Z

@pinges I have made the changes you requested, can you review?

macfarla · 2024-09-15T21:50:38Z

going to close this and reprise the worthy changes into separate PRs

macfarla added 11 commits April 18, 2024 17:10

peers don't do their own disconnects

4ced6a0

Signed-off-by: Sally MacFarlane <[email protected]>

only disconnect if at capacity; use peer reputation for peer comparator

2d4badf

Signed-off-by: Sally MacFarlane <[email protected]>

delay the decision about whether to connect if we have full peers, un…

632a3e0

…til we actually compare Signed-off-by: Sally MacFarlane <[email protected]>

delay the tru decision

3dccfaf

Signed-off-by: Sally MacFarlane <[email protected]>

delay the tru decision

bb695b5

Signed-off-by: Sally MacFarlane <[email protected]>

remove scheduled task

b275d52

Signed-off-by: Sally MacFarlane <[email protected]>

use simple comparator for ethPeers in tests

de4db90

Signed-off-by: Sally MacFarlane <[email protected]>

removed unused method

d9e840f

Signed-off-by: Sally MacFarlane <[email protected]>

fix some tests

ad927c3

Signed-off-by: Sally MacFarlane <[email protected]>

formatting

f808a72

Signed-off-by: Sally MacFarlane <[email protected]>

changelog

f623e72

Signed-off-by: Sally MacFarlane <[email protected]>

macfarla added 2 commits April 19, 2024 17:40

fixed block propagation tests

6afebd3

Signed-off-by: Sally MacFarlane <[email protected]>

merge

40405a3

Signed-off-by: Sally MacFarlane <[email protected]>

macfarla mentioned this pull request Apr 22, 2024

Analyse Inbound Disconnect Reasons Per Client #6945

Closed

macfarla added 2 commits April 22, 2024 18:12

allow some trailing peers

ab1809c

Signed-off-by: Sally MacFarlane <[email protected]>

Merge branch 'main' of github.com:hyperledger/besu into useless-disco…

60aa98d

…nnects

macfarla added the peering label Apr 24, 2024

macfarla added 3 commits May 15, 2024 12:32

merge

e02c1ab

Signed-off-by: Sally MacFarlane <[email protected]>

simplify the logic around enforcing connection limits

5fa7599

Signed-off-by: Sally MacFarlane <[email protected]>

Merge branch 'main' of github.com:hyperledger/besu into useless-disco…

9857dce

…nnects

Merge branch 'main' into useless-disconnects

7d29573

Merge branch 'main' into useless-disconnects

a2d8181

macfarla requested a review from pinges May 29, 2024 02:27

macfarla added 2 commits May 29, 2024 14:39

Merge branch 'main' into useless-disconnects

d40cfaf

Merge branch 'main' into useless-disconnects

2871bea

pinges requested changes Jun 5, 2024

macfarla added 9 commits June 12, 2024 15:12

merge

0f82c92

Signed-off-by: Sally MacFarlane <[email protected]>

review feedback

3420cc3

Signed-off-by: Sally MacFarlane <[email protected]>

Merge branch 'useless-disconnects' of github.com:macfarla/besu into u…

fdc4099

…seless-disconnects

merge

428c5d6

Signed-off-by: Sally MacFarlane <[email protected]>

fixed method rename lost in merge

cd6a448

Signed-off-by: Sally MacFarlane <[email protected]>

fixed method rename lost in merge

be1b2d7

Signed-off-by: Sally MacFarlane <[email protected]>

formatting

7859906

Signed-off-by: Sally MacFarlane <[email protected]>

Merge branch 'main' of github.com:hyperledger/besu into useless-disco…

1dc6f15

…nnects

merge

6c66577

Signed-off-by: Sally MacFarlane <[email protected]>

macfarla mentioned this pull request Jun 25, 2024

confirm pivot block - removed disconnect based on chain height estimate #6889

Closed

8 tasks

macfarla closed this Sep 15, 2024

This was referenced Sep 15, 2024

use peer reputation to compare current peers #7616

Closed

Peering - compare incoming connection #7617

Closed

Fast sync downloader - allow max 5 trailing peers #7621

Closed
