[NC-875] Implement iterative peer search #268

smatthewenglish · 2018-11-15T21:22:54Z

Recursive Lookup
A 'lookup' locates the k closest nodes to a node ID.

The lookup initiator starts by picking α closest nodes to the target it knows of. The initiator then sends concurrent FindNode packets to those nodes. α is a system-wide concurrency parameter, such as 3. In the recursive step, the initiator resends FindNode to nodes it has learned about from previous queries. Of the k nodes the initiator has heard of closest to the target, it picks α that it has not yet queried and resends FindNode to them. Nodes that fail to respond quickly are removed from consideration until and unless they do respond.

If a round of FindNode queries fails to return a node any closer than the closest already seen, the initiator resends the find node to all of the k closest nodes it has not already queried. The lookup terminates when the initiator has queried and gotten responses from the k closest nodes it has seen. [1]

[1] https://github.com/ethereum/devp2p/blob/master/discv4.md
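To make the selection step in the description concrete, here is a minimal sketch of "of the k closest known nodes, pick α not yet queried". It is not code from this PR: `LookupStep` and `nextToQuery` are hypothetical names, and non-negative int ids with a plain XOR distance stand in for the real node IDs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical helper: one selection step of the lookup described above.
// Assumption: ids are non-negative ints and distance is id XOR target.
final class LookupStep {

  static List<Integer> nextToQuery(
      final int target,
      final Collection<Integer> known,
      final Set<Integer> queried,
      final int k,
      final int alpha) {
    // Sort everything we have heard of by distance to the target.
    final List<Integer> byDistance = new ArrayList<>(known);
    byDistance.sort(Comparator.comparingInt((Integer id) -> id ^ target));

    // Restrict to the k closest, then take up to alpha that are still unqueried.
    final List<Integer> result = new ArrayList<>();
    for (final Integer id : byDistance.subList(0, Math.min(k, byDistance.size()))) {
      if (!queried.contains(id) && result.size() < alpha) {
        result.add(id);
      }
    }
    return result;
  }
}
```

The lookup would call this after each batch of responses and stop once it returns an empty list and every one of the k closest nodes has responded.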

CLAassistant · 2018-11-15T21:23:01Z

All committers have signed the CLA.

ajsutton · 2018-11-27T11:13:47Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+  private final BondingAgent bondingAgent;
+  private final NeighborFinder neighborFinder;
+  private final HashMap<Peer, Integer> anteMap;
+  private final SortedSet<Map.Entry<Peer, Integer>> distanceSortedPeers;


We shouldn't be using Map.Entry here. It needs to be copied to a custom type. Partly because Map.Entry is just too generic a type so becomes unclear very quickly and partly because Map implementations are allowed to reuse the Map.Entry instance when iterating so it may actually be changed underneath us.
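A minimal sketch of the kind of custom type meant here, assuming a plain String id in place of Peer so the example stands alone. The field names and the `sortedByDistance` helper are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical immutable value type pairing a peer id with its distance.
// A String id stands in for the PR's Peer type.
final class PeerDistance {
  private final String peerId;
  private final int distance;

  PeerDistance(final String peerId, final int distance) {
    this.peerId = peerId;
    this.distance = distance;
  }

  String getPeerId() {
    return peerId;
  }

  int getDistance() {
    return distance;
  }

  // Copies every entry into its own PeerDistance, so entry reuse during
  // iteration or later changes to the map cannot affect the result.
  static List<PeerDistance> sortedByDistance(final Map<String, Integer> distances) {
    final List<PeerDistance> result = new ArrayList<>();
    for (final Map.Entry<String, Integer> entry : distances.entrySet()) {
      result.add(new PeerDistance(entry.getKey(), entry.getValue()));
    }
    result.sort(Comparator.comparingInt(PeerDistance::getDistance));
    return result;
  }
}
```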

ajsutton · 2018-11-27T11:20:30Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+    BytesValue peerId;
+
+    OutstandingRequest(final BytesValue peerId) {
+      this.creation = Instant.now();


We shouldn't use Instant.now() directly because it makes it too difficult to test the time handling. Instead we should inject a java.time.Clock instance - then in production we can use Clock.systemUTC() and in tests we can use Clock.fixed, a mock or any other implementation we want so we can control time.

I added some mechanism for testing the expiration of the outstanding requests, but maybe it's not rigorous enough. I'm happy to add this in, I just wanted to get some feedback on how it might work with the way that the expiration is being tested atm, since this comment was prior to that being implemented.

We definitely need to abstract the clock - we don't want to have to sleep for 60 seconds in the test as it slows down the build too much. Having the scheduling of the periodic job be external to this class and an abstracted clock would let us just advance the clock and trigger another check without any sleeping.
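Roughly what the clock injection could look like, as a sketch rather than the code in this PR. `ExpiringRequest` and `ManualClock` are hypothetical names; only `java.time.Clock`, `Duration` and `Instant` are real APIs.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical request record that reads time only through an injected Clock.
final class ExpiringRequest {
  private final Clock clock;
  private final Instant creation;
  private final Duration timeout;

  ExpiringRequest(final Clock clock, final Duration timeout) {
    this.clock = clock;
    this.creation = clock.instant();
    this.timeout = timeout;
  }

  boolean isExpired() {
    return Duration.between(creation, clock.instant()).compareTo(timeout) >= 0;
  }
}

// Hypothetical test clock that only moves when the test tells it to.
final class ManualClock extends Clock {
  private Instant now;

  ManualClock(final Instant start) {
    this.now = start;
  }

  void advance(final Duration amount) {
    now = now.plus(amount);
  }

  @Override
  public ZoneId getZone() {
    return ZoneOffset.UTC;
  }

  @Override
  public Clock withZone(final ZoneId zone) {
    return this; // the zone is irrelevant for this sketch
  }

  @Override
  public Instant instant() {
    return now;
  }
}
```

Production code would pass `Clock.systemUTC()`; a test would pass a `ManualClock`, call `advance(Duration.ofSeconds(60))` and assert on `isExpired()` without sleeping.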

ajsutton · 2018-11-27T11:37:45Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+  }
+
+  private List<Peer> determineFindNodeCandidates() {
+    distanceSortedPeers.addAll(anteMap.entrySet());


Does this need to clear distanceSortedPeers first? If we just want to have each peer in anteMap in a sorted list, it would be better to just iterate through anteMap to create an ArrayList then use Collections.sort to sort it in place rather than going via a sorted set.

Or, third option since we want a fixed number of items: iterate through anteMap adding each item to a TreeSet (we need to create a custom class as we shouldn't be reusing Map.Entry instances). After adding each item, check if the size of our set is > CONCURRENT_REQUEST_LIMIT; if so, use pollLast to remove the last item. pollLast is from NavigableSet, a sub-type of SortedSet which TreeSet implements.
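A sketch of that third option using a TreeSet, with String ids standing in for Peer. `ClosestPeers` and `Candidate` are hypothetical names, and the tie-break on id is an assumption added so that two peers at the same distance are not collapsed into one set element.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper keeping only the `limit` closest peers while iterating.
final class ClosestPeers {

  // Custom element type, used instead of Map.Entry.
  private static final class Candidate {
    final String peerId;
    final int distance;

    Candidate(final String peerId, final int distance) {
      this.peerId = peerId;
      this.distance = distance;
    }
  }

  // Returns up to `limit` peer ids, closest first.
  static List<String> closest(final Map<String, Integer> distances, final int limit) {
    final TreeSet<Candidate> closest =
        new TreeSet<>(
            Comparator.comparingInt((Candidate c) -> c.distance).thenComparing(c -> c.peerId));

    distances.forEach(
        (peerId, distance) -> {
          closest.add(new Candidate(peerId, distance));
          if (closest.size() > limit) {
            closest.pollLast(); // drop the current furthest candidate
          }
        });

    final List<String> result = new ArrayList<>();
    for (final Candidate candidate : closest) {
      result.add(candidate.peerId);
    }
    return result;
  }
}
```

The set never holds more than `limit + 1` elements, so this avoids sorting the whole map when only a few peers are needed.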

ajsutton · 2018-11-27T11:38:23Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+    }
+
+    boolean isExpired() {
+      Duration duration = Duration.between(creation, Instant.now());


Avoid Instant.now here too.

ajsutton · 2018-11-30T02:16:42Z

...main/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/PeerDiscoveryController.java

                    if (peerBlacklist.contains(neighbor) || peerTable.get(neighbor).isPresent()) {
                      continue;
                    }
-                    bond(neighbor, false);
+                    bond((DiscoveryPeer) neighbor, false);


If we're going to have to cast things back to DiscoveryPeer we should just be honest and go back to using the DiscoveryPeer type all the way through. It should only be declared Peer if it really can be any implementation of Peer.

ajsutton · 2018-11-30T02:19:11Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+        TIMEOUT_TASK_DELAY,
+        v -> {
+          List<OutstandingRequest> outstandingRequestListCopy =
+              new ArrayList<>(outstandingRequestList);


Why are we making a copy here? It doesn't provide any thread safety because making the copy requires iterating the list.

For thread safety through this class, I suspect we'll need to use synchronized blocks so that only one thread can be anywhere in this class at a time.

ajsutton · 2018-11-30T02:29:52Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+    this.outstandingRequestList = new ArrayList<>();
+    this.contactedInCurrentExecution = new ArrayList<>();
+
+    commenceTimeoutTask();


We probably don't want to kick off the timeout task in the constructor - otherwise it will kick off in every test and be out of the test's control.

Also would a new instance of this class be created each time we run discovery? If so, something needs to clean up the timeout task.

I would probably separate responsibility for running the timer and move it out of this class. So just expose the expiry checking logic as a public method of this class that PeerDiscoveryController would call periodically. That makes testing the actual logic much easier and avoids a lot of the lifecycle work.

If the plan is to only ever have one instance of this class, then a static init method could be used to create the class and set up the periodic polling at once, but tests would still be able to create the class via the constructor and control the timeout check directly.
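One possible shape for this, as a sketch under assumptions: the class owns no timer and just exposes an expiry check that a caller such as PeerDiscoveryController would invoke periodically. `OutstandingRequests`, `recordRequest` and `removeExpired` are hypothetical names, String ids stand in for peers, and the `synchronized` methods are one simple way to keep a single thread in the class at a time.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical tracker of in-flight requests. It never schedules anything;
// an external periodic job is expected to call removeExpired().
final class OutstandingRequests {
  private final Clock clock;
  private final Duration timeout;
  private final Map<String, Instant> sentAt = new HashMap<>();

  OutstandingRequests(final Clock clock, final Duration timeout) {
    this.clock = clock;
    this.timeout = timeout;
  }

  synchronized void recordRequest(final String peerId) {
    sentAt.put(peerId, clock.instant());
  }

  // Removes and returns the ids of all requests older than the timeout.
  synchronized List<String> removeExpired() {
    final Instant now = clock.instant();
    final List<String> expired = new ArrayList<>();
    final Iterator<Map.Entry<String, Instant>> it = sentAt.entrySet().iterator();
    while (it.hasNext()) {
      final Map.Entry<String, Instant> entry = it.next();
      if (Duration.between(entry.getValue(), now).compareTo(timeout) >= 0) {
        expired.add(entry.getKey());
        it.remove(); // safe removal while iterating, no copy needed
      }
    }
    return expired;
  }

  synchronized int size() {
    return sentAt.size();
  }
}
```

With this split, a test constructs the class directly, moves its own clock forward and calls `removeExpired()`, so neither a timer nor a sleep is involved.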

ajsutton · 2018-11-30T02:31:06Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+              outstandingRequestList.remove(outstandingRequest);
+            }
+          }
+        });


nit: Most of the variables in here should be final. I suggest enabling IntelliJ's inspections "Local variable or parameter can be final" and "Field may be final".

ajsutton · 2018-11-30T02:33:57Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+    }
+  }
+
+  void digestNeighboursPacket(final NeighborsPacketData neighboursPacket, final Peer peer) {


I'd suggest moving this and kickstartBootstrapPeers up to right under the constructor since they're the key entry points to the class. It's a minor thing but makes it easier to see how the class is used.

I'd probably also call this onNeighboursPacketReceived to be a bit more idiomatic.

ajsutton · 2018-11-30T02:37:29Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+    return anteList.subList(0, threshold).stream().map(PeerDistance::getPeer).collect(toList());
+  }
+
+  private void performIteration() {


Is this really queryNearestNodes?

ajsutton · 2018-11-30T02:38:33Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+
+  private void performIteration() {
+    if (outstandingRequestList.isEmpty()) {
+      List<Peer> queryCandidates = determineFindNodeCandidates(CONCURRENT_REQUEST_LIMIT);


If we query 3 other peers for each response we receive, the number of concurrent requests will be 3 in the first round, 9 in the second and increase exponentially. Should we be querying the 3 closest from each response or the 3 closest from all the responses from that round?

The current behaviour may be ok - having a lot of outstanding requests isn't that big a deal since we don't have to do anything while we wait for them, but if they do increase exponentially we need a fairly low limit on the number of rounds we do so we don't flood the network.

As it's currently configured, I'm pretty sure the algorithm will query the 3 closest peers on a per round basis, so there wouldn't be more than 3 OutstandingRequests at any given time.

If we take (A) as a bootstrap node, and (A) sends us back a packet that tells us about (B), (C), (D) & (E), we'll consider these peers and deposit them all into the anteList.

As we had one OutstandingRequest, to (A), which we've received a response from, our OutstandingRequestList will be empty, so we'll initiate another "round", i.e. set of FindNode requests.

Let's say that our closest peers are (B), (C) and (D). We'll issue requests and (assuming none of them expire) we'll receive responses from each one of them, each containing n peers, all of whom will be deposited into the anteList.

When we've gotten responses from all of them, that is when our OutstandingRequestList will again be empty, which is what will precipitate a fresh round of 3 requests.

That was my intention, and that's how I think it works, but maybe there's something I've overlooked?
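The round-based behaviour described in this reply can be sketched as follows. This is a simplified stand-in, not the PR's class: `RoundBasedLookup` is a hypothetical name, peers are String ids, distances arrive precomputed, and timeouts are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model: a new round starts only once every request from the
// previous round has been answered, so at most 3 requests are in flight.
final class RoundBasedLookup {
  static final int CONCURRENT_REQUEST_LIMIT = 3;

  private final Map<String, Integer> knownDistances = new TreeMap<>();
  private final Set<String> contacted = new HashSet<>();
  private final Set<String> outstanding = new HashSet<>();

  void start(final String bootstrapPeer) {
    send(bootstrapPeer);
  }

  void onNeighboursReceived(final String from, final Map<String, Integer> neighbours) {
    outstanding.remove(from);
    neighbours.forEach(knownDistances::putIfAbsent);
    if (outstanding.isEmpty()) {
      startRound();
    }
  }

  private void startRound() {
    // Candidates are all known peers we have not contacted yet, closest first.
    final List<String> candidates = new ArrayList<>();
    for (final String peerId : knownDistances.keySet()) {
      if (!contacted.contains(peerId)) {
        candidates.add(peerId);
      }
    }
    candidates.sort(Comparator.comparingInt((String peerId) -> knownDistances.get(peerId)));
    final int count = Math.min(CONCURRENT_REQUEST_LIMIT, candidates.size());
    for (final String peerId : candidates.subList(0, count)) {
      send(peerId);
    }
  }

  private void send(final String peerId) {
    contacted.add(peerId);
    outstanding.add(peerId);
  }

  Set<String> outstanding() {
    return new HashSet<>(outstanding);
  }
}
```

In this model a response from (B) alone does not trigger new requests. Only when (B), (C) and (D) have all answered does the next round of up to 3 go out, so the number of outstanding requests stays bounded rather than tripling per round.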

ajsutton · 2018-11-30T02:45:16Z

...ava/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshStateTest.java

+    verify(neighborFinder).issueFindNodeRequest(peer_012);
+    verify(neighborFinder).issueFindNodeRequest(peer_013);
+
+    TimeUnit.SECONDS.sleep(60);


We definitely don't want this sleep. :)

ajsutton · 2018-11-30T02:46:32Z

...ava/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshStateTest.java

+  @After
+  public void cleaUp() {
+    vertx.close();
+  }


I'd move this up to under init so that you can see more clearly that we start a Vertx instance and close it rather than having the two so separated. Although with the timer becoming external you'll get rid of the need for Vertx in this test entirely which is another nice benefit.

ajsutton

Fix up the missing finals and then LGTM.

ajsutton · 2018-12-03T23:42:20Z

...in/java/tech/pegasys/pantheon/ethereum/p2p/discovery/internal/RecursivePeerRefreshState.java

+  }
+
+  void kickstartBootstrapPeers(final List<Peer> bootstrapPeers) {
+    for (Peer bootstrapPeer : bootstrapPeers) {


nit: final - right through this class. :) I suggest enabling the IntelliJ warning for non-final fields and then you can tell it to fix all.

smatthewenglish force-pushed the search-peers-iteratively-0101 branch 11 times, most recently from 34baf46 to db6ac93 Compare November 15, 2018 23:55

ajsutton added the enhancement New feature or request label Nov 19, 2018

smatthewenglish force-pushed the search-peers-iteratively-0101 branch 2 times, most recently from 5cecfbc to d891a4d Compare November 22, 2018 23:34

smatthewenglish force-pushed the search-peers-iteratively-0101 branch from 38a1f53 to 39323ac Compare November 27, 2018 04:50

ajsutton reviewed Nov 27, 2018

smatthewenglish force-pushed the search-peers-iteratively-0101 branch 8 times, most recently from cc6bd63 to 3e0c60c Compare November 30, 2018 02:07

ajsutton reviewed Nov 30, 2018

smatthewenglish force-pushed the search-peers-iteratively-0101 branch 3 times, most recently from b4f04fe to a5f4dcf Compare November 30, 2018 19:44

ajsutton approved these changes Dec 3, 2018

smatthewenglish added 20 commits December 7, 2018 12:53

basic updatdes

f15bd56

spotless inter alia

e1d9121

building successfully

8e7158b

funtioning

97ec3f5

minor update to docs

42a8230

rebased in previous commit, attempt to pass build server

bd732b0

eliminate distanceSortedPeers

e3feb67

spotless update

0076efd

revamp outstanding requests

73288e2

implementation of timeoutTask and corresponding test

2a368b4

use setPeriodic

c33a902

testing with DiscoveryPeer

c07834d

remove commenceTimeoutTask from constructor

418781d

isolate clock functionality out of recursive state

dddcdb2

update to docs

fba9f01

validate size of outstandingrequestlist

ccac09a

improve sanity check test

ef6cb49

remove extraneous copy

8da1a25

add accurate interface parameters

cce4b7a

finalize

3acaa50

smatthewenglish force-pushed the search-peers-iteratively-0101 branch from a5f4dcf to 3acaa50 Compare December 7, 2018 18:16

smatthewenglish merged commit df88b69 into PegaSysEng:master Dec 7, 2018
