feat(kad): implement automatic client mode #3877

thomaseizinger · 2023-05-04T08:46:35Z

Description

Currently, the kademlia behaviour can only learn that the remote node supports kademlia on a particular connection if we successfully negotiate a stream to them.

Using the newly introduced abstractions from #3651, we don't have to attempt to establish a stream to the remote to learn whether they support kademlia on a connection but we can directly learn it from the ConnectionEvent::RemoteProtocolsChange event. This happens directly once a connection is established which should overall benefit the DHT.

Clients do not advertise the kademlia protocol and thus we will immediately learn that a given connection is not suitable for kadmelia requests. We may receive inbound messages from it but this does not affect the routing table.

Resolves: #2032.

Notes & open questions

~~Draft because it is~~ stacked on top of:

Is Mode::Server a good default? Should it be Mode::Client to be "safer"?
Should we even have an overall mode? With how we are now thinking of supported protocols, we could be a server on one connection and a client on another.

Depends-On: #3954.

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

Co-authored-by: Max Inden <[email protected]>

…0-explore

thomaseizinger · 2023-05-30T13:27:55Z

@mxinden All comments addressed.

thomaseizinger · 2023-05-30T13:32:52Z

@mxinden Feel free to merge if this is ready in your eyes.

mxinden

Ready to go apart from the smaller suggestion below.

swarm/src/behaviour/external_addresses.rs

mxinden · 2023-05-31T02:04:34Z

@rkuhn in https://github.com/Actyx/:

Do you use libp2p-kad?
Do you use it with private IPv4 addresses?
Do you call Swarm::add_external_address?

Feel free to ignore if !one || !two || three.

mxinden · 2023-06-01T02:29:02Z

Commit updating kademlia-exporter project: mxinden/kademlia-exporter@b958a02

mxinden · 2023-06-03T13:12:33Z

Overall results look promising @thomaseizinger:

https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&refresh=30s&from=now-7d&to=now

thomaseizinger · 2023-06-03T14:22:00Z

Overall results look promising @thomaseizinger:

https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&refresh=30s&from=now-7d&to=now

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

dariusc93 · 2023-06-11T11:40:43Z

Was Kademlia::set_mode removed? From what it looks like, it looks like the mode is determined based on if the node has an external address. If so, how would one set the mode to advertise the kad protocol without adding an external address, or say if a node only wants to stay as a client and query the DHT, how would it do that while also having an external address?

thomaseizinger · 2023-06-11T11:58:45Z

Was Kademlia::set_mode removed?

Because we didn't want to place the burden on configuring this on the user.

From what it looks like, it looks like the mode is determined based on if the node has an external address.

Yes, that is correct.

If so, how would one set the mode to advertise the kad protocol without adding an external address

If you don't have an external address, how would your node be reachable and thus of use to the kademlia network?

or say if a node only wants to stay as a client and query the DHT, how would it do that while also having an external address?

Why would you want to do that?

dariusc93 · 2023-06-11T12:29:47Z

If you don't have an external address, how would your node be reachable and thus of use to the kademlia network?

More under the assumption that the address that is listened on would be its external such as a public ip as some nodes may not have autonat in place or want to add/remove external addresses manually on each change at runtime, although I did see the commit removing the scoring system so that make sense on that.

Why would you want to do that?

Some nodes may only want to receive information from DHT without being queried themselves. Eg, a node may want to query the DHT for records without the node being a server itself and receiving request while allowing it to change to a server later on in its runtime, while the same can also be said on if the node wish to switch from being a server to a client later on if it wishes not to handle anymore queries. Under the implementation here, the node dont have a choice but to be a client until its reachable with no option to set it back later on unless it remove its external address (assuming it does switch back to client), but then that may trigger other behaviours to act on those events too if they rely on an external address event.

thomaseizinger · 2023-06-11T13:28:41Z

Why would you want to do that?

Some nodes may only want to receive information from DHT without being queried themselves. Eg, a node may want to query the DHT for records without the node being a server itself and receiving request while allowing it to change to a server later on in its runtime, while the same can also be said on if the node wish to switch from being a server to a client later on if it wishes not to handle anymore queries. Under the implementation here, the node dont have a choice but to be a client until its reachable with no option to set it back later on unless it remove its external address (assuming it does switch back to client), but then that may trigger other behaviours to act on those events too if they rely on an external address event.

All this sounds very constructed to me. Do you have a concrete usecase for which the current implementation does not work?

I understand that we are making some assumptions here with how things are now implemented. I don't mind if those get invalidated and we have to adapt the implementation. But the vision I want to work towards is that things work out of the box and people don't have to become experts at the library to make use of its (advanced) features.

dariusc93 · 2023-06-11T20:07:32Z

All this sounds very constructed to me. Do you have a concrete usecase for which the current implementation does not work?

I do maintain a fork of rust-ipfs and one case would be that nodes may wish to only find what peer is providing what over dht, or may wish to fetch records (eg resolving ipns records). Such nodes may not wish to operate as a dht server at that moment especially if the intention is to locate contents or records provided by peers, and may later want to switch over and operate as a server.

Another case would be in embedded/mobile environments. Under these environments, they may be reachable by other nodes directly (ip being public or upnp is enabled), or via relay, however the nodes may not have the resources to operate as a dht server. This could be memory related, network restrictions, etc. Client mode would be suitable under those environments while under the current implementation the nodes would have no choice but to operate as a server upon having an external address or dont have kad enabled at all.

I understand that we are making some assumptions here with how things are now implemented. I don't mind if those get invalidated and we have to adapt the implementation. But the vision I want to work towards is that things work out of the box and people don't have to become experts at the library to make use of its (advanced) features.

I dont believe having the option to set the mode to client or server would be overly complex internally. Maybe have a flag in the configuration to rely on external addresses to determine its mode or to be determined manually by setting the mode to client or server (with the default being client) and send an event to the handlers when the mode is changed at runtime, either as it does in Kademlia::on_swarm_event or in Kademlia::set_mode.

mxinden · 2023-06-22T09:05:38Z

Overall results look promising @thomaseizinger:

https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&refresh=30s&from=now-7d&to=now

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

I don't think that is the reason. Instead I think that the routing table is significantly more healthy now. I.e. previously we would add go-libp2p based Kademlia client nodes to our routing table once they did an inbound request to us. Given that they are behind a firewall or NAT (thus Kademlia clients), reaching out to them for requests would fail and thus delay outbound query durations. Now we no longer add them to our routing table and thus, for outbound requests, only reach out to the Kademlia server nodes, thus significantly reducing outbound request latency.

thomaseizinger · 2023-06-22T11:08:01Z

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

I don't think that is the reason. Instead I think that the routing table is significantly more healthy now. I.e. previously we would add go-libp2p based Kademlia client nodes to our routing table once they did an inbound request to us.

I don't think that is true, we never add inbound connections to our routing table.

mxinden · 2023-06-22T12:52:57Z

You are right. I have to dig deeper. Your suggestion makes the most sense.

we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

folex · 2023-07-07T14:30:56Z

Do I understand correctly, that a remote peer can basically force my peer to switch from Client mode to Server mode by triggering NetworkBehaviourAction::ReportObservedAddr via Identify? Is that a desired behaviour?

On that note, what do you think about allowing to configure mode manually via configuration parameters? For example, as a part of KademliaConfig.

I think that makes a lot of sense, because if I know my peers are high uptime and on powerful intances, I want to be sure that they operate in Server mode, controlling it directly, without having to think about calling add_external_address, depending on the Identify and other non-direct side effects.

Controlling non-direct side effects in a big codebases is hard.

folex · 2023-07-07T14:37:00Z

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

And that's totally OK, right? If a peer doesn't know its own external address – they can still accept connections, other peers still can see their socket addresses and try to use them.

But with this change, peer now behaves entirely different based on the user configuration. Until Identify comes, which happens unpredictable.

I want to have more control over how Kademlia behaves, at least I think I do :)

folex · 2023-07-07T14:40:57Z

protocols/kad/src/handler.rs

@@ -368,6 +365,9 @@ pub enum KademliaHandlerIn {
    /// for the query on the remote.
    Reset(KademliaRequestId),

+    /// Change the connection to the specified mode.


What's the effect of that? Is there a network message sent all my connections when I learn about external_addresses?

I think it makes sense to elaborate on that in the comment. Describe effects of other peers receiving this message.

Also, isn't that a lot of overhead? What if my addresses go on and off often? Not sure why that might happen, but now how do I make sure that doesn't happen?

Nobody receives any messages. Client vs server mode essentially just means, "do we accept inbound kademlia streams" which translates to rust-libp2p as "do we offer kademlia via ConnectionHandler::listen_protocols".

folex · 2023-07-07T14:43:43Z

protocols/kad/src/handler.rs

-                .map_upgrade(Either::Left)
-        } else {
-            SubstreamProtocol::new(Either::Right(upgrade::DeniedUpgrade), ())
+        match self.mode {


Can you please add some comments on what happens here?

That sounds like a rather impactful thing. Now Peer can allow and disallow accepting connections on its own, and I as a developer have no control over that.

I'll try to outline my understanding of the effects of this change. I'm certainly can be wrong here, so forgive me if I am :)

So, previously if I had allow_listening = true, my peer would accept incoming connections even before Identify happened.

Now Kademlia handshake behaviour depends on whether Identify already happened or not. So instead of 1 handshake on Kademlia protocol and 1 handshake on Identify, we now have (depending on handshakes order) several scenarios:

PeerA connects to PeerB. PeerA is in Server mode, PeerB has not yet learned her external addresses, so it is in Client mode.

Scenario 1: failed connection

PeerA tries to connect to PeerB

PeerA attempts Kademlia handhsake with PeerB, PeerB rejects because Mode::Client => upgrade::DeniedUpgrade

PeerA performs a succesful handshake via Identify with PeerB

PeerB learns its external address, switches to Server mode

PeerB sends ReconfigureMode to all its kademlia connections, but PeerA is not among them

Kademlia handhsake between A and B has failed

Connection failed

If PeerA tries again, connection will succeed, even though nothing has changed in the setup. Technically, nothing prevents PeerB of accepting PeerA's connection the first time. But now PeerA has to retry if she's sure she brought external_addresses to PeerB.

Scenario 2: succesful connection

PeerA tries to connect to PeerB

PeerA performs a succesful handshake via Identify with PeerB

PeerB switches to Server mode

PeerA attempts Kademlia handshake with PeerB

PeerB accepts

Connection succesful

If I understand correctly, there's no strict order in which handshakes happen, right? So it is basically a race condition.

PeerA attempts Kademlia handhsake with PeerB, PeerB rejects because Mode::Client => upgrade::DeniedUpgrade

Can you elaborate on how that handshake is triggered? At least rust-libp2p does not attempt to speak the kademlia protocol to a peer that did not advertise it first. This is the whole point of this PR: By filtering peers based on whether the advertise kademlia, we get healthier routing tables and as a result, kademlia queries finish a lot quicker. See #3877 (comment).

dariusc93 · 2023-07-07T14:49:13Z

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

And that's totally OK, right? If a peer doesn't know its own external address – they can still accept connections, other peers still can see their socket addresses and try to use them.

But with this change, peer now behaves entirely different based on the user configuration. Until Identify comes, which happens unpredictable.

I want to have more control over how Kademlia behaves, at least I think I do :)

This was resolved in #4132 to allow setting the mode

thomaseizinger · 2023-07-09T08:42:22Z

Do I understand correctly, that a remote peer can basically force my peer to switch from Client mode to Server mode by triggering NetworkBehaviourAction::ReportObservedAddr via Identify?

That is not the way to think about it. Whether or not your node operates in server or client mode solely depends on whether you have an external address. Using other nodes to learn you external address is one way to do that although identify by itself does not result in a verified external address, only a candidate. It is your own node's responsibility to turn such candidates in the verified external addresses and only that will trigger a switch from client to server mode.

thomaseizinger · 2023-07-09T09:06:18Z

I think that makes a lot of sense, because if I know my peers are high uptime and on powerful intances, I want to be sure that they operate in Server mode, controlling it directly, without having to think about calling add_external_address, depending on the Identify and other non-direct side effects.

Controlling non-direct side effects in a big codebases is hard.

It is a trade-off, right? I appreciate you sharing your view! We have received quite a bit of feedback on this actually but I do still think that this is the correct design. In the past, rust-libp2p was designed with many explicit config switches. This required (and still requires) people to become experts at the library before they can use it effectively.

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

Perhaps your configuration system is too powerful? Have you considered to changing your config file to mean additional external addresses? With protocols like identify and AutoNAT, discovering and verifying an external address should work flawlessly.

Let's assume we are a talking about an always-online bootnode and other nodes are hard-wired to connect to that one on startup. We definitely want that node to speak kademlia, right?

Without the above design, we'd essentially have to rely on other nodes "probing" for kademlia. They would have to just try whether the node they connect to speaks kademlia. Given that the kademlia protocol itself doesn't really know where a connection came from, we'd have to probe every connection for kademlia support.

With this design, we can instead make a very good educated guesses that a node supports kademlia by relying on its (self-reported) supported protocols. A bootnode that is configured with a static, external IP address would advertise kademlia from the moment it boots and any new peer that connects would immediately put it in its routing table once identify reported the supported protocols (which also happens immediately one every new connection).

I still believe this is a much cleaner design because it is deterministic in behaviour which makes it easier to understand despite the many moving parts compared to random of protocols.

That being said, I am open to alternative designs if you have any ideas!

Previously, a `NetworkBehaviour` could report an `AddressScore` for an external address. This score was a `u32` and addresses would be ranked amongst those. In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable). When a `NetworkBehaviour` reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the `NetworkBehaviour`. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a `ToSwarm::ExternalAddrConfirmed` event which again will be passed to all `NetworkBehaviour`s. This simplified approach will allow us implement Kademlia's client-mode (libp2p#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection. Related: libp2p#3877. Related: libp2p#3953. Related: libp2p#2032. Related: libp2p/go-libp2p#2229. Co-authored-by: Max Inden <[email protected]> Pull-Request: libp2p#3954.

thomaseizinger and others added 30 commits March 24, 2023 15:48

Report changes in supported protocols back to ConnectionHandler

f2801fb

Allow reporting of supported protocols by remote

9a28cd2

Consume supported protocols in identify

e536dea

Report a remote's protocols to other handlers

7699a1e

Add test for kademlia client mode

450fc1e

Implement kademlia client-mode

a972b9c

Add tests for two servers connecting

f9cf33f

Report additions and removals of protocols instead

deb30f3

Report listen protocols on startup to connection

3c1bf5f

Introduce SupportedProtocols type

9e70729

Deduplicate code and propagate supported protocols only once

f1328c5

Extend docs

a4fbcda

Merge branch 'master' into 2680-explore

3e33108

Fix compile errors

572ed90

Report changes to listen/external address set

586394b

Simplify identify protocol

b329a09

Check for changes in inbound protocols on every poll

a63af89

Fix clippy lints

72510dd

Report changes in SupportedProtocols

8ffafdd

Only report changes to handler if there actually was a change

ea1a087

Do not implicitly dial peers upon push

27bc507

Merge branch 'master' into 2680-explore

58039d9

Remove unused import

74dd94a

Fix compile error

628b519

Remove dbg!

c7b5011

Update swarm/src/handler.rs

a02ca55

Co-authored-by: Max Inden <[email protected]>

Merge branch '2680-explore' of github.com:libp2p/rust-libp2p into 268…

c347c8a

…0-explore

Combine match arms where possible

f3e5e71

Add comment explaining static hashset

b7fa7ef

Add docs

2bd9d73

thomaseizinger requested a review from mxinden May 30, 2023 13:24

mxinden approved these changes May 31, 2023

View reviewed changes

swarm/src/behaviour/external_addresses.rs Outdated Show resolved Hide resolved

thomaseizinger added 2 commits May 31, 2023 07:54

Merge branch 'master' into feat/kademlia-automatic-client-mode

9fa7590

Add log for dropping previously confirmed external address

cdef2d5

thomaseizinger added the send-it label May 31, 2023

mergify bot merged commit 92c8cc4 into master May 31, 2023

mergify bot deleted the feat/kademlia-automatic-client-mode branch May 31, 2023 09:02

dariusc93 mentioned this pull request Jun 15, 2023

kad: Manually setting client or server mode #4074

Closed

folex reviewed Jul 7, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kad): implement automatic client mode #3877

feat(kad): implement automatic client mode #3877

thomaseizinger commented May 4, 2023 •

edited

Loading

thomaseizinger commented May 30, 2023

thomaseizinger commented May 30, 2023

mxinden left a comment

mxinden commented May 31, 2023

mxinden commented Jun 1, 2023

mxinden commented Jun 3, 2023

thomaseizinger commented Jun 3, 2023

dariusc93 commented Jun 11, 2023

thomaseizinger commented Jun 11, 2023

dariusc93 commented Jun 11, 2023

thomaseizinger commented Jun 11, 2023

dariusc93 commented Jun 11, 2023 •

edited

Loading

mxinden commented Jun 22, 2023

thomaseizinger commented Jun 22, 2023

mxinden commented Jun 22, 2023

folex commented Jul 7, 2023

folex commented Jul 7, 2023 •

edited

Loading

folex Jul 7, 2023 •

edited

Loading

thomaseizinger Jul 9, 2023

folex Jul 7, 2023

folex Jul 7, 2023 •

edited

Loading

thomaseizinger Jul 9, 2023

dariusc93 commented Jul 7, 2023

thomaseizinger commented Jul 9, 2023

thomaseizinger commented Jul 9, 2023 •

edited

Loading

feat(kad): implement automatic client mode #3877

feat(kad): implement automatic client mode #3877

Conversation

thomaseizinger commented May 4, 2023 • edited Loading

Description

Notes & open questions

Change checklist

thomaseizinger commented May 30, 2023

thomaseizinger commented May 30, 2023

mxinden left a comment

Choose a reason for hiding this comment

mxinden commented May 31, 2023

mxinden commented Jun 1, 2023

mxinden commented Jun 3, 2023

thomaseizinger commented Jun 3, 2023

dariusc93 commented Jun 11, 2023

thomaseizinger commented Jun 11, 2023

dariusc93 commented Jun 11, 2023

thomaseizinger commented Jun 11, 2023

dariusc93 commented Jun 11, 2023 • edited Loading

mxinden commented Jun 22, 2023

thomaseizinger commented Jun 22, 2023

mxinden commented Jun 22, 2023

folex commented Jul 7, 2023

folex commented Jul 7, 2023 • edited Loading

folex Jul 7, 2023 • edited Loading

Choose a reason for hiding this comment

thomaseizinger Jul 9, 2023

Choose a reason for hiding this comment

folex Jul 7, 2023

Choose a reason for hiding this comment

folex Jul 7, 2023 • edited Loading

Choose a reason for hiding this comment

Scenario 1: failed connection

Scenario 2: succesful connection

thomaseizinger Jul 9, 2023

Choose a reason for hiding this comment

dariusc93 commented Jul 7, 2023

thomaseizinger commented Jul 9, 2023

thomaseizinger commented Jul 9, 2023 • edited Loading

thomaseizinger commented May 4, 2023 •

edited

Loading

dariusc93 commented Jun 11, 2023 •

edited

Loading

folex commented Jul 7, 2023 •

edited

Loading

folex Jul 7, 2023 •

edited

Loading

folex Jul 7, 2023 •

edited

Loading

thomaseizinger commented Jul 9, 2023 •

edited

Loading