Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple connections per peer #1440

Merged
merged 34 commits into from
Mar 4, 2020
Merged

Multiple connections per peer #1440

merged 34 commits into from
Mar 4, 2020

Conversation

romanb
Copy link
Contributor

@romanb romanb commented Feb 7, 2020

Motivation

This is a proposal for #912. Instead of trying to enforce a single connection per peer, which involves quite a bit of additional complexity e.g. to prioritise simultaneously opened connections and can have other undesirable consequences (e.g. paritytech/substrate#4272) besides potentially prohibiting other valid use-cases where multiple connections per peer are desired, this PR makes multiple connections per peer a feature in libp2p-core.

Overview

The concept of a "node" with an implicit 1-1 correspondence to a connection has been replaced with the "first-class" concept of a "connection". The code from src/nodes has moved (with varying degrees of modification) to src/connection and src/network. A HandledNode has become a Connection, a NodeHandler a ConnectionHandler, the CollectionStream was the basis for the new connection::Pool, and so forth.

Conceptually, a Network contains a connection::Pool which in turn internally employs the connection::Manager for handling the background connection::manager::Tasks, one per connection, as before. These are all considered implementation details. On the public API, Peers are managed as before through the Network, except now the API has changed with the shift of focus to (potentially multiple) connections per peer. The NetworkEvents have accordingly also undergone changes.

Backward Compatibility

The API of libp2p-core obviously has changed significantly in detail and semantics, though not so much in the structure (there is still the Network emitting events and Network::peer() for managing peers and almost all existing functions have been preserved, though often with changed semantics due to the nature of permitting multiple connections).

The API of libp2p-swarm should be backward compatible. There is the subtle semantic change that a single NetworkBehaviour can now be associated with many ProtocolsHandler instances for the same peer due to multiple connections, but unless an implementation makes some hard assumptions about such a 1-1 correspondence between peers and ProtocolsHandler instances, which seems unlikely, existing implementations should remain largely unaffected (EDIT: This is incorrect and needs to be addressed. See this comment). libp2p-swarm still does not by itself create multiple connections per peer (i.e. a DialPeer received from a behaviour only results in a connection attempt if the peer is currently disconnected, whereas libp2p-core now also allows additional connect()s for already connected peers), it just may occur as a result of simultaneous dialing. Lastly, inject_connected and inject_disconnected are only called when the first connection is established, respectively when the last connection is lost to a peer, to preserve the existing semantics. inject_replaced is no longer called, since connections are no longer replaced and has thus been deprecated.

Dialing Concurrency

While these changes permit multiple established connections per peer, what I have preserved is the constraint to a single pending outgoing (i.e. "dialing") connection per peer at a time. I think that is still desirable, especially because while dialing is in progress, new addresses to try can be added, e.g. as was and is still done by libp2p-swarm.

Additions

  • Instead of just an incoming_limit, there is now also a configurable outgoing limit (pending outgoing connections to any peer) and a configurable limit for the established connections per peer. These limits are enforced by the connection Pool, i.e. the incoming limit is no longer checked on every invocation of Network::poll() to determine if the listeners stream should be polled but rather at the time when they are added to the pool and whenever a pending connection is established. The defaults are no limits for any of these options. Furthermore these are currently hard limits, i.e. there is no throttling based on delays and additional queuing involved.
  • There is a NetworkInfo obtained by Network::info() containing some basic connection statistics from the connection pool.
  • There is a NetworkConfig for configuring the task executor as well as the connection pool limits.

How to Review

It seems most sensible to first look through the new / adapted modules in src/connection/** and src/connections.rs, followed by src/network/**. and src/network.rs and lastly looking at the actual diffs for libp2p-swarm. A lot of the code and modules in src/connection and src/network will look familiar as it stems from previous src/nodes modules.

Remaining Work

At this point I'm only planning to go over the documentation again,think about new tests and verify with the current polkadot/substrate (e.g. that it indeed addresses paritytech/substrate#4272), but there are no major changes on my backlog other than those that may result from feedback here or from issues encountered in further testing, hence I'm opening this now for review and general feedback.

@romanb
Copy link
Contributor Author

romanb commented Feb 10, 2020

There is the subtle semantic change that a single NetworkBehaviour can now be associated with many ProtocolsHandler instances for the same peer due to multiple connections, but unless an implementation makes some hard assumptions about such a 1-1 correspondence between peers and ProtocolsHandler instances, which seems unlikely, existing implementations should remain largely unaffected.

After taking a look at libp2p-kad again, I think this is very much incorrect. Indeed, likely any request-reply protocol requires the reply to be sent on the same substream and connection as the original request. That is what e.g. libp2p-kad uses the KademliaRequestIds for which get assigned to incoming requests and are sent back by the behaviour to the handler with the reply.

The least intrusive solution that comes to mind is to let ProtocolsHandler::inject_event (and possibly ConnectionHandler::inject_event as well) signal in the return value whether the given event was accepted / handled. The Swarm can then iterate over the connections to the peer and call inject_event until a connection (handler) accepts it, otherwise drop it.

The alternatives that come to mind involve making a NetworkBehaviour aware of ConnectionIDs, which is likely more intrusive and may be a change that goes well together with removing the ProtocolsHandler altogether later.

@tomaka
Copy link
Member

tomaka commented Feb 11, 2020

The alternatives that come to mind involve making a NetworkBehaviour aware of ConnectionIDs, which is likely more intrusive and may be a change that goes well together with removing the ProtocolsHandler altogether later.

I would go more in this direction in general.

@romanb
Copy link
Contributor Author

romanb commented Feb 12, 2020

The alternatives that come to mind involve making a NetworkBehaviour aware of ConnectionIDs, which is likely more intrusive and may be a change that goes well together with removing the ProtocolsHandler altogether later.

I would go more in this direction in general.

While this is clearly where we are heading in general, I'd really like to keep the changes and especially API changes outside of libp2p-core to a minimum in this first iteration.

Instead of trying to enforce a single connection per peer,
which involves quite a bit of additional complexity e.g.
to prioritise simultaneously opened connections and can
have other undesirable consequences [1], we now
make multiple connections per peer a feature.

The gist of these changes is as follows:

The concept of a "node" with an implicit 1-1 correspondence
to a connection has been replaced with the "first-class"
concept of a "connection". The code from `src/nodes` has moved
(with varying degrees of modification) to `src/connection`.
A `HandledNode` has become a `Connection`, a `NodeHandler` a
`ConnectionHandler`, the `CollectionStream` was the basis for
the new `connection::Pool`, and so forth.

Conceptually, a `Network` contains a `connection::Pool` which
in turn internally employs the `connection::Manager` for
handling the background `connection::manager::Task`s, one
per connection, as before. These are all considered implementation
details. On the public API, `Peer`s are managed as before through
the `Network`, except now the API has changed with the shift of focus
to (potentially multiple) connections per peer. The `NetworkEvent`s have
accordingly also undergone changes.

The Swarm APIs remain largely unchanged, except for the fact that
`inject_replaced` is no longer called. It may now practically happen
that multiple `ProtocolsHandler`s are associated with a single
`NetworkBehaviour`, one per connection. If implementations of
`NetworkBehaviour` rely somehow on communicating with exactly
one `ProtocolsHandler`, this may cause issues, but it is unlikely.

[1]: paritytech/substrate#4272
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed write-up for this patch-set.

core/src/connection/manager.rs Show resolved Hide resolved
core/src/connection/manager.rs Outdated Show resolved Hide resolved
core/src/connection/manager.rs Show resolved Hide resolved
core/src/connection/manager/task.rs Show resolved Hide resolved
core/src/connection/pool.rs Outdated Show resolved Hide resolved
core/src/connection/pool.rs Show resolved Hide resolved
@romanb
Copy link
Contributor Author

romanb commented Feb 16, 2020

The alternatives that come to mind involve making a NetworkBehaviour aware of ConnectionIDs, which is likely more intrusive and may be a change that goes well together with removing the ProtocolsHandler altogether later.

I would go more in this direction in general.

While this is clearly where we are heading in general, I'd really like to keep the changes and especially API changes outside of libp2p-core to a minimum in this first iteration.

Turns out it was necessary to do the following two small API changes on libp2p-swarm to allow sending of responses on the same connection:

  • Add a connection ID parameter to inject_node_event (and renamed to inject_event for consistency with the rest of the changes).
  • Add a connection ID parameter to NetworkBehaviorAction::SendEvent (and renamed to NetworkBehaviourAction::NotifyHandler for consistency with the rest of the changes).

These changes are very straight-forward in terms of code migration. See: ae6bfe9.

Copy link
Member

@tomaka tomaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

swarm/src/behaviour.rs Outdated Show resolved Hide resolved
swarm/src/behaviour.rs Show resolved Hide resolved
swarm/src/lib.rs Outdated Show resolved Hide resolved
core/src/connection.rs Outdated Show resolved Hide resolved
core/src/connection.rs Outdated Show resolved Hide resolved
core/src/connection/pool.rs Outdated Show resolved Hide resolved
core/src/connection/pool.rs Outdated Show resolved Hide resolved
core/src/connection/manager/task.rs Show resolved Hide resolved
core/src/connection/manager/task.rs Show resolved Hide resolved
swarm/src/behaviour.rs Outdated Show resolved Hide resolved
core/src/connection/error.rs Outdated Show resolved Hide resolved
core/src/connection/manager.rs Outdated Show resolved Hide resolved
core/src/connection/manager.rs Outdated Show resolved Hide resolved
///
/// **Note:** Must only be called after `poll_ready_notify_handler`
/// was successful and without interference by another thread, otherwise the event is discarded.
pub fn notify_handler(&mut self, event: I) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not replacing this racy poll_ready_notify_handler/notify_handler combo with a non-racy version, e.g.

pub fn notify_handler(&mut self, event: I, cx: &mut Context) -> Option<I> {
    if self.poll_ready_notify_task(cx).is_pending() {
        return Some(event)
    }
    let cmd = task::Command::NotifyHandler(event);
    self.notify_task(cmd);
    None
}

Copy link
Member

@tomaka tomaka Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be similar to the old AsyncSink enum, and it forces you to potentially store the event somewhere for later when the handler is ready. Meanwhile, with this "racy" API (it's not racy, since it requires &mut), you can generate the event with the guarantee that it is going to be delivered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this discussion in #1373 already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to start with the simpler and safer API as per the suggestion of @twittner and potentially add the poll_ready_x / x combination later as an alternative if needed. That is, we may at the end offer both, just like futures::channel::mpsc::Sender does in form of try_send, start_send and poll_ready.

To elaborate, I'm sure we all understand the motivation and trade-off behind these two styles of API and we just have different priorities w.r.t. what we consider more important. Given that the poll_ready_x / x design is arguably both more "complex" (in that you have to use 2 methods instead of 1) and more error-prone (in that you have to use these two methods in a certain way, subject as well to pitfalls w.r.t multi-threading as already discussed in #1373) I personally prefer to start with just providing try_notify_handler for now. The poll_ready_notify_handler / notify_handler combination could be added later if there is a strong demand or use-case (which as far as I can tell even the Swarm does not currently have).

Does that make sense and sound reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomaka Could you elaborate on the 👎 w.r.t. what aspect you disagree with? Do you insist on offering poll_ready_notify_handler / notify_handler from the start or do you even have objections to offering try_notify_handler?

Copy link
Member

@tomaka tomaka Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposition has been refuted in #1373

It hasn't been refuted, I just gave up arguing.

the safer alternative

The definition of "safe" in the context of Rust is "cannot trigger undefined behaviour". In no way misusing this API can trigger any undefined behaviour. The word "safe" is totally irrelevant here.

The worst that can happen is a panic resulting from us misusing the API. There are lots of places in the Rust standard library and in the Rust ecosystem where misusing the API will result in a panic. Polling a Future after it has returned Ready, to give the most prominent example.

Using this API in a correct way is as racy as try_notify_handler is, in the sense that sometimes it will return Pending and sometimes it will return Ready, depending on the timing of the other tasks/threads.

Misuing this API is racy in a similar way as using this API correctly, with the exception that you might get a panic (because you misused the API).

the more dangerous

In order for me to change my point of view, I would like to get a precise definition of what "dangerous" means here for you.

Copy link
Member

@tomaka tomaka Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My issue with try_notify_handler is the fact that there is no way for you to know whether your event will be accepted before generating said event.

Changing the design of Sink between futures 0.1 (where it looked like try_notify_handler) and 0.3 (where it looks like poll_ready_notify_handler/notify_handler) is to me one of the best decisions they made, as it considerably simplified lots of code. I don't want to revert to a futures-0.1-style API.

If panicking really is a big deal, then an API that I think is good is:

impl Manager {
    /// Try to give out a slot for sending an event.
    pub fn poll_ready_notify(&mut self, ...) -> Poll<EventSend> { ... }
}

impl<'a, T> EventSend<'a, T> {
    /// Sends the event, consuming `self`.
    pub fn send_event(self, event: T) { ... }
}

Copy link
Contributor Author

@romanb romanb Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worst that can happen is a panic resulting from us misusing the API. There are lots of places in the Rust standard library and in the Rust ecosystem where misusing the API will result in a panic. Polling a Future after it has returned Ready, to give the most prominent example.

Note though that the underlying API we're talking about here, futures::channel::mpsc::Sender, never panics (and neither does our current API). Indeed the Sender API says poll_ready should be called before start_send and it consequently still returns an error if the channel is full. It is a bit telling, to me anyway, that they did not go with saying "poll_ready must be called before start_send" combined with panicking due to programmer error on start_send if the channel is full. And they also provide try_send.

Using this API in a correct way is as racy as try_notify_handler is, in the sense that sometimes it will return Pending and sometimes it will return Ready, depending on the timing of the other tasks/threads.

Here we are at a point which I thought was clarified e.g. here. To recap, the crucial difference is in the assumptions that the method that must be called second (e.g. notify_handler) places on the possible errors that can occur and how it deals with them. If it assumes certain errors can never happen because of a requirement that another method must be called first then calls to such two functions which may be interleaved by multiple threads if e.g. Mutexes are used can lead to unexpected behaviour, hence why I already augmented the commentary on notify_handler to say that not only must poll_ready_notify_handler be called first, but also that no other thread must be allowed to interleave between poll_ready_notify_handler and notify_handler. Of course that is easy to satisfy if, as a user, you just call both these &mut methods in sequence on the same thread without e.g. going through a Mutex, but nevertheless it is another thing to watch out for. We may agree to disagree on whether that justifies calling an API "racy", but in any case the futures::channel::mpsc::Sender API is actually not racy, whatever definition you choose, whereas the API provided by us, currently, is considered racy by myself and @twittner because it may silently (well, now it logs at least) ignore errors due to the channel being full when calling notify_handler.

I have added try_notify_handler in a new commit in the hope that it is uncontroversial since it follows the example of the futures::channel::mpsc::Sender API, leaving the other functions in place. Is that acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposition has been refuted in #1373

It hasn't been refuted, I just gave up arguing.

I would love to hear a new argument then. If "it's not racy, since it requires &mut" would be true why would we even have this discussion? Since the outcome of concurrent calls to poll_ready_notify_handler and notify_handler depends on relative timings there clearly is a race condition here that the individual &muts in each of these methods can not prevent.

the safer alternative

The definition of "safe" in the context of Rust is "cannot trigger undefined behaviour". In no way misusing this API can trigger any undefined behaviour. The word "safe" is totally irrelevant here.

[...]

the more dangerous

In order for me to change my point of view, I would like to get a precise definition of what "dangerous" means here for you.

Rust does not have a monopoly on the word "safe" and since you were asking for a definiton of "dangerous", I am happy to elaborate: Assuming there are undesirable program states, e.g. a panic, an API is safe w.r.t. those states if it can by construction guarantee that those states can never occur. Correspondingly an API is dangerous w.r.t. those states if it can not give those guarantees.

Misuing this API is racy in a similar way as using this API correctly, with the exception that you might get a panic (because you misused the API).

This "exception" being an undesirable program state and the fact that the API can not guarantee its own correct use but allows for degrees of freedom that can lead to such a state makes it more dangerous in my book.

My issue with try_notify_handler is the fact that there is no way for you to know whether your event will be accepted before generating said event.

We could combine both worlds by keeping poll_ready_notify_handler as an indication that most likely notify_handler will be able to accept the event, but in case it can not (because of API misuse) we can at least return the event:

pub fn poll_ready_notify_handler(&mut self, cx: &mut Context) -> Poll<()>;

pub fn notify_handler(&mut self, event: I, cx: &mut Context) -> Option<I>;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could combine both worlds by keeping poll_ready_notify_handler as an indication that most likely notify_handler will be able to accept the event, but in case it can not (because of API misuse) we can at least return the event:

pub fn poll_ready_notify_handler(&mut self, cx: &mut Context) -> Poll<()>;

pub fn notify_handler(&mut self, event: I, cx: &mut Context) -> Option<I>;

I would be ok with that as well, and it would make try_notify_handler superfluous (notify_handler would then internally use Sender::try_send which is used by Sender::start_send internally anyway).

Roman S. Borschel and others added 3 commits February 28, 2020 22:34
Small API simplifications and code deduplication.
Some more useful debug logging.
@tomaka
Copy link
Member

tomaka commented Mar 4, 2020

The test is repeatedly failing.

@romanb
Copy link
Contributor Author

romanb commented Mar 4, 2020

The test is repeatedly failing.

Which test do you mean? They all look green right now. Relatedly, my proposal for the substrate integration is now also ready for review (but obviously not for merging): paritytech/substrate#5066.

@tomaka
Copy link
Member

tomaka commented Mar 4, 2020

Which test do you mean?

Hah, I restarted the tests once again right after posting my comment, and now it's green.

@romanb
Copy link
Contributor Author

romanb commented Mar 4, 2020

Which test do you mean?

Hah, I restarted the tests once again right after posting my comment, and now it's green.

I see, I can look at the ipfs-kad example again, but I would probably as a first guess blame IPFS nodes. It may help if we increase the acceptable maximum payload size of Kademlia requests / responses from 4KiB to 8KiB. I remember often seeing failures due to that size limit, sometimes even from the first node. There may be other issues as well, of course.

@tomaka
Copy link
Member

tomaka commented Mar 4, 2020

I would probably as a first guess blame IPFS nodes.

That would also be my guess.

@tomaka tomaka merged commit 8337687 into libp2p:master Mar 4, 2020
romanb pushed a commit to romanb/rust-libp2p that referenced this pull request Mar 12, 2020
PR [1440] introduced a regression w.r.t. the reporting of
dial errors. In particular, if a connection attempt fails
due to an invalid remote peer ID, any remaining addresses
for the same peer would not be tried (intentional) but
the dial failure would not be reported to the behaviour,
causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the
previous behaviour to still try alternative addresses
of the peer even on invalid peer ID errors on an earlier
address. In particular because in the context of libp2p-kad
it is not uncommon for peers to report localhost addresses
while the local node actually has e.g. an ipfs node running
on that address, obviously with a different peer ID, which
is the scenario causing frequent invalid peer ID (mismatch)
errors when running the ipfs-kad example.

This commit thus restores the previous behaviour w.r.t.
trying all remaining addresses on invalid peer ID errors
as well as making sure `inject_dial_error` is always
called when the last attempt failed.

[1440]: libp2p#1440.
romanb added a commit that referenced this pull request Mar 16, 2020
* Fix regression w.r.t. reporting of dial errors.

PR [1440] introduced a regression w.r.t. the reporting of
dial errors. In particular, if a connection attempt fails
due to an invalid remote peer ID, any remaining addresses
for the same peer would not be tried (intentional) but
the dial failure would not be reported to the behaviour,
causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the
previous behaviour to still try alternative addresses
of the peer even on invalid peer ID errors on an earlier
address. In particular because in the context of libp2p-kad
it is not uncommon for peers to report localhost addresses
while the local node actually has e.g. an ipfs node running
on that address, obviously with a different peer ID, which
is the scenario causing frequent invalid peer ID (mismatch)
errors when running the ipfs-kad example.

This commit thus restores the previous behaviour w.r.t.
trying all remaining addresses on invalid peer ID errors
as well as making sure `inject_dial_error` is always
called when the last attempt failed.

[1440]: #1440.

* Remove an fmt::Debug requirement.
romanb pushed a commit to romanb/rust-libp2p that referenced this pull request Mar 23, 2020
This is a follow-up to libp2p#1440
and relates to libp2p#925.
This change permits multiple dialing attempts per peer.
Note though that `libp2p-swarm` does not yet make use of this ability,
retaining the current behaviour. The essence of the changes are that the
`Peer` API now provides `Peer::dial()`, i.e. regardless of the state in
which the peer is. A dialing attempt is always made up of one or more
addresses tried sequentially, as before, but now there can be multiple
dialing attempts per peer. A configurable per-peer limit for outgoing
connections and thus concurrent dialing attempts is also included.
romanb pushed a commit to romanb/rust-libp2p that referenced this pull request Mar 31, 2020
This is a follow-up to libp2p#1440
and relates to libp2p#925.
This change permits multiple dialing attempts per peer.
Note though that `libp2p-swarm` does not yet make use of this ability,
retaining the current behaviour. The essence of the changes are that the
`Peer` API now provides `Peer::dial()`, i.e. regardless of the state in
which the peer is. A dialing attempt is always made up of one or more
addresses tried sequentially, as before, but now there can be multiple
dialing attempts per peer. A configurable per-peer limit for outgoing
connections and thus concurrent dialing attempts is also included.
romanb pushed a commit to romanb/rust-libp2p that referenced this pull request Apr 8, 2020
This is a follow-up to libp2p#1440
and relates to libp2p#925.
This change permits multiple dialing attempts per peer.
Note though that `libp2p-swarm` does not yet make use of this ability,
retaining the current behaviour. The essence of the changes are that the
`Peer` API now provides `Peer::dial()`, i.e. regardless of the state in
which the peer is. A dialing attempt is always made up of one or more
addresses tried sequentially, as before, but now there can be multiple
dialing attempts per peer. A configurable per-peer limit for outgoing
connections and thus concurrent dialing attempts is also included.
romanb added a commit that referenced this pull request May 12, 2020
* Permit concurrent dialing attempts per peer.

This is a follow-up to #1440
and relates to #925.
This change permits multiple dialing attempts per peer.
Note though that `libp2p-swarm` does not yet make use of this ability,
retaining the current behaviour. The essence of the changes are that the
`Peer` API now provides `Peer::dial()`, i.e. regardless of the state in
which the peer is. A dialing attempt is always made up of one or more
addresses tried sequentially, as before, but now there can be multiple
dialing attempts per peer. A configurable per-peer limit for outgoing
connections and thus concurrent dialing attempts is also included.

* Introduce `DialError` in `libp2p-swarm`.

For a cleaner API and to treat the case of no addresses
for a peer as an error, such that a `NetworkBehaviourAction::DialPeer`
request is always matched up with either `inject_connection_established`
or `inject_dial_error`.

* Fix rustdoc link.

* Add `DialPeerCondition::Always`.

* Adapt to master.

* Update changelog.
santos227 pushed a commit to santos227/rustlib that referenced this pull request Jun 20, 2022
* Fix regression w.r.t. reporting of dial errors.

PR [1440] introduced a regression w.r.t. the reporting of
dial errors. In particular, if a connection attempt fails
due to an invalid remote peer ID, any remaining addresses
for the same peer would not be tried (intentional) but
the dial failure would not be reported to the behaviour,
causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the
previous behaviour to still try alternative addresses
of the peer even on invalid peer ID errors on an earlier
address. In particular because in the context of libp2p-kad
it is not uncommon for peers to report localhost addresses
while the local node actually has e.g. an ipfs node running
on that address, obviously with a different peer ID, which
is the scenario causing frequent invalid peer ID (mismatch)
errors when running the ipfs-kad example.

This commit thus restores the previous behaviour w.r.t.
trying all remaining addresses on invalid peer ID errors
as well as making sure `inject_dial_error` is always
called when the last attempt failed.

[1440]: libp2p/rust-libp2p#1440.

* Remove an fmt::Debug requirement.
santos227 pushed a commit to santos227/rustlib that referenced this pull request Jun 20, 2022
* Permit concurrent dialing attempts per peer.

This is a follow-up to libp2p/rust-libp2p#1440
and relates to libp2p/rust-libp2p#925.
This change permits multiple dialing attempts per peer.
Note though that `libp2p-swarm` does not yet make use of this ability,
retaining the current behaviour. The essence of the changes are that the
`Peer` API now provides `Peer::dial()`, i.e. regardless of the state in
which the peer is. A dialing attempt is always made up of one or more
addresses tried sequentially, as before, but now there can be multiple
dialing attempts per peer. A configurable per-peer limit for outgoing
connections and thus concurrent dialing attempts is also included.

* Introduce `DialError` in `libp2p-swarm`.

For a cleaner API and to treat the case of no addresses
for a peer as an error, such that a `NetworkBehaviourAction::DialPeer`
request is always matched up with either `inject_connection_established`
or `inject_dial_error`.

* Fix rustdoc link.

* Add `DialPeerCondition::Always`.

* Adapt to master.

* Update changelog.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants