[core/swarm] Emit events for active connection close and fix disconnect()
#1619
Conversation
The `Network` does not currently emit events for actively closed connections, e.g. via `EstablishedConnection::close` or `ConnectedPeer::disconnect()`. As a result, when actively closing connections, `ConnectionEstablished` events are emitted without eventually a matching `ConnectionClosed` event. This seems undesirable and has the consequence that the `Swarm::ban_peer_id` feature in `libp2p-swarm` does not result in appropriate calls to `NetworkBehaviour::inject_connection_closed` and `NetworkBehaviour::inject_disconnected`. Furthermore, the `disconnect()` functionality in `libp2p-core` is currently broken, as it leaves the `Pool` in an inconsistent state.

This commit does the following:

1. When connection background tasks are dropped (i.e. removed from the `Manager`), they always terminate immediately, without attempting an orderly close of the connection.
2. An orderly close is sent to the background task of a connection as a regular command. The background task emits a `Closed` event before terminating.
3. `Pool::disconnect()` removes all connection tasks for the affected peer from the `Manager`, i.e. without an orderly close, thereby also fixing the discovered state inconsistency of not removing the corresponding entries in the `Pool` itself after removing them from the `Manager`. The `Pool` ensures that `ConnectionClosed` events are emitted for these connections. The former `NetworkEvent::ConnectionError` has been renamed to `NetworkEvent::ConnectionClosed` with the `error` field being an `Option`; `error: None` thus indicates an active (but not necessarily orderly) close.
4. A new test is added to `libp2p-swarm` that exercises the ban/unban functionality (currently somewhat broken, see #1584) and places assertions on the number and order of calls to the `NetworkBehaviour`. In that context, some new testing utilities have been added to `libp2p-swarm` as well.

Points (1)-(3) ensure that each `NetworkEvent::ConnectionEstablished` is eventually paired with a `NetworkEvent::ConnectionClosed`, also for actively closed connections (with orderly shutdown or not), thus addressing #1584. (1) and (2) went along with some internal simplifications in the state machine implementation of the background `Task`s.
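For illustration, a minimal, self-contained sketch of how a consumer of the `Network` might distinguish the two close cases after this change. The `NetworkEvent` enum below merely mirrors the identifiers from the description and is not the actual `libp2p-core` definition:

```rust
// Hypothetical, local stand-in for the reworked event: `error: None` marks
// an active close, `error: Some(_)` a close caused by a failure.
#[derive(Debug)]
enum NetworkEvent {
    ConnectionEstablished { peer: String },
    // Formerly `ConnectionError`; the error field is now optional.
    ConnectionClosed { peer: String, error: Option<std::io::Error> },
}

fn on_event(event: NetworkEvent) {
    match event {
        NetworkEvent::ConnectionEstablished { peer } => {
            println!("connected to {peer}");
        }
        NetworkEvent::ConnectionClosed { peer, error: None } => {
            // Active close (orderly or not), e.g. via `disconnect()`
            // or as a result of `Swarm::ban_peer_id`.
            println!("actively closed connection to {peer}");
        }
        NetworkEvent::ConnectionClosed { peer, error: Some(e) } => {
            println!("connection to {peer} failed: {e}");
        }
    }
}

fn main() {
    on_event(NetworkEvent::ConnectionEstablished { peer: "alice".into() });
    on_event(NetworkEvent::ConnectionClosed { peer: "alice".into(), error: None });
}
```

Under this shape, every `ConnectionEstablished` is eventually matched by exactly one `ConnectionClosed`, regardless of how the connection ended.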
I haven't read the code changes yet, but it is completely intentional that no event is emitted if […]. Unless that has been changed, the API of […]. Unless there is a reason why the closing has to be asynchronous on the API level, I do strongly prefer when disconnecting appears to be synchronous.
I think that is not a good idea, as can be seen from #1584, which is just incorrect behaviour. The […]
That is still the case for […]
Nothing much has changed at the API level, since the background tasks are still there. The difference is that a) […]
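To make the mechanism concrete, here is a minimal, hypothetical model of the two shutdown paths from points (1) and (2) of the description, using std threads and channels in place of the real async task machinery; all names are stand-ins:

```rust
// An orderly close is delivered to the background task as a regular command
// and acknowledged with a `Closed` event; dropping the command sender (as a
// `Manager` would when removing the task) ends the task immediately, with
// no orderly close.
use std::sync::mpsc;
use std::thread;

enum Command { Close }

#[derive(Debug)]
enum Event { Closed }

fn spawn_connection_task() -> (mpsc::Sender<Command>, mpsc::Receiver<Event>) {
    let (cmd_tx, cmd_rx) = mpsc::channel::<Command>();
    let (evt_tx, evt_rx) = mpsc::channel::<Event>();
    thread::spawn(move || {
        // The task runs until it receives a command or the sender is dropped.
        match cmd_rx.recv() {
            Ok(Command::Close) => {
                // ... perform the orderly close of the connection here ...
                let _ = evt_tx.send(Event::Closed); // emitted before terminating
            }
            Err(_) => {
                // Command sender dropped: terminate immediately.
            }
        }
    });
    (cmd_tx, evt_rx)
}

fn main() {
    let (cmd, events) = spawn_connection_task();
    cmd.send(Command::Close).unwrap();
    println!("{:?}", events.recv().unwrap()); // prints `Closed`
}
```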
I want to highlight again the motivations from the PR description: a) The […]
My argument is that we should change the Swarm to no longer rely on that. The Swarm knows when […]
I don't understand why it is important to know whether the shutdown has been successful or not. There is nothing the user can do if, for example, we fail to shut down a connection because of an I/O error.
That's not clear to me. I would expect […]
Which can be fixed by making the […]
Such an overall 1:1 mapping between Network and Swarm events is not the motivation or goal here; this is only about connection established/closed events. I do see a good reason for having […]
How do you know? If I try to perform a clean shutdown of a connection, which implies flushing and sending all remaining buffered data, I may certainly be interested in whether that succeeded. If it doesn't, I have to assume that some of the data was not sent and may want to take action on that. If I don't care or know at all whether an orderly connection shutdown succeeds, there isn't much point in doing it in the first place; I can just drop the connection. The current code (before this PR) always tries to perform an orderly […]
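The same point can be made with plain std I/O, independent of libp2p: only the result of the flush/shutdown tells you whether buffered data was actually sent. A small sketch:

```rust
use std::io::{self, BufWriter, Write};
use std::net::{Shutdown, TcpStream};

// Orderly close of our writing side: flush everything we buffered, then
// shut down. An error from `flush` means some data was likely never sent,
// and the caller may want to act on that. Ignoring the result is only
// reasonable if the buffered data doesn't matter, in which case simply
// dropping the connection is equivalent.
fn close_orderly(stream: TcpStream, buffered: Vec<u8>) -> io::Result<()> {
    let mut w = BufWriter::new(&stream);
    w.write_all(&buffered)?;
    w.flush()?; // failure here: assume unsent data
    stream.shutdown(Shutdown::Write)
}
```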
What is not clear? Why it is broken? As I hinted at in the PR description and as should be seen in the diff (and as has been reproduced by the included new […]
The thing is, the […]
Yes, it can, as long as you can get access to all the information you need to feed into the callbacks and/or events. I think that is possible now, but it must also always remain possible in the future (that is, in principle, any information you get in a […]
Any further comments or questions? If my arguments are not convincing, and no one else seems to have an opinion, feel free to close the PR. Otherwise I'm happy to rebase it.
Co-authored-by: Toralf Wittner <[email protected]>
Looks good to me.
Building on the ability to wait for connection shutdown to complete introduced in libp2p#1619, this commit extends the support for graceful shutdowns in the following ways:

1. The `ConnectionHandler` (and thus also `ProtocolsHandler`) can participate in the shutdown, via new `poll_close` methods. The muxer and underlying transport connection only start closing once the connection handler signals readiness to do so.
2. A `Network` can be gracefully shut down, which involves a graceful shutdown of the underlying connection `Pool`. The `Pool` in turn proceeds with a shutdown by rejecting new connections while draining established connections.
3. A `Swarm` can be gracefully shut down, which involves a graceful shutdown of the underlying `Network` followed by polling the `NetworkBehaviour` until it returns `Poll::Pending`, i.e. it has no more output.

In particular, the following details are important:

* Analogous to new inbound and outbound connections during shutdown, while a single connection is shutting down, it rejects new inbound substreams and, by the return type of `ConnectionHandler::poll_close`, no new outbound substreams can be requested.
* The `NodeHandlerWrapper` managing the `ProtocolsHandler` always waits for already ongoing inbound and outbound substream upgrades to complete. Since the `NodeHandlerWrapper` is a `ConnectionHandler`, the previous point applies w.r.t. new inbound and outbound substreams.
* When the `connection_keep_alive` expires, a graceful shutdown is initiated.
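As a rough illustration of point (1), a handler could participate in the shutdown via a `poll_close`-style method. The trait below is a local stand-in for illustration, not the actual `libp2p-core` trait:

```rust
use std::task::{Context, Poll};

// Hypothetical sketch: during shutdown the connection polls the handler
// and only proceeds with closing the muxer once `Poll::Ready` is returned.
trait CloseParticipant {
    /// Polled repeatedly during shutdown; returning `Pending` delays the
    /// close, e.g. until buffered protocol messages have been sent. Note
    /// that the return type offers no way to request new outbound
    /// substreams.
    fn poll_close(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

struct MyHandler {
    pending_msgs: usize,
}

impl CloseParticipant for MyHandler {
    fn poll_close(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        if self.pending_msgs > 0 {
            // A real handler would flush here and register the waker so it
            // is polled again once progress can be made.
            self.pending_msgs -= 1;
            Poll::Pending
        } else {
            Poll::Ready(())
        }
    }
}
```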
@twittner Thanks for the review, I incorporated pretty much all of your suggestions. @tomaka I haven't heard back from you yet - do you also have a final verdict? I would go ahead with merging eventually, unless you still have strong objections and my answers to your earlier questions were not convincing. If you just haven't found the time to take a closer look yet, I'm happy to leave this PR open for a while longer. The gist of the changes is still that […]