[core/swarm] Emit events for active connection close and fix disconnect()
#1619
Conversation
The `Network` does not currently emit events for actively closed connections, e.g. via `EstablishedConnection::close` or `ConnectedPeer::disconnect()`. As a result, when actively closing connections, `ConnectionEstablished` events are emitted without eventually a matching `ConnectionClosed` event. This seems undesirable and has the consequence that the `Swarm::ban_peer_id` feature in `libp2p-swarm` does not result in appropriate calls to `NetworkBehaviour::inject_connection_closed` and `NetworkBehaviour::inject_disconnected`. Furthermore, the `disconnect()` functionality in `libp2p-core` is currently broken, as it leaves the `Pool` in an inconsistent state.

This commit does the following:

1. When connection background tasks are dropped (i.e. removed from the `Manager`), they always terminate immediately, without attempting an orderly close of the connection.
2. An orderly close is sent to the background task of a connection as a regular command. The background task emits a `Closed` event before terminating.
3. `Pool::disconnect()` removes all connection tasks for the affected peer from the `Manager`, i.e. without an orderly close, thereby also fixing the discovered state inconsistency of not removing the corresponding entries in the `Pool` itself after removing them from the `Manager`. The `Pool` ensures that `ConnectionClosed` events are emitted for these connections. The former `NetworkEvent::ConnectionError` has been renamed to `NetworkEvent::ConnectionClosed` with the `error` field being an `Option`; `error: None` thus indicates an active (but not necessarily orderly) close.
4. A new test is added to `libp2p-swarm` that exercises the ban/unban functionality (currently somewhat broken, see #1584) and places assertions on the number and order of calls to the `NetworkBehaviour`. In that context, some new testing utilities have been added to `libp2p-swarm` as well.

Points (1)-(3) ensure that each `NetworkEvent::ConnectionEstablished` is eventually paired with a `NetworkEvent::ConnectionClosed`, also for actively closed connections (with orderly shutdown or not), thus addressing #1584. (1) and (2) went along with some internal simplifications in the state machine implementation of the background `Task`s.
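For illustration, a minimal, self-contained sketch of how a consumer of the `Network` might distinguish the two close cases after this change. The `NetworkEvent` enum below merely mirrors the identifiers from the description and is not the actual `libp2p-core` definition:

```rust
// Hypothetical, local stand-in for the reworked event: `error: None` marks
// an active close, `error: Some(_)` a close caused by a failure.
#[derive(Debug)]
enum NetworkEvent {
    ConnectionEstablished { peer: String },
    // Formerly `ConnectionError`; the error field is now optional.
    ConnectionClosed { peer: String, error: Option<std::io::Error> },
}

fn on_event(event: NetworkEvent) {
    match event {
        NetworkEvent::ConnectionEstablished { peer } => {
            println!("connected to {peer}");
        }
        NetworkEvent::ConnectionClosed { peer, error: None } => {
            // Active close (orderly or not), e.g. via `disconnect()`
            // or as a result of `Swarm::ban_peer_id`.
            println!("actively closed connection to {peer}");
        }
        NetworkEvent::ConnectionClosed { peer, error: Some(e) } => {
            println!("connection to {peer} failed: {e}");
        }
    }
}

fn main() {
    on_event(NetworkEvent::ConnectionEstablished { peer: "alice".into() });
    on_event(NetworkEvent::ConnectionClosed { peer: "alice".into(), error: None });
}
```

Under this shape, every `ConnectionEstablished` is eventually matched by exactly one `ConnectionClosed`, regardless of how the connection ended.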
I haven't read the code changes yet, but it is completely intentional that no event is emitted if […]. Unless that has been changed, the API of […]. Unless there is a reason why the closing has to be asynchronous on the API level, I do strongly prefer when disconnecting appears to be synchronous.
I think that is not a good idea, as can be seen from #1584, which is just incorrect behaviour. The […]
That is still the case for […]
Nothing much has changed at the API level, since the background tasks are still there. The difference is that a) […]
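To make the mechanism concrete, here is a minimal, hypothetical model of the two shutdown paths from points (1) and (2) of the description, using std threads and channels in place of the real async task machinery; all names are stand-ins:

```rust
// An orderly close is delivered to the background task as a regular command
// and acknowledged with a `Closed` event; dropping the command sender (as a
// `Manager` would when removing the task) ends the task immediately, with
// no orderly close.
use std::sync::mpsc;
use std::thread;

enum Command { Close }

#[derive(Debug)]
enum Event { Closed }

fn spawn_connection_task() -> (mpsc::Sender<Command>, mpsc::Receiver<Event>) {
    let (cmd_tx, cmd_rx) = mpsc::channel::<Command>();
    let (evt_tx, evt_rx) = mpsc::channel::<Event>();
    thread::spawn(move || {
        // The task runs until it receives a command or the sender is dropped.
        match cmd_rx.recv() {
            Ok(Command::Close) => {
                // ... perform the orderly close of the connection here ...
                let _ = evt_tx.send(Event::Closed); // emitted before terminating
            }
            Err(_) => {
                // Command sender dropped: terminate immediately.
            }
        }
    });
    (cmd_tx, evt_rx)
}

fn main() {
    let (cmd, events) = spawn_connection_task();
    cmd.send(Command::Close).unwrap();
    println!("{:?}", events.recv().unwrap()); // prints `Closed`
}
```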
I want to highlight again the motivations from the PR description: a) The […]
My argument is that we should change the Swarm to no longer rely on that. The Swarm knows when […]
I don't understand why it is important to know whether the shutdown has been successful or not. There is nothing the user can do if, for example, we fail to shut down a connection because of an I/O error.
That's not clear to me. I would expect […]
Which can be fixed by making the […]
Such an overall 1:1 mapping between Network and Swarm events is not the motivation or goal here; this is only about connection established/closed events. I do see a good reason for having […]
How do you know? If I try to perform a clean shutdown of a connection, which implies flushing and sending all remaining buffered data, I may certainly be interested in whether that succeeded. If it doesn't, I have to assume that some of the data was not sent and may want to take action on that. If I don't care or know at all whether an orderly connection shutdown succeeds, there isn't much point in doing it in the first place; I can just drop the connection. The current code (before this PR) always tries to perform an orderly […]
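The same point can be made with plain std I/O, independent of libp2p: only the result of the flush/shutdown tells you whether buffered data was actually sent. A small sketch:

```rust
use std::io::{self, BufWriter, Write};
use std::net::{Shutdown, TcpStream};

// Orderly close of our writing side: flush everything we buffered, then
// shut down. An error from `flush` means some data was likely never sent,
// and the caller may want to act on that. Ignoring the result is only
// reasonable if the buffered data doesn't matter, in which case simply
// dropping the connection is equivalent.
fn close_orderly(stream: TcpStream, buffered: Vec<u8>) -> io::Result<()> {
    let mut w = BufWriter::new(&stream);
    w.write_all(&buffered)?;
    w.flush()?; // failure here: assume unsent data
    stream.shutdown(Shutdown::Write)
}
```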
What is not clear? Why it is broken? As I hinted at in the PR description and as should be seen in the diff (and as has been reproduced by the included new […]
The thing is, the […]
Yes, it can, as long as you can get access to all the information you need to feed into the callbacks and/or events. I think that is possible now, but it must also always remain possible in the future (that is, in principle, any information you get in a […]
Any further comments or questions? If my arguments are not convincing, and no one else seems to have an opinion, feel free to close the PR. Otherwise I'm happy to rebase it.
Co-authored-by: Toralf Wittner <[email protected]>
Looks good to me.
Building on the ability to wait for connection shutdown to complete introduced in libp2p#1619, this commit extends the support for graceful shutdowns in the following ways:

1. The `ConnectionHandler` (and thus also `ProtocolsHandler`) can participate in the shutdown, via new `poll_close` methods. The muxer and underlying transport connection only start closing once the connection handler signals readiness to do so.
2. A `Network` can be gracefully shut down, which involves a graceful shutdown of the underlying connection `Pool`. The `Pool` in turn proceeds with a shutdown by rejecting new connections while draining established connections.
3. A `Swarm` can be gracefully shut down, which involves a graceful shutdown of the underlying `Network` followed by polling the `NetworkBehaviour` until it returns `Poll::Pending`, i.e. it has no more output.

In particular, the following details are important:

* Analogous to new inbound and outbound connections during shutdown, while a single connection is shutting down, it rejects new inbound substreams and, by the return type of `ConnectionHandler::poll_close`, no new outbound substreams can be requested.
* The `NodeHandlerWrapper` managing the `ProtocolsHandler` always waits for already ongoing inbound and outbound substream upgrades to complete. Since the `NodeHandlerWrapper` is a `ConnectionHandler`, the previous point applies w.r.t. new inbound and outbound substreams.
* When the `connection_keep_alive` expires, a graceful shutdown is initiated.
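As a rough illustration of point (1), a handler could participate in the shutdown via a `poll_close`-style method. The trait below is a local stand-in for illustration, not the actual `libp2p-core` trait:

```rust
use std::task::{Context, Poll};

// Hypothetical sketch: during shutdown the connection polls the handler
// and only proceeds with closing the muxer once `Poll::Ready` is returned.
trait CloseParticipant {
    /// Polled repeatedly during shutdown; returning `Pending` delays the
    /// close, e.g. until buffered protocol messages have been sent. Note
    /// that the return type offers no way to request new outbound
    /// substreams.
    fn poll_close(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

struct MyHandler {
    pending_msgs: usize,
}

impl CloseParticipant for MyHandler {
    fn poll_close(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        if self.pending_msgs > 0 {
            // A real handler would flush here and register the waker so it
            // is polled again once progress can be made.
            self.pending_msgs -= 1;
            Poll::Pending
        } else {
            Poll::Ready(())
        }
    }
}
```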
@twittner Thanks for the review, I incorporated pretty much all of your suggestions. @tomaka I haven't heard back from you yet - do you also have a final verdict? I would go ahead with merging eventually, unless you still have strong objections and my answers to your earlier questions were not convincing. If you just haven't found the time to take a closer look yet, I'm happy to leave this PR open for a while longer. The gist of the changes is still that […]