Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm: deprecate KeepAlive::Until #3844

Closed
12 tasks done
Tracked by #4306
thomaseizinger opened this issue Apr 27, 2023 · 17 comments
Closed
12 tasks done
Tracked by #4306

swarm: deprecate KeepAlive::Until #3844

thomaseizinger opened this issue Apr 27, 2023 · 17 comments

Comments

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Apr 27, 2023

Description

Deprecate the KeepAlive::Until variant. Users can achieve the same thing by having a timer internally.

Motivation

Understanding how to properly use KeepAlive::Until is difficult. It requires keeping around KeepAlive as a field in the ConnectionHandler otherwise it will be updated forever with a new timestamp. Having KeepAlive as a field thus makes it more difficult to accurately compute it. It could be set from No to Yes and back etc

Ideally, a ConnectionHandler can just compute its KeepAlive value from its internal state. There should be no reason to keep an idle connection around. If a ConnectionHandler as nothing to do, it should immediately return KeepAlive::No.

To avoid connections from being shut-down directly after it is established, NetworkBehaviours should pass initial state in handle_established_outbound_connection and handle_established_inbound_connection. This way, the connection_keep_alive function can correctly return KeepAlive::Yes.

Whether or not completely idle connections should be shut down or not is an application-wide decision and cannot reasonably be computed by a single ConnectionHandler. Users should use swarm::Config::with_idle_connection_timeout to configure that. See #4121.

@thomaseizinger
Copy link
Contributor Author

cc @nathanielc

@thomaseizinger
Copy link
Contributor Author

Identify doesn't use Until but #3876 still shows quite well how I think protocols should implement the keep-alive mechanism.

@thomaseizinger
Copy link
Contributor Author

#3876 is queued for merging now. In a first step towards deprecating KeepAlive::Until, I'd like to move all protocols to this new approach of handling KeepAlive. We should do it as one PR per protocol.

Once all protocols have been ported, we can deprecate the enum variant itself and have our users to the same.

mergify bot pushed a commit that referenced this issue May 8, 2023
Currently, the `connection_keep_alive` function of identify does not compute anything but its return value is set through the handler state machine. This is hard to understand and causes surprising behaviour because at the moment, we set `KeepAlive::No` as soon as the remote has answered our identify request. Depending on what else is happening on the connection, this might close the connection before we have successfully answered the remote's identify request.

To fix this, we now compute `connection_keep_alive` based on whether we are still using the connection.

Related: #3844.

Pull-Request: #3876.
@nathanielc
Copy link
Contributor

To avoid connections from being shut-down directly after it is established, NetworkBehaviours should pass initial state in handle_established_outbound_connection and handle_established_inbound_connection. This way, the connection_keep_alive function can correctly return KeepAlive::Yes.

Has this part already been implemented? I had assumed this was a blocker to make many changes here. Was the identity protocol a simpler case?

@thomaseizinger
Copy link
Contributor Author

To avoid connections from being shut-down directly after it is established, NetworkBehaviours should pass initial state in handle_established_outbound_connection and handle_established_inbound_connection. This way, the connection_keep_alive function can correctly return KeepAlive::Yes.

Has this part already been implemented? I had assumed this was a blocker to make many changes here. Was the identity protocol a simpler case?

Yes the identify protocol was simple in this regard because no messages are generated in the behaviour.

Help in migrating other protocols away from KeepAlive::Until would be much appreciated. Off the top of my head, kademlia might require some changes and perhaps gossipsub. I don't think any other protocol actively establishes connections?

@drHuangMHT
Copy link
Contributor

After Until being removed, KeepAlive can be Copy or be replaced by boolean, any interest?

@thomaseizinger
Copy link
Contributor Author

It is already Copy. Not sure there is any practical benefit from it being a boolean. Having a separate type gives us a place to put documentation on it for example.

mergify bot pushed a commit that referenced this issue Jun 4, 2023
Similar to #3876, we now compute `connection_keep_alive` based on whether we are still using the connection, applied to the `dcutr` protocol.

Related: #3844.

Pull-Request: #3960.
@thomaseizinger
Copy link
Contributor Author

@mxinden Do you think it makes sense to add a general idle_connection_timeout to Swarm? It would trigger as soon as all ConnectionHandlers report KeepAlive::No. We can default it to 0 seconds.

When trying to transition libp2p-request-response away from KeepAlive::Until, I ran into the issue that the perf protocol currently relies on setting the connection timeout. Tests would also benefit from such a config option.

One could even ask whether we shouldn't deprecate keep_alive::Behaviour completely in favor of an idle connection timeout setting on Swarm.

@tcoratger
Copy link
Contributor

@thomaseizinger In the relay protocol, I saw that there was at the end of the poll function in the behavior handler, there was the following part:

// Check keep alive status.
if self.reservation_request_future.is_none()
    && self.circuit_accept_futures.is_empty()
    && self.circuit_deny_futures.is_empty()
    && self.alive_lend_out_substreams.is_empty()
    && self.circuits.is_empty()
    && self.active_reservation.is_none()
{
    match self.keep_alive {
        KeepAlive::Yes => {
            self.keep_alive = KeepAlive::Until(Instant::now() + Duration::from_secs(10));
        }
        KeepAlive::Until(_) => {}
        KeepAlive::No => panic!("Handler never sets KeepAlive::No."),
    }
} else {
    self.keep_alive = KeepAlive::Yes;
}

If I'm correct, the goal of this snippet is to keep the connection alive few more seconds, even if the condition is false. By removing the self.keep_alive in the context of this issue, this snippet is removed and then we have to find a way to mimic this in order for the function connection_keep_alive to return KeepAlive::Yes in such circumstances, correct?

@thomaseizinger
Copy link
Contributor Author

Yes, although in general, I'd like to move away from implicitly keeping connections open for longer than they are needed.

For the relay protocol, what happens if we just remove the Until bit and just determine KeepAlive based on the above conditions?

@mxinden
Copy link
Member

mxinden commented Jun 26, 2023

Proposal above, namely to add an idle_connection_timeout to Swarm makes sense to me.

At first I wondered whether a idle_at_start_connection_timeout would make more sense. Thinking about it some more idle_connection_timeout i.e. a timeout applicable not only at the start of the connection, but throughout its lifecycle, might be easier to reason about.

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Jun 26, 2023

Proposal above, namely to add an idle_connection_timeout to Swarm makes sense to me.

At first I wondered whether a idle_at_start_connection_timeout would make more sense. Thinking about it some more idle_connection_timeout i.e. a timeout applicable not only at the start of the connection, but throughout its lifecycle, might be easier to reason about.

Yeah I think that is very hard to teach and is a bit too specialized for my taste. Also, with the strategy of initialising ConnectionHandlers with the work they should do, the "a connection that might be useful later is idle initially" is not a problem.

On the other hand, having connections not close immediately when they go idle after doing some work is definitely a problem:

  • in our tests
  • in examples
  • when running the benchmarks

The benchmarks in particular are a special-case of applications that aren't strictly p2p but more client-server style, i.e. you want to connect to very particular nodes.

I opened an issue here: #4121

@thomaseizinger thomaseizinger added this to the Simplify idle connection management milestone Sep 17, 2023
mergify bot pushed a commit that referenced this issue Sep 19, 2023
Previously, a connection would be shut down immediately as soon as its `ConnectionHandler` reports `KeepAlive::No`. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the `keep_alive::Behaviour`. That one does essentially nothing other than statically returning `KeepAlive::Yes` from its `ConnectionHandler`.

It makes much more sense to deprecate `keep_alive::Behaviour` and instead allow users to globally configure an `idle_conncetion_timeout` on the `Swarm`. This timeout comes into effect once a `ConnectionHandler` reports `KeepAlive::No`. To start with, this timeout is 0. Together with #3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes #4121.

Pull-Request: #4161.
@thomaseizinger thomaseizinger removed this from the Simplify idle connection management milestone Sep 19, 2023
thomaseizinger pushed a commit to dgarus/rust-libp2p that referenced this issue Sep 20, 2023
Previously, a connection would be shut down immediately as soon as its `ConnectionHandler` reports `KeepAlive::No`. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the `keep_alive::Behaviour`. That one does essentially nothing other than statically returning `KeepAlive::Yes` from its `ConnectionHandler`.

It makes much more sense to deprecate `keep_alive::Behaviour` and instead allow users to globally configure an `idle_conncetion_timeout` on the `Swarm`. This timeout comes into effect once a `ConnectionHandler` reports `KeepAlive::No`. To start with, this timeout is 0. Together with libp2p#3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes libp2p#4121.

Pull-Request: libp2p#4161.
thomaseizinger pushed a commit that referenced this issue Sep 21, 2023
Previously, a connection would be shut down immediately as soon as its `ConnectionHandler` reports `KeepAlive::No`. As we have gained experience with libp2p, it turned out that this isn't ideal.

For one, tests often need to keep connections alive longer than the configured protocols require. Plus, some usecases require connections to be kept alive in general.

Both of these needs are currently served by the `keep_alive::Behaviour`. That one does essentially nothing other than statically returning `KeepAlive::Yes` from its `ConnectionHandler`.

It makes much more sense to deprecate `keep_alive::Behaviour` and instead allow users to globally configure an `idle_conncetion_timeout` on the `Swarm`. This timeout comes into effect once a `ConnectionHandler` reports `KeepAlive::No`. To start with, this timeout is 0. Together with #3844, this will allow us to move towards a much more aggressive closing of idle connections, together with a more ergonomic way of opting out of this behaviour.

Fixes #4121.

Pull-Request: #4161.
@thomaseizinger
Copy link
Contributor Author

@mxinden Shall we put a #[deprecated] on KeepAlive::Until already and currently #[allow] them within our repository? That would give us a head-start and allow (pun intended) us to remove KeepAlive::Until with the next breaking release already!

mergify bot pushed a commit that referenced this issue Oct 15, 2023
Deprecate the `Config::idle_timeout` function in preparation for removing the `KeepAlive::Until` entirely.

Related: #3844.

Pull-Request: #4648.
mergify bot pushed a commit that referenced this issue Oct 16, 2023
@mxinden
Copy link
Member

mxinden commented Oct 16, 2023

@mxinden Shall we put a #[deprecated] on KeepAlive::Until already and currently #[allow] them within our repository? That would give us a head-start and allow (pun intended) us to remove KeepAlive::Until with the next breaking release already!

Works for me @thomaseizinger.

mergify bot pushed a commit that referenced this issue Oct 16, 2023
In preparation for removing this variant, we are issuing a deprecation notice. The linked issue describes why and guides users in migrating away from it. This deprecation is backwards-compatible and allows us to remove the variant in the next breaking release. We haven't migrated all internal protocols yet but that isn't strictly necessary to issue the deprecation.

Related: #3844.

Pull-Request: #4656.
mergify bot pushed a commit that referenced this issue Oct 17, 2023
mergify bot pushed a commit that referenced this issue Oct 17, 2023
mergify bot pushed a commit that referenced this issue Oct 20, 2023
mergify bot pushed a commit that referenced this issue Oct 20, 2023
mergify bot pushed a commit that referenced this issue Oct 20, 2023
`KeepAlive::Until` is being deprecated. We emulate the functionality within the `ConnectionHandler` to ensure relayed connections don't get immediately closed when a client establishes one.

Related: #3844.
Related: #4656.

Pull-Request: #4676.
mergify bot pushed a commit that referenced this issue Oct 20, 2023
This function has been deprecated and can now be removed.

Related: #3844.
Related: #4678.

Pull-Request: #4679.
mergify bot pushed a commit that referenced this issue Oct 20, 2023
Previously, the relay client was applying a 10 second timeout to idle connections. Instead, we now compute the connection keep alive based on whether we are still using the connection.

This makes all tests pass apart from 1: `reuse_connection`. This makes sense as that connection is immediately idle once dialed. To make that test pass, we configure a global idle connection timeout of 1 second.

In a real-world scenario, reusing a connection only applies if the connection is still alive due to other protocols being active.

Related: #3844.

Pull-Request: #4696.
mergify bot pushed a commit that referenced this issue Oct 20, 2023
@thomaseizinger
Copy link
Contributor Author

This is all done! Many thanks to @leonzchang for all the help!

@leonzchang
Copy link
Contributor

This is all done! Many thanks to @leonzchang for all the help!

@thomaseizinger Thanks for your guidance too!

binarybaron added a commit to UnstoppableSwap/core that referenced this issue Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants