Excessive calls to `poll` and cloning of `Waker`s. #4990

jakubDoka · 2023-12-08T23:30:58Z

Summary

I perfed my protocol and discovered major hot spots within the repo code mostly ending up in libp2p_swarm::connection::Connection<THandler>::poll and yamux implementation. From the looks of it the "nothing to do" paths are constantly being hit and subsequently the Arcs dropped and Wakers cloned. I quickly noticed that code usually tries to poll everything anytime one central Waker triggers.

Expected behavior

Ideally, code selecting many futures in parallel should branch the main Waker for each future to collect information on what actually needs polling.

Actual behavior

Most of the hot instructions are doing atomic operations within Waker::clone and code almost always looks like this:

pub fn set_waker(old: &mut Option<Waker>, new: &Waker) {
    *old = Some(new.clone());
}

The most significant hot spot though is something different. here we unconditionally create Arc for each available protocol as the last thing before breaking the loop. The crux of the problem is that we call the poll to drop some Arcs and clone Wakers most of the time with no real work getting done.

Relevant log output

No response

Possible Solution

Simplest fix for updating cached wakers is refactoring the function to:

pub fn set_waker(old: &mut Option<Waker>, new: &Waker) {
    if let Some(old) = old {
        // clone_from will clone only if `new` won't wake the `old`
        old.clone_from(new);
    } else {
        *old = Some(new.clone());
    }
}

Although this is just fixing the symptom, its still a substantial improvement considering how easy it is to implement.

Another simple fix is avoiding the Arc creation in mentioned section of the code. I managed to pull it off but not without breaking the API (set intersection and difference iterators getting replaced with SmallVecs in ProtocolChange events).

Now for the main cause of all this...

Excessive Polling

All the mentioned code is almost exclusively placed at spots where handwritten futures ran out of things to do, this is also the code that is executed all the time. My conclusion is that we poll when we don't need to. Solving this though is hard and requires lot of changes, and careful testing (since eliminating excessive polls is painful since bugs exhibit as program getting blocked forever).

Version

b7914e4

Would you like to work on fixing this bug ?

Yes

The text was updated successfully, but these errors were encountered:

jakubDoka · 2023-12-09T12:50:18Z

After some more research... I found out bottleneck is where is should be, I mistook the instruction for something else, so excessive polling most likely does not happen, consider this issue irrelevant.

mxinden · 2023-12-09T18:06:19Z

Thank you for the research @jakubDoka!

The most significant hot spot though is something different. here we unconditionally create Arc for each available protocol as the last thing before breaking the loop. The crux of the problem is that we call the poll to drop some Arcs and clone Wakers most of the time with no real work getting done.

This is further discussed in #4284. Help much appreciated @jakubDoka.

Ideally, code selecting many futures in parallel should branch the main Waker for each future to collect information on what actually needs polling.

Agreed. I would assume this to be especially powerful within the NetworkBehaviour and ConnectionHandler hierarchies. Again, help appreciated. Though this requires some benchmarking.

jakubDoka · 2023-12-09T19:39:04Z

@mxinden should I make a small pr that fixes the #4284. Disadvantage of the fix is that the libp2p_swarm::handler::ProtocolChange changes from difference iterator to owned SmallVec since the iterator is set specific and borrowed, so that prevents me from changing the storing strategy of protocols.

thomaseizinger · 2023-12-09T20:16:46Z

I'd like to see some benchmarks that moving to SmallVec actually fixes this. Another problem is that we always call StreamProtocol::try_from_owned which we can only get rid of by changing what UpgradeInfo returns (or by removing it entirely, see #3268 and related issues).

It is an unfortunate penality that we pay here as this scales with N (the number of protocols you plug into Swarm). I'd really like to fix it. The good news is that it shouldn't matter much (compared to the actual CPU time spent on your protocols) for small number of protocols. A proper fix for this is hard I think. Help is much appreciated though! :)

thomaseizinger · 2023-12-09T20:18:38Z

Disadvantage of the fix is that the libp2p_swarm::handler::ProtocolChange changes from difference iterator to owned SmallVec since the iterator is set specific and borrowed, so that prevents me from changing the storing strategy of protocols.

The storage should be an implementation detail. Why is that a breaking change?

jakubDoka · 2023-12-09T20:28:21Z

Disadvantage of the fix is that the libp2p_swarm::handler::ProtocolChange changes from difference iterator to owned SmallVec since the iterator is set specific and borrowed, so that prevents me from changing the storing strategy of protocols.

The storage should be an implementation detail. Why is that a breaking change?

@thomaseizinger because we use hash_set::Difference<StreamProtocol> (in api) which means I need to store HashSet<StreamProtocol> to constrct it

jakubDoka · 2023-12-10T02:22:54Z

I discovered quite the opposite problem when experimenting with optimized polling. Without proper waking in my handler, the whole swarm gets stuck because connection event does not cause a repoll anymore (with scoped waking). This behaviour is valid one as documented in the standard library. Future may be polled more then needed though that should not be relied upon (wake if you want to be polled), but, e.g., kad still relies on this.

jakubDoka closed this as not planned Won't fix, can't repro, duplicate, stale Dec 9, 2023

jakubDoka mentioned this issue Dec 9, 2023

fix(swarm): Eliminating repetitive arc construction on each poll #4991

Closed

4 tasks

jakubDoka reopened this Dec 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Excessive calls to `poll` and cloning of `Waker`s. #4990

Excessive calls to `poll` and cloning of `Waker`s. #4990

jakubDoka commented Dec 8, 2023

jakubDoka commented Dec 9, 2023 •

edited

Loading

mxinden commented Dec 9, 2023

jakubDoka commented Dec 9, 2023 •

edited

Loading

thomaseizinger commented Dec 9, 2023

thomaseizinger commented Dec 9, 2023

jakubDoka commented Dec 9, 2023 •

edited

Loading

jakubDoka commented Dec 10, 2023 •

edited

Loading

Excessive calls to poll and cloning of Wakers. #4990

Excessive calls to poll and cloning of Wakers. #4990

Comments

jakubDoka commented Dec 8, 2023

Summary

Expected behavior

Actual behavior

Relevant log output

Possible Solution

Excessive Polling

Version

Would you like to work on fixing this bug ?

jakubDoka commented Dec 9, 2023 • edited Loading

mxinden commented Dec 9, 2023

jakubDoka commented Dec 9, 2023 • edited Loading

thomaseizinger commented Dec 9, 2023

thomaseizinger commented Dec 9, 2023

jakubDoka commented Dec 9, 2023 • edited Loading

jakubDoka commented Dec 10, 2023 • edited Loading

Excessive calls to `poll` and cloning of `Waker`s. #4990

Excessive calls to `poll` and cloning of `Waker`s. #4990

jakubDoka commented Dec 9, 2023 •

edited

Loading

jakubDoka commented Dec 9, 2023 •

edited

Loading

jakubDoka commented Dec 9, 2023 •

edited

Loading

jakubDoka commented Dec 10, 2023 •

edited

Loading