
refactor(gossipsub): send more specific messages to ConnectionHandler #4811

Merged
merged 14 commits into libp2p:master from gossipsub-backpressure-1
Nov 22, 2023

Conversation

jxs (Member) commented Nov 6, 2023

Description

Previously, the NetworkBehaviour constructed a proto::RPC type and sent it into an unbounded queue to the ConnectionHandler. With such a broad type, the ConnectionHandler cannot prioritize which messages to drop if a connection slows down.

To enable finer-grained handling of messages, we make the interface between NetworkBehaviour and ConnectionHandler more specific by introducing an RpcOut type that differentiates between Publish, Forward, Subscribe, etc.
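
Roughly, the new type looks like this (a sketch; the exact definition lives in protocols/gossipsub/src/types.rs in this PR, and Message, TopicHash and ControlAction are gossipsub's existing types):

```rust
// Sketch of the behaviour-to-handler message type introduced by this PR;
// the set of variants is the point here, the exact shape may differ.
pub enum RpcOut {
    /// A message we authored and want to publish.
    Publish(Message),
    /// A message received from another peer that we forward on.
    Forward(Message),
    /// Subscribe to a topic.
    Subscribe(TopicHash),
    /// Unsubscribe from a topic.
    Unsubscribe(TopicHash),
    /// A control message (IHAVE, IWANT, GRAFT, PRUNE).
    Control(ControlAction),
}
```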

Related: #4667.

Notes & open questions

Max suggested on #4667 (comment) changing HandlerIn, but that would imply more changes to the fragment_message function and back-and-forth conversion between proto::RPC and its subtypes (sending a subtype via HandlerIn and then converting it back to the main proto::RPC type when sending from the ConnectionHandler). Adding an RpcType to distinguish seemed simpler.

We were also using the Rpc type for conversions between Messages, ControlActions, and Subscriptions, which can be done from these types directly.
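
For illustration, the direct conversion can look something like this (a sketch: it assumes a `From<Message> for proto::Message` impl and `Default` on the generated proto types, and elides most arms):

```rust
// Illustrative sketch: each RpcOut variant can build its own proto::RPC
// directly, so the intermediate Rpc aggregate is no longer needed for this.
impl From<RpcOut> for proto::RPC {
    fn from(rpc: RpcOut) -> Self {
        match rpc {
            // Publish and Forward both become an RPC carrying one message.
            RpcOut::Publish(message) | RpcOut::Forward(message) => proto::RPC {
                publish: vec![message.into()],
                ..Default::default()
            },
            // Subscribe/Unsubscribe map to an RPC with one SubOpts entry and
            // Control to an RPC with one control action, analogously.
            _ => todo!("Subscribe, Unsubscribe and Control elided in this sketch"),
        }
    }
}
```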

ptal @mxinden @thomaseizinger @divagant-martian @AgeManning

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

jxs force-pushed the gossipsub-backpressure-1 branch from 8d79411 to d8545d8 on November 6, 2023
thomaseizinger (Contributor) commented Nov 6, 2023

Max suggested on #4667 (comment) changing HandlerIn, but that would imply more changes to the fragment_message function and back-and-forth conversion between proto::RPC and its subtypes (sending a subtype via HandlerIn and then converting it back to the main proto::RPC type when sending from the ConnectionHandler). Adding an RpcType to distinguish seemed simpler.

Hmm, I am not convinced it is simpler. I think it actually adds complexity, because now there is a pathological case where the message can contain one thing but the type says otherwise.

Can't we just move the fragment_message to the ConnectionHandler and thus only call it once? We should definitely avoid conversions back and forth.

My theory is that we can reduce allocations in gossipsub a lot if we delay construction of proto::RPC as long as possible and effectively only use it to write to the stream. That isn't directly relevant to the backpressure problem, although overall I think it is good thinking to apply.

jxs (Member, Author) commented Nov 6, 2023

Hi Thomas, thanks for jumping in!

Hmm, I am not convinced it is simpler. I think it actually adds complexity, because now there is a pathological case where the message can contain one thing but the type says otherwise.

Yup, I agree; I also don't like that we are not leveraging the type system for that.

Can't we just move the fragment_message to the ConnectionHandler and thus only call it once? We should definitely avoid conversions back and forth.

Yeah, I also tinkered with that, but it means we'd be fragmenting the main message in each connection handler, i.e. calling the same function many times; in LH's case that's ~100 connection handlers.

My theory is that we can reduce allocations in gossipsub a lot if we delay construction of proto::RPC as long as possible and effectively only use it to write to the stream. That isn't directly relevant to the backpressure problem, although overall I think it is good thinking to apply.

It's interesting you say this, because I had the opposite idea, haha. I was thinking that by having a proto as early as possible we can then Arc it and share it across all the handlers that are sending it, eventually dropping that proto when all the handlers have sent it. If we have other types that only construct the proto::RPC when sending down the wire, it defeats the purpose of Arcing it, as we cannot move the value out of the Arc, so it always needs to be cloned once more inside each ConnectionHandler.
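
In sketch form (the channel plumbing here is made up purely to illustrate the sharing, it is not the actual gossipsub internals):

```rust
use std::sync::Arc;

// Hypothetical wiring: build the protobuf once in the behaviour, hand a
// refcounted pointer to every connection handler, and the allocation is
// freed when the last handler has sent it.
fn broadcast(rpc: proto::RPC, handlers: &[tokio::sync::mpsc::Sender<Arc<proto::RPC>>]) {
    let shared = Arc::new(rpc);
    for handler in handlers {
        // Cloning the Arc copies a pointer and bumps a refcount...
        let _ = handler.try_send(Arc::clone(&shared));
    }
    // ...but a handler that must pass proto::RPC by value to the codec still
    // has to clone the message out of the Arc, which is the caveat above.
}
```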

How do you envision reducing allocations by delaying the construction of the proto::RPC?

thomaseizinger (Contributor) commented

My theory is that we can reduce allocations in gossipsub a lot if we delay construction of proto::RPC as long as possible and effectively only use it to write to the stream. That isn't directly relevant to the backpressure problem, although overall I think it is good thinking to apply.

It's interesting you say this, because I had the opposite idea, haha. I was thinking that by having a proto as early as possible we can then Arc it and share it across all the handlers that are sending it, eventually dropping that proto when all the handlers have sent it. If we have other types that only construct the proto::RPC when sending down the wire, it defeats the purpose of Arcing it, as we cannot move the value out of the Arc, so it always needs to be cloned once more inside each ConnectionHandler.

How do you envision reducing allocations by delaying the construction of the proto::RPC?

With something like #4751, where the main payload is a Bytes and thus cheap to clone. We can then construct a proto::RPC from borrowed data by changing pb-rs to generate proto bindings that leverage Cow.
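
To make the Bytes point concrete (this is the bytes crate's documented clone behaviour, not new code in this PR):

```rust
use bytes::Bytes;

fn main() {
    // `Bytes` is a refcounted view into a shared buffer: cloning bumps a
    // counter and copies a pointer + length; the payload is never copied.
    let payload = Bytes::from(vec![0u8; 1024 * 1024]); // 1 MiB, allocated once
    let per_handler = payload.clone();                 // O(1), no new allocation
    assert_eq!(payload.len(), per_handler.len());
}
```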

Can't we just move the fragment_message to the ConnectionHandler and thus only call it once? We should definitely avoid conversions back and forth.

Yeah, I also tinkered with that, but it means we'd be fragmenting the main message in each connection handler, i.e. calling the same function many times; in LH's case that's ~100 connection handlers.

I've not looked in detail at why we need fragment_message, but I couldn't find anything in the specs about message fragmentation, which makes me wonder how it is even compatible with other implementations.

So the options are either to remove it and simply force the caller to stay within the max message size, or to perform fragmentation directly on the bytes we want to publish. My assumption would be that this only matters for publish messages anyway, and thus it doesn't really make sense to call it on any other message type. I'd love to learn more about what use case this functionality serves though, and how it is compatible with other libp2p implementations.

thomaseizinger (Contributor) commented

I just had another look at the function and I think I now understand what is happening. We are not actually fragmenting the payload; instead, we are splitting the individual lists up across multiple RPCs.

This problem simply goes away if we change the interface to the handler by sending down each message individually. If we wanted to, we could batch messages together in the handler, but I wouldn't bother. Protobufs have marginal overhead, so it doesn't make a difference in terms of bandwidth whether you send 3 instances of proto::RPC or a single instance of proto::RPC that has a list of 3 items. To me, this seems like an unnecessarily complex wire format over simply sending each message individually. We can't change that now, obviously, but we can just not make use of it and avoid the complexity of fragmentation / batching.
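
A quick back-of-envelope for the "marginal overhead" claim, assuming roughly 4 bytes of length varint plus field tags per frame (an assumption, not measured):

```rust
fn main() {
    // Wire cost of one RPC holding 3 messages vs 3 single-message RPCs.
    let msgs = [512usize, 768, 1024];
    let frame_overhead = 4; // assumed: length varint + protobuf field tags
    let batched: usize = msgs.iter().sum::<usize>() + frame_overhead;
    let individual: usize = msgs.iter().map(|m| m + frame_overhead).sum();
    // 8 extra bytes on ~2.25 KiB of payload (~0.35%), and this overstates the
    // saving, since batched entries still pay per-entry tags inside the RPC.
    assert_eq!(individual - batched, 2 * frame_overhead);
}
```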

AgeManning (Contributor) commented

Yep, the message fragmentation comes from the go implementation, which was the "spec" at the time.

The idea was to group control messages into a single RPC as much as possible. The issue then became that RPC message sizes are constrained, and if we grouped too many messages beyond the size limit we would have to fragment them again.

This was mainly implemented as it was in the go implementation. We can probably do the grouping when drawing from the queues in the handler. @jxs is figuring out the best way to handle these things and where best to put the logic.

thomaseizinger (Contributor) commented

Anything that speaks against not doing any grouping of messages whatsoever? What is the difference between one RPC with 4 control messages vs 4 RPCs with 1 control message?

AgeManning (Contributor) commented

Anything that speaks against not doing any grouping of messages whatsoever? What is the difference between one RPC with 4 control messages vs 4 RPCs with 1 control message?

Not that I know of. I think the idea was to avoid small per-message overhead: there's a varint for the size of the message and small amounts of protobuf padding.

I.e. the idea was: if we could easily group, then group.

mxinden (Member) commented Nov 8, 2023

Anything that speaks against not doing any grouping of messages whatsoever? What is the difference between one RPC with 4 control messages vs 4 RPCs with 1 control message?

Not that I know of. I think the idea was to avoid small per-message overhead: there's a varint for the size of the message and small amounts of protobuf padding.

I consider grouping an optimization and thus only worth it if we have a calculation or benchmark proving its impact. Based on intuition I would deem the protobuf overhead negligible.

See similar calculation in https://github.com/libp2p/specs/blob/master/webrtc/README.md#faq.

thomaseizinger (Contributor) commented

Let's optimise for memory usage first, which seems to be an acute problem, and throw out the unproven bandwidth optimisation if it makes that harder to implement.

The grouping can be reintroduced later, although I somewhat doubt that we'll be able to measure an improvement. Ironically, having to allocate multiple Vecs each time we serialize or deserialize a message isn't the most efficient thing either.

jxs force-pushed the gossipsub-backpressure-1 branch 16 times, most recently from 838b14f to 1eb0d44 on November 10, 2023
rename Rpc to RpcIn and create RpcOut to allow sending different types of messages via that enum.
this will allow us to separate the message types into different priorities.
jxs force-pushed the gossipsub-backpressure-1 branch from 1eb0d44 to f2b1f42 on November 11, 2023
jxs (Member, Author) left a comment

Thanks for the review, Thomas! I addressed the changes in separate commits; it's easier if you follow them in order.

jxs force-pushed the gossipsub-backpressure-1 branch from f2b1f42 to f2c5edd on November 11, 2023
thomaseizinger (Contributor) left a comment

A couple more suggestions, most of them pretty minor. Feel free to not address them immediately / at all :)

Overall, I think this is already looking great and much easier to understand! I wouldn't be opposed to merging this as an initial refactoring and starting another PR that explores how we can add a (prioritised) channel to the handler.

thomaseizinger changed the title from "feat(gossipsub): start implementing backpressure" to "refactor(gossipsub): send more specific messages to ConnectionHandler" on Nov 12, 2023
mxinden (Member) commented Nov 12, 2023

Anything that speaks against not doing any grouping of messages whatsoever? What is the difference between one RPC with 4 control messages vs 4 RPCs with 1 control message?

Not that I know of. I think the idea was to avoid small per-message overhead: there's a varint for the size of the message and small amounts of protobuf padding.

I consider grouping an optimization and thus only worth it if we have a calculation or benchmark proving its impact. Based on intuition I would deem the protobuf overhead negligible.

See similar calculation in https://github.com/libp2p/specs/blob/master/webrtc/README.md#faq.

Another downside of large groups is head-of-line blocking. Say that one puts 10 messages into a single protobuf. Say that the resulting protobuf is sent as two IP packets. Say that the second IP packet is dropped along the path. The messages in the first IP packet will not be delivered to the remote's Gossipsub implementation until the second IP packet is resent.

mxinden (Member) commented Nov 12, 2023

Overall, I think this is already looking great and much easier to understand! I wouldn't be opposed to merging this as an initial refactoring and starting another PR that explores how we can add a (prioritised) channel to the handler.

I would prefer only merging this pull request once we have a second pull request in place that makes use of this "feature".

thomaseizinger (Contributor) commented

Overall, I think this is already looking great and much easier to understand! I wouldn't be opposed to merging this as an initial refactoring and starting another PR that explores how we can add a (prioritised) channel to the handler.

I would prefer only merging this pull request once we have a second pull request in place that makes use of this "feature".

I would agree if we were changing external interfaces here, such that merging would be a commitment to them.

But as it stands, this is just a refactoring, and even if it turns out that we need something else, we can always revert it again and/or keep changing internals.

In other words, I don't see any risk in merging it. Happy to hold off though, as long as we merge them as separate PRs and keep this one a refactoring. At the moment it is only a tiny behaviour change, as we now send more individual messages. Changing the order in which messages are sent is more invasive IMO, so it would be great to be able to bisect between the two.

jxs requested a review from thomaseizinger on November 13, 2023
thomaseizinger (Contributor) left a comment

Great stuff! Thanks for following up :)

Mind writing up what we discussed regarding a priority channel in the issue next?

jxs mentioned this pull request on Nov 13, 2023
jxs (Member, Author) left a comment

Thanks, Thomas! Wrote #4667 (comment) as a follow-up.

jxs force-pushed the gossipsub-backpressure-1 branch from 7ca7089 to b375b2f on November 14, 2023


jxs requested a review from thomaseizinger on November 21, 2023
thomaseizinger (Contributor) left a comment

Great work! Happy to merge that as is :)

mergify bot merged commit bb2b798 into libp2p:master on Nov 22, 2023
71 checks passed