Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

thomaseizinger · 2023-11-02T00:35:12Z

Description

This is inspired by PRs from @joshuef (#4751, #4753, #4754).

The idea is to use Bytes to avoid potentially costly clones of large chunks of memory as we are processing the messages. For gossipsub, a single message might be sent to multiple clients. For kademlia, we might serve the same record multiple times from the record store. In both cases, there shouldn't be a need for any additional allocations.

The main problem though is that the generated protobuf structs use Vec<u8> and thus force an allocation onto us every time we want to send one of these. https://github.com/tafia/quick-protobuf does have an option to generate Cow instead of Vec<u8> and thus allow encoding of borrowed data. In the past, we weren't able to use this because asynchronous-codec couldn't encode borrowed data. But it can now! See mxinden/asynchronous-codec#9.

As a result, I think it is worth experimenting whether or not we can now use the generated protobuf structs.

To actually make use of this, we need to:

Re-run pb-rs with the correct options to use Cow instead of Vec:
```
pb-rs **/generated/*.proto`
```
Work out whether or not we can adapt quick-protobuf-codec to work with the new protobuf files
Update the CI check to not fail on the now differing files
Actually make use of this in (all?) protocols to avoid allocations where possible

Motivation

Less memory allocations.

Current Implementation

We allocate each message again just to serialize it to the wire.

Are you planning to do it yourself in a pull request ?

No

The text was updated successfully, but these errors were encountered:

thomaseizinger · 2023-11-02T02:51:56Z

Re-run pb-rs with the correct options to use Cow instead of Vec:

I started with this an it yields a lot of compile errors (understandably). Those will all need to be fixed. Perhaps in a first iteration by just using Cow::Owned everywhere as that is essentially what we are currently doing by using Vec<u8>.

The lifetime would have to be 'static in many places.

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible. Essentially, we need to carry some data type that uses Bytes all the way down to the encoding step. Then we can, in a single function, construct e.g. a proto::RPC with a borrow using a local lifetime to the Bytes instance and pass it to the Encoder. For gossipsub, this would be e.g. here:

rust-libp2p/protocols/gossipsub/src/handler.rs

Line 332 in 96b4c4a

match Sink::start_send(Pin::new(&mut substream), message) {

@joshuef Are you interested in working on this?

thomaseizinger · 2023-11-02T02:54:32Z

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible.

To achieve this, we'd need to do something similar as @mxinden described here. I.e. we need to extend HandlerIn to send different types into the handler. However, contrary to @mxinden's example, these cannot be instances of proto::RPC as those will have a lifetime.

joshuef · 2023-11-02T08:35:09Z

@joshuef Are you interested in working on this?

Let me check out the scope of this and see how deep things go. If I have time to get in amongst it in the next week I'll definitely do that, or may be able to get someone from the team to have a look if no. I'll report back here when it's a bit clearer for me. 👍

joshuef · 2023-11-02T16:31:17Z

I did start looking at this, but won't have time to dig in until next week sometime, I think.

thomaseizinger · 2023-11-02T20:55:03Z

No worries, it is an internal change so we can ship it at any point, doesn't need to go into the current release :)

joshuef · 2023-11-13T11:05:49Z

So I've been digging about further on this this morning, but I am blocked by the Keypairs essentially. Right now we zeroize the inbound bytes when we form the SKs. But with the Cow<u8> setup we don't necessarily have ownership of this to perform this...

Error[E0596]: cannot borrow data in dereference of `Cow<'_, [u8]>` as mutable --> identity/src/keypair.rs:294:59 | 294 | return rsa::Keypair::try_decode_pkcs1(&mut private_key.Data).map(|sk| { | ^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable | = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Cow<'_, [u8]>

I've been trying out some dereffing, borrows, passing vecs, but it all comes down to that same issue that I don't think we can guarantee that we properly `Zeroize` the bytes therin...

I'm not really sure if there's a way around this? Suggestions very welcome

(My WIP branch: https://github.com/joshuef/rust-libp2p/tree/GossipBytesPbCow)

thomaseizinger · 2023-11-14T02:11:34Z

If the data is borrowed, we don't need to zeroize it, I'd say? Whoever owns the data should zeroize it when it gets dropped :)

Meaning, we can just match on the Cow and do nothing if it is Cow::Borrowed?

joshuef · 2023-11-14T09:06:09Z

Okay, I'll have a look at implementing in that fashion then, thanks @thomaseizinger ! 🙇

We can reduce the number of allocations during the encoding of messages by not depending on the `Encoder` implementation of `unsigned-varint` but doing the encoding ourselves. This allows us to directly plug the `BytesMut` into the `Writer` and directly encode into the target buffer. Previously, we were allocating each message as an additional buffer that got immediately thrown-away again. Related: #4781. Pull-Request: #4782.

thomaseizinger mentioned this issue Nov 2, 2023

feat(quick-protobuf-codec): reduce allocations during encoding #4782

Merged

4 tasks

thomaseizinger mentioned this issue Nov 2, 2023

Backpressure in Gossipsub #4667

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

thomaseizinger commented Nov 2, 2023 •

edited

Loading

thomaseizinger commented Nov 2, 2023

thomaseizinger commented Nov 2, 2023

joshuef commented Nov 2, 2023

joshuef commented Nov 2, 2023

thomaseizinger commented Nov 2, 2023

joshuef commented Nov 13, 2023 •

edited

Loading

thomaseizinger commented Nov 14, 2023 •

edited

Loading

joshuef commented Nov 14, 2023

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

Comments

thomaseizinger commented Nov 2, 2023 • edited Loading

Description

Motivation

Current Implementation

Are you planning to do it yourself in a pull request ?

thomaseizinger commented Nov 2, 2023

thomaseizinger commented Nov 2, 2023

joshuef commented Nov 2, 2023

joshuef commented Nov 2, 2023

thomaseizinger commented Nov 2, 2023

joshuef commented Nov 13, 2023 • edited Loading

thomaseizinger commented Nov 14, 2023 • edited Loading

joshuef commented Nov 14, 2023

thomaseizinger commented Nov 2, 2023 •

edited

Loading

joshuef commented Nov 13, 2023 •

edited

Loading

thomaseizinger commented Nov 14, 2023 •

edited

Loading