
Optimize read buffer with capacity to reduce allocations #888

Merged · 8 commits merged into nats-io:main on May 17, 2023

Conversation

YaZasnyal (Contributor)

Problem: poor read performance for small messages

How the read procedure works: a read operation first tries to take a complete message from the buffer; if there is not enough data in the buffer, more is pulled from the socket.

It seems that the current implementation fills the available space in the buffer via AsyncRead::read_buf and resizes it only when len == cap. Once a message is taken out of the buffer, len becomes smaller and the buffer tries to reclaim the free space, but because the previous allocation is still shared by split_to().freeze(), it has to allocate a new buffer and copy the leftover bytes into it. The new buffer gets the capacity of the original one.

All of the above causes a lot of small reads from the socket and many buffer reallocations. The situation gets much better once a single big message is received, because that increases the capacity.

To avoid this problem we can initialize the buffer with some sane capacity so it can hold multiple small messages at once. My tests show that this greatly improves performance (+60% with small messages).
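
A minimal sketch of the idea, using tokio and the bytes crate; the helper names (`try_parse_message`, `handle_message`) are hypothetical stand-ins, not the actual async-nats internals:

```rust
use bytes::{Bytes, BytesMut};
use tokio::io::{AsyncRead, AsyncReadExt};

// Hypothetical protocol helpers, stubbed for illustration only.
fn try_parse_message(buf: &BytesMut) -> Option<usize> {
    // e.g. a newline-delimited frame
    buf.iter().position(|&b| b == b'\n').map(|i| i + 1)
}

fn handle_message(_msg: Bytes) {}

/// Illustrative read loop: pre-allocating the buffer lets one read_buf call
/// pull in many small messages at once instead of growing the buffer through
/// repeated small reads and reallocations.
async fn read_loop<S: AsyncRead + Unpin>(mut stream: S) -> std::io::Result<()> {
    // Pre-sized buffer; the discussion below settles on a 64 KiB-class value.
    let mut buffer = BytesMut::with_capacity(64 * 1024);
    loop {
        // First drain any complete messages already sitting in the buffer.
        while let Some(len) = try_parse_message(&buffer) {
            // split_to().freeze() hands the message out as a shared Bytes,
            // which is why the leftover bytes cannot be reclaimed in place.
            let message = buffer.split_to(len).freeze();
            handle_message(message);
        }
        // Not enough data for a full message: pull more from the socket.
        if stream.read_buf(&mut buffer).await? == 0 {
            return Ok(()); // connection closed
        }
    }
}
```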

@caspervonb caspervonb self-requested a review March 21, 2023 08:21
@caspervonb (Collaborator)

Good find 🙌

Open questions (cc @Jarema, @n1ghtmare):

  • Should we just set a "good" default magic value, e.g. 64k?
  • Should we fiddle with the kernel-level read buffer (e.g. SO_RCVBUF)? See the sketch below.
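
For the second question, a hedged sketch of what tuning SO_RCVBUF could look like with the socket2 crate; this is illustrative only and not code from this PR:

```rust
use socket2::{Domain, Protocol, Socket, Type};
use std::net::{SocketAddr, TcpStream};

// Illustrative only: enlarge the kernel receive buffer (SO_RCVBUF)
// before connecting.
fn connect_with_rcvbuf(addr: SocketAddr, rcvbuf: usize) -> std::io::Result<TcpStream> {
    let socket = Socket::new(Domain::for_address(addr), Type::STREAM, Some(Protocol::TCP))?;
    socket.set_recv_buffer_size(rcvbuf)?; // sets SO_RCVBUF
    socket.connect(&addr.into())?;
    Ok(socket.into())
}
```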

@YaZasnyal (Contributor, Author)

YaZasnyal commented Mar 21, 2023

I checked BytesMut again, and it seems that original_capacity is capped at 64K, so even if I set an absurdly big buffer it will shrink to that value after the first reallocation. So setting it to some constant does not sound so bad now.

I can update the MR by removing the configuration parameter and leaving a comment about the selected value that references this discussion.

The authors of the bytes crate use bit manipulation to store information about buffers:
https://github.com/tokio-rs/bytes/blob/master/src/bytes_mut.rs#L96
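
A small standalone experiment (illustrative only; exact numbers depend on the bytes version) to observe the capacity shrinking once the allocation becomes shared:

```rust
use bytes::BytesMut;

fn main() {
    // Simulate the read path: a large, completely filled read buffer...
    let mut buf = BytesMut::with_capacity(1024 * 1024);
    buf.resize(1024 * 1024, 0);
    println!("capacity before: {}", buf.capacity());

    // ...from which almost everything has been handed out as frozen messages,
    // keeping the original allocation shared.
    let _messages = buf.split_to(1024 * 1024 - 100).freeze();

    // The next reserve cannot reuse the shared allocation and must reallocate.
    // BytesMut only remembers the original capacity in a compressed form that
    // caps out around 64 KiB, so the new buffer ends up far smaller than 1 MiB.
    buf.reserve(1024);
    println!("capacity after reserve: {}", buf.capacity());
}
```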

@Jarema (Member)

Jarema commented Mar 29, 2023

@YaZasnyal can you please resolve the conflicts? I'm still benchmarking and checking stuff around this one.

@YaZasnyal YaZasnyal force-pushed the init_buffer branch 2 times, most recently from 3164f91 to 9545674 on March 29, 2023 20:48
@YaZasnyal (Contributor, Author)

@Jarema rebased to the latest main

@caspervonb caspervonb self-assigned this May 15, 2023
@caspervonb (Collaborator)

Comparing main to this branch (on Linux).

group                                       after                                    before
-----                                       -----                                    ------
async-nats: publish messages amount/1024    1.00    123.8±7.72µs 788.5 KElem/sec     1.02    126.0±7.82µs 774.9 KElem/sec
async-nats: publish messages amount/32      1.00     50.0±1.69µs 1951.8 KElem/sec    1.07     53.8±1.40µs 1816.1 KElem/sec
async-nats: publish messages amount/8192    1.01   835.1±24.39µs 116.9 KElem/sec     1.00   828.9±43.29µs 117.8 KElem/sec
async-nats: publish throughput/1024         1.00    122.7±9.51µs   795.8 MB/sec      1.00    123.0±4.15µs   793.9 MB/sec
async-nats: publish throughput/32           1.00     51.8±2.71µs    59.0 MB/sec      1.02     52.8±1.05µs    57.8 MB/sec
async-nats: publish throughput/8192         1.00   831.2±24.75µs   939.9 MB/sec      1.00   832.9±48.00µs   938.0 MB/sec
subscribe amount/1024                       1.00   395.0±18.46µs 247.2 KElem/sec     1.58   626.1±45.08µs 156.0 KElem/sec
subscribe amount/32                         1.00    329.6±9.05µs 296.3 KElem/sec     1.43   470.6±22.45µs 207.5 KElem/sec
subscribe amount/8192                       1.00      2.2±0.08ms 45.0 KElem/sec      1.12      2.4±0.08ms 40.1 KElem/sec

@caspervonb (Collaborator)

On macOS, however:

async-nats: publish throughput/32                                                                           
                        time:   [65.285 µs 65.523 µs 65.760 µs]
                        thrpt:  [46.408 MiB/s 46.575 MiB/s 46.746 MiB/s]
                 change:
                        time:   [+2.5614% +3.1077% +3.6638%] (p = 0.00 < 0.05)
                        thrpt:  [-3.5343% -3.0141% -2.4974%]
                        Performance has regressed.

@caspervonb (Collaborator)

So the benchmarks are a bit flaky on macOS, sometimes hitting send errors, which panic and invalidate the run, whereas they're consistently better on Linux.

I also fiddled a bit with SO_RCVBUF, which gives a big throughput bump:

async-nats: publish messages amount/1024    1.29    108.0±5.28µs 904.4 KElem/sec     1.00     83.8±0.52µs 1165.0 KElem/sec
async-nats: publish messages amount/32      1.00     61.6±7.10µs 1586.4 KElem/sec    1.05     64.8±0.42µs 1506.9 KElem/sec
async-nats: publish messages amount/8192    1.93   660.7±26.13µs 147.8 KElem/sec     1.00    341.8±3.02µs 285.7 KElem/sec
async-nats: publish throughput/1024         1.44   119.8±28.13µs   814.8 MB/sec      1.00     83.5±0.48µs  1169.8 MB/sec
async-nats: publish throughput/32           1.00     59.6±0.36µs    51.2 MB/sec      1.06     63.2±0.71µs    48.3 MB/sec
async-nats: publish throughput/8192         2.90  987.0±309.18µs   791.6 MB/sec      1.00    340.5±1.97µs     2.2 GB/sec
subscribe amount/1024                       2.03  703.7±199.21µs 138.8 KElem/sec     1.00    347.1±5.79µs 281.4 KElem/sec
subscribe amount/32                         1.77  524.2±246.63µs 186.3 KElem/sec     1.00    295.7±2.63µs 330.3 KElem/sec
subscribe amount/8192                       1.77      2.2±0.13ms 45.2 KElem/sec      1.00  1216.1±86.24µs 80.3 KElem/sec

@caspervonb (Collaborator) left a comment

I've been benchmarking this quite a bit with different variants, including setting the recv_buffer_size, which can be a follow-up.

lgtm, just one quick round of name bikeshedding 😄

@@ -57,6 +57,7 @@ pub(crate) struct ConnectorOptions {
     pub(crate) name: Option<String>,
     pub(crate) ignore_discovered_servers: bool,
     pub(crate) retain_servers_order: bool,
+    pub(crate) receive_buffer_capacity: usize,
Collaborator

Let's call it read_buffer_capacity, to avoid confusion with the socket's recv_buffer.

Suggested change:
- pub(crate) receive_buffer_capacity: usize,
+ pub(crate) read_buffer_capacity: usize,

Contributor (Author)

The main thing I dislike in the current solution is that the buffer is going to shrink to 65k anyway. Maybe we should also change usize to u16?

Collaborator

u16 would map nicely, sgtm.

Contributor (Author)

done
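
For reference, a sketch of how a client could opt into the new setting once this lands, assuming the read_buffer_capacity builder method discussed above (the exact name and signature are taken from this PR's discussion and may differ in released versions):

```rust
use async_nats::ConnectOptions;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    // read_buffer_capacity takes a u16, per the discussion above,
    // since BytesMut caps the remembered capacity at 64 KiB anyway.
    let client = ConnectOptions::new()
        .read_buffer_capacity(65535)
        .connect("demo.nats.io")
        .await?;

    client.publish("events.demo".to_string(), "hello".into()).await?;
    client.flush().await?;
    Ok(())
}
```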

@caspervonb caspervonb changed the title Init buffer with capacity to reduce allocations Initialize read buffer with capacity to reduce allocations May 16, 2023
@caspervonb caspervonb changed the title Initialize read buffer with capacity to reduce allocations Optimize read buffer with capacity to reduce allocations May 17, 2023
@caspervonb caspervonb self-requested a review May 17, 2023 02:38
@caspervonb (Collaborator) left a comment

lgtm.

@caspervonb caspervonb merged commit 3c71104 into nats-io:main May 17, 2023