
Optimize read buffer with capacity to reduce allocations #888

Merged · 8 commits merged into nats-io:main on May 17, 2023

Conversation

YaZasnyal (Contributor)

Problem: poor read performance for small messages

How the read procedure works: a read operation first tries to take a complete message from the buffer; if there is not enough data in the buffer, more is pulled from the socket.

It seems that the current implementation fills the available space in the buffer via AsyncRead::read_buf and resizes it only when len == cap. Once a message is taken out of the buffer, len becomes smaller and the buffer tries to reclaim the free space, but because the previous allocation is still shared by split_to().freeze(), it has to allocate a new buffer and copy the leftover bytes into it. The new buffer gets the capacity of the original one.

All of the above causes a lot of small reads from the socket and many buffer reallocations. The situation gets much better once a single big message is received, because that increases the capacity.

To avoid this problem we can initialize the buffer with some sane capacity so it can hold multiple small messages at once. My tests show that this greatly improves performance (+60% with small messages).
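
A minimal sketch of the idea, using tokio and the bytes crate; the helper names (`try_parse_message`, `handle_message`) are hypothetical stand-ins, not the actual async-nats internals:

```rust
use bytes::{Bytes, BytesMut};
use tokio::io::{AsyncRead, AsyncReadExt};

// Hypothetical protocol helpers, stubbed for illustration only.
fn try_parse_message(buf: &BytesMut) -> Option<usize> {
    // e.g. a newline-delimited frame
    buf.iter().position(|&b| b == b'\n').map(|i| i + 1)
}

fn handle_message(_msg: Bytes) {}

/// Illustrative read loop: pre-allocating the buffer lets one read_buf call
/// pull in many small messages at once instead of growing the buffer through
/// repeated small reads and reallocations.
async fn read_loop<S: AsyncRead + Unpin>(mut stream: S) -> std::io::Result<()> {
    // Pre-sized buffer; the discussion below settles on a 64 KiB-class value.
    let mut buffer = BytesMut::with_capacity(64 * 1024);
    loop {
        // First drain any complete messages already sitting in the buffer.
        while let Some(len) = try_parse_message(&buffer) {
            // split_to().freeze() hands the message out as a shared Bytes,
            // which is why the leftover bytes cannot be reclaimed in place.
            let message = buffer.split_to(len).freeze();
            handle_message(message);
        }
        // Not enough data for a full message: pull more from the socket.
        if stream.read_buf(&mut buffer).await? == 0 {
            return Ok(()); // connection closed
        }
    }
}
```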

@caspervonb caspervonb self-requested a review March 21, 2023 08:21
@caspervonb (Collaborator)

Good find 🙌

Open questions (cc @Jarema, @n1ghtmare):

  • Should we just set a "good" default magic value, e.g. 64k?
  • Should we fiddle with the kernel-level read buffer (e.g. SO_RCVBUF)? See the sketch below.
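
For the second question, a hedged sketch of what tuning SO_RCVBUF could look like with the socket2 crate; this is illustrative only and not code from this PR:

```rust
use socket2::{Domain, Protocol, Socket, Type};
use std::net::{SocketAddr, TcpStream};

// Illustrative only: enlarge the kernel receive buffer (SO_RCVBUF)
// before connecting.
fn connect_with_rcvbuf(addr: SocketAddr, rcvbuf: usize) -> std::io::Result<TcpStream> {
    let socket = Socket::new(Domain::for_address(addr), Type::STREAM, Some(Protocol::TCP))?;
    socket.set_recv_buffer_size(rcvbuf)?; // sets SO_RCVBUF
    socket.connect(&addr.into())?;
    Ok(socket.into())
}
```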

@YaZasnyal (Contributor, Author)

YaZasnyal commented Mar 21, 2023

I checked BytesMut again, and it seems that original_capacity is capped at 64K, so even if I set an absurdly big buffer it will shrink to that value after the first reallocation. So setting it to some constant does not sound so bad now.

I can update the MR by removing the configuration parameter and leaving a comment about the selected value that references this discussion.

The authors of the bytes crate use bit manipulation to store information about buffers:
https://github.com/tokio-rs/bytes/blob/master/src/bytes_mut.rs#L96
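
A small standalone experiment (illustrative only; exact numbers depend on the bytes version) to observe the capacity shrinking once the allocation becomes shared:

```rust
use bytes::BytesMut;

fn main() {
    // Simulate the read path: a large, completely filled read buffer...
    let mut buf = BytesMut::with_capacity(1024 * 1024);
    buf.resize(1024 * 1024, 0);
    println!("capacity before: {}", buf.capacity());

    // ...from which almost everything has been handed out as frozen messages,
    // keeping the original allocation shared.
    let _messages = buf.split_to(1024 * 1024 - 100).freeze();

    // The next reserve cannot reuse the shared allocation and must reallocate.
    // BytesMut only remembers the original capacity in a compressed form that
    // caps out around 64 KiB, so the new buffer ends up far smaller than 1 MiB.
    buf.reserve(1024);
    println!("capacity after reserve: {}", buf.capacity());
}
```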

@Jarema (Member)

Jarema commented Mar 29, 2023

@YaZasnyal can you please resolve the conflicts? I'm still benchmarking and checking stuff around this one.

@YaZasnyal YaZasnyal force-pushed the init_buffer branch 2 times, most recently from 3164f91 to 9545674 on March 29, 2023 20:48
@YaZasnyal (Contributor, Author)

@Jarema rebased to the latest main

@caspervonb caspervonb self-assigned this May 15, 2023
@caspervonb (Collaborator)

Comparing main to this branch (on Linux).

group                                       after                                    before
-----                                       -----                                    ------
async-nats: publish messages amount/1024    1.00    123.8±7.72µs 788.5 KElem/sec     1.02    126.0±7.82µs 774.9 KElem/sec
async-nats: publish messages amount/32      1.00     50.0±1.69µs 1951.8 KElem/sec    1.07     53.8±1.40µs 1816.1 KElem/sec
async-nats: publish messages amount/8192    1.01   835.1±24.39µs 116.9 KElem/sec     1.00   828.9±43.29µs 117.8 KElem/sec
async-nats: publish throughput/1024         1.00    122.7±9.51µs   795.8 MB/sec      1.00    123.0±4.15µs   793.9 MB/sec
async-nats: publish throughput/32           1.00     51.8±2.71µs    59.0 MB/sec      1.02     52.8±1.05µs    57.8 MB/sec
async-nats: publish throughput/8192         1.00   831.2±24.75µs   939.9 MB/sec      1.00   832.9±48.00µs   938.0 MB/sec
subscribe amount/1024                       1.00   395.0±18.46µs 247.2 KElem/sec     1.58   626.1±45.08µs 156.0 KElem/sec
subscribe amount/32                         1.00    329.6±9.05µs 296.3 KElem/sec     1.43   470.6±22.45µs 207.5 KElem/sec
subscribe amount/8192                       1.00      2.2±0.08ms 45.0 KElem/sec      1.12      2.4±0.08ms 40.1 KElem/sec

@caspervonb (Collaborator)

On macOS, however:

async-nats: publish throughput/32                                                                           
                        time:   [65.285 µs 65.523 µs 65.760 µs]
                        thrpt:  [46.408 MiB/s 46.575 MiB/s 46.746 MiB/s]
                 change:
                        time:   [+2.5614% +3.1077% +3.6638%] (p = 0.00 < 0.05)
                        thrpt:  [-3.5343% -3.0141% -2.4974%]
                        Performance has regressed.

@caspervonb (Collaborator)

So the benchmarks are a bit flaky on macOS, sometimes hitting send errors, which panic and invalidate the run, whereas they're consistently better on Linux.

I also fiddled a bit with SO_RCVBUF, which gives a big throughput bump:

async-nats: publish messages amount/1024    1.29    108.0±5.28µs 904.4 KElem/sec     1.00     83.8±0.52µs 1165.0 KElem/sec
async-nats: publish messages amount/32      1.00     61.6±7.10µs 1586.4 KElem/sec    1.05     64.8±0.42µs 1506.9 KElem/sec
async-nats: publish messages amount/8192    1.93   660.7±26.13µs 147.8 KElem/sec     1.00    341.8±3.02µs 285.7 KElem/sec
async-nats: publish throughput/1024         1.44   119.8±28.13µs   814.8 MB/sec      1.00     83.5±0.48µs  1169.8 MB/sec
async-nats: publish throughput/32           1.00     59.6±0.36µs    51.2 MB/sec      1.06     63.2±0.71µs    48.3 MB/sec
async-nats: publish throughput/8192         2.90  987.0±309.18µs   791.6 MB/sec      1.00    340.5±1.97µs     2.2 GB/sec
subscribe amount/1024                       2.03  703.7±199.21µs 138.8 KElem/sec     1.00    347.1±5.79µs 281.4 KElem/sec
subscribe amount/32                         1.77  524.2±246.63µs 186.3 KElem/sec     1.00    295.7±2.63µs 330.3 KElem/sec
subscribe amount/8192                       1.77      2.2±0.13ms 45.2 KElem/sec      1.00  1216.1±86.24µs 80.3 KElem/sec

@caspervonb (Collaborator) left a comment

I've been benchmarking this quite a bit with different variants, including setting the recv_buffer_size, which can be a follow-up.

lgtm, just one quick round of name bikeshedding 😄

@@ -57,6 +57,7 @@ pub(crate) struct ConnectorOptions {
     pub(crate) name: Option<String>,
     pub(crate) ignore_discovered_servers: bool,
     pub(crate) retain_servers_order: bool,
+    pub(crate) receive_buffer_capacity: usize,
Collaborator

Let's call it read_buffer_capacity, to avoid confusion with the socket's recv_buffer.

Suggested change:
- pub(crate) receive_buffer_capacity: usize,
+ pub(crate) read_buffer_capacity: usize,

Contributor (Author)

The main thing I dislike in the current solution is that the buffer is going to shrink to 65k anyway. Maybe we should also change usize to u16?

Collaborator

u16 would map nicely, sgtm.

Contributor (Author)

done
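
For reference, a sketch of how a client could opt into the new setting once this lands, assuming the read_buffer_capacity builder method discussed above (the exact name and signature are taken from this PR's discussion and may differ in released versions):

```rust
use async_nats::ConnectOptions;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    // read_buffer_capacity takes a u16, per the discussion above,
    // since BytesMut caps the remembered capacity at 64 KiB anyway.
    let client = ConnectOptions::new()
        .read_buffer_capacity(65535)
        .connect("demo.nats.io")
        .await?;

    client.publish("events.demo".to_string(), "hello".into()).await?;
    client.flush().await?;
    Ok(())
}
```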

@caspervonb caspervonb changed the title Init buffer with capacity to reduce allocations Initialize read buffer with capacity to reduce allocations May 16, 2023
@caspervonb caspervonb changed the title Initialize read buffer with capacity to reduce allocations Optimize read buffer with capacity to reduce allocations May 17, 2023
@caspervonb caspervonb self-requested a review May 17, 2023 02:38
@caspervonb (Collaborator) left a comment

lgtm.

@caspervonb caspervonb merged commit 3c71104 into nats-io:main May 17, 2023