Optimize `read_to_end`. #46050

Merged
merged 1 commit into rust-lang:master from sunfishcode:read_to_end on Nov 21, 2017
Conversation

sunfishcode
Member

This patch makes `read_to_end` use Vec's memory-growth pattern rather
than using a custom pattern.

This has some interesting effects:

 - If memory is reserved up front, `read_to_end` can be faster, as it
   starts reading at the buffer size, rather than always starting at 32
   bytes. This speeds up file reading by 2x in one of my use cases.

 - It can reduce the number of syscalls when reading large files.
   Previously, `read_to_end` would settle into a sequence of 8192-byte
   reads. With this patch, the read size follows Vec's allocation
   pattern. For example, on a 16MiB file, it can do 21 read syscalls
   instead of 2057. In simple benchmarks of large files, though, overall
   speed is still dominated by the actual I/O.

 - A downside is that `Read` implementations that don't implement
   `initializer()` may see increased memory zeroing overhead.

I benchmarked this on a variety of data sizes, with and without
preallocated buffers. Most benchmarks see no difference, but reading
a small/medium file with a pre-allocated buffer is faster.
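
(Illustrative aside, not part of the patch: the following is a minimal sketch of a read-to-end loop that lets `Vec`'s own growth strategy pick the read size, the idea described above. Names and details are my own; the real libstd code differs, and the `initializer()` optimization is only noted in a comment.)

```rust
use std::io::{self, Read};

// Sketch only: each `read` fills whatever spare capacity `reserve`
// produced, so the read size follows Vec's growth pattern instead of a
// fixed 32-byte start and 8192-byte plateau.
fn read_to_end_sketch<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start_len = buf.len();
    loop {
        if buf.len() == buf.capacity() {
            // Let Vec decide how much to grow; for a full buffer this is
            // roughly a doubling.
            buf.reserve(32);
        }
        let old_len = buf.len();
        let spare = buf.capacity() - old_len;
        // Zero the spare capacity so it can safely be handed to `read`.
        // (libstd skips this when the reader's `initializer()` promises it
        // never reads from the buffer it is given.)
        buf.resize(old_len + spare, 0);
        match r.read(&mut buf[old_len..]) {
            Ok(0) => {
                // EOF: drop the unused zeroed tail and report bytes read.
                buf.truncate(old_len);
                return Ok(old_len - start_len);
            }
            Ok(n) => buf.truncate(old_len + n),
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => buf.truncate(old_len),
            Err(e) => {
                buf.truncate(old_len);
                return Err(e);
            }
        }
    }
}
```

With a buffer created via `Vec::with_capacity`, the very first read is already buffer-sized, which is where the pre-allocated-buffer speedup in the benchmarks comes from.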

@sfackler
Member

> For example, on a 16MiB file, it can do 21 read syscalls
> instead of 2057.

It can, or it does? This is going to depend on how large the kernel-side buffers are, right?

@sunfishcode
Member Author

> It can, or it does? This is going to depend on how large the kernel-side buffers are, right?

The number of syscalls doesn't depend on kernel buffer sizes, but it does depend on whether you're reading from a `File` or something else.

@sunfishcode
Member Author

Actually, to correct that: a syscall can return fewer bytes than requested, which could depend on kernel buffer sizes. So on an OS that limits how many bytes a single syscall can return, you'd be back to the number of syscalls being proportional to the file size.
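
(Illustrative aside, not from this thread: one way to observe the call count is a tiny counting wrapper like the one below; the file path is hypothetical. For a plain `File`, each forwarded `read` is one syscall, though, per the note above, a call may return fewer bytes than the buffer it was given.)

```rust
use std::fs::File;
use std::io::{self, Read};

// Counts how many `read` calls a consumer such as `read_to_end` makes.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() -> io::Result<()> {
    // Hypothetical 16 MiB input file, for illustration only.
    let file = File::open("large-input.bin")?;
    let mut reader = CountingReader { inner: file, calls: 0 };
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    println!("{} bytes in {} read calls", buf.len(), reader.calls);
    Ok(())
}
```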

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 17, 2017
@arthurprs arthurprs left a comment
Contributor

Looks strictly better to me.

@sfackler
Member

It will do more zeroing than before for large reads with readers that require buffer initialization, but I'd guess that cost would be dwarfed by the overall read time.

LGTY @alexcrichton?

@alexcrichton
Member

FWIW I think the original logic here came from #23820, right? It is interesting that the amount of extra zeroing here is, I think, much larger than before. For example, if the vector has N bytes and the file has N+1 bytes, then we'll zero out N extra bytes, right? I'm sure we could manufacture a test case where N is big enough that the zeroing outweighs the cost of a syscall, but I think this is fine to be considered the "slow path" anyway, as the fastest reads won't be zeroing at all.

r+ from me
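
(Illustrative aside, not from this thread: a small program showing the worst case described above. With the vector full at N bytes, the next `reserve` lets Vec roughly double the capacity, and a reader without a no-op `initializer()` pays to zero all of that spare space even if only one more byte arrives.)

```rust
fn main() {
    // Suppose N = 8 MiB has already been read into a full Vec.
    let n = 8 * 1024 * 1024;
    let mut buf: Vec<u8> = vec![0u8; n];
    println!("before: len = {}, capacity = {}", buf.len(), buf.capacity());

    // What the growth-following loop does once the buffer is full;
    // Vec's amortized growth typically doubles the capacity here.
    buf.reserve(32);

    // A zeroing reader initializes all of this spare capacity before the
    // next read, even if the source only has one byte left.
    println!("spare bytes to zero before the final short read: {}",
             buf.capacity() - buf.len());
}
```

Under the old fixed-size probing, the extra zeroing per step was bounded by the 8192-byte read size, which is why the worst case grows with this change.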

@sfackler
Member

Yeah, this logic hasn't been touched other than conditionally zeroing since #23820.

@sfackler
Member

@bors r+

@bors
Contributor

bors commented Nov 20, 2017

📌 Commit 6b1a3bc has been approved by sfackler

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2017
kennytm added a commit to kennytm/rust that referenced this pull request Nov 21, 2017
Optimize `read_to_end`.

bors added a commit that referenced this pull request Nov 21, 2017
Rollup of 11 pull requests

- Successful merges: #45987, #46031, #46050, #46052, #46103, #46120, #46134, #46141, #46148, #46155, #46157
- Failed merges:
@bors bors merged commit 6b1a3bc into rust-lang:master Nov 21, 2017
@sunfishcode sunfishcode deleted the read_to_end branch November 21, 2017 20:58