Optimize `read_to_end` #46050
Conversation
This patch makes `read_to_end` use Vec's memory-growth pattern rather than using a custom pattern. This has some interesting effects:

- If memory is reserved up front, `read_to_end` can be faster, as it starts reading at the buffer size, rather than always starting at 32 bytes. This speeds up file reading by 2x in one of my use cases.
- It can reduce the number of syscalls when reading large files. Previously, `read_to_end` would settle into a sequence of 8192-byte reads. With this patch, the read size follows Vec's allocation pattern. For example, on a 16MiB file, it can do 21 read syscalls instead of 2057. In simple benchmarks of large files though, overall speed is still dominated by the actual I/O.
- A downside is that Read implementations that don't implement `initializer()` may see increased memory zeroing overhead.

I benchmarked this on a variety of data sizes, with and without preallocated buffers. Most benchmarks see no difference, but reading a small/medium file with a pre-allocated buffer is faster.
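As a sketch of the preallocation point above (the `read_all` helper and the `Cursor` stand-in for a file are illustrative, not code from the patch):

```rust
use std::io::{Cursor, Read};

// Illustrative helper (not part of the patch): read everything from `r`
// into a Vec whose capacity is reserved up front, so read_to_end can
// start reading at the buffer size instead of ramping up from 32 bytes.
fn read_all<R: Read>(mut r: R, size_hint: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(size_hint);
    r.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    // A Cursor over in-memory bytes stands in for a file on disk.
    let data = vec![0xABu8; 16 * 1024];
    let buf = read_all(Cursor::new(&data[..]), data.len()).unwrap();
    assert_eq!(buf, data);
}
```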
It can, or it does? This is going to depend on how large the kernel-side buffers are, right?
The number of syscalls doesn't depend on kernel buffer sizes, but it does depend on whether you're reading from a
Actually, to correct that: a syscall can return fewer bytes than requested, which could depend on kernel buffer sizes. So if you have an OS that has a limit on how many bytes it can read per syscall, then you'd be back to the number of syscalls being proportional to the file size.
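To make the syscall arithmetic concrete, here's a small simulation (my own sketch, assuming each read fills the Vec's spare capacity and capacity doubles when exhausted; the short reads discussed above would break this ideal count):

```rust
// Count the reads needed to drain a file of `file_len` bytes when every
// read fills the current spare capacity and capacity doubles once full
// (Vec-like growth). Assumes the OS returns as many bytes as requested.
fn reads_with_doubling(file_len: usize, start_cap: usize) -> usize {
    let (mut cap, mut len, mut reads) = (start_cap, 0, 0);
    while len < file_len {
        let chunk = (cap - len).min(file_len - len);
        len += chunk;
        reads += 1;
        if len == cap {
            cap *= 2;
        }
    }
    reads + 1 // one final zero-byte read observes EOF
}

fn main() {
    // 16 MiB file, growth starting at 32 bytes: 21 reads, matching the
    // figure quoted in the patch description.
    assert_eq!(reads_with_doubling(16 * 1024 * 1024, 32), 21);
}
```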
Looks strictly better to me.
It will be wasting more zeroing than previously for large reads with readers that require buffer initialization, but I'd guess that cost would be dwarfed by the overall read time. LGTY @alexcrichton?
FWIW I think the original logic here came from #23820, right? It is interesting that the amount of extra zeroing here I think is much larger than before. For example, if the vector has N bytes and the file has N+1 bytes, then we'll zero out N extra bytes, right? I'm sure we could manufacture a test case where N is big enough that it outweighs the cost of a syscall, but I think this is fine to be considered the "slow path" anyway, as the fastest reads won't be zeroing at all. r+ from me
Yeah, this logic hasn't been touched other than conditionally zeroing since #23820. |
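A back-of-the-envelope sketch of the worst case discussed above (my own arithmetic, not code from the patch): with N bytes buffered and one byte left in the file, doubling the capacity creates N bytes of spare capacity that a reader without an `initializer()` must have zeroed, all to receive a single byte.

```rust
// Extra bytes zeroed in the worst case: the Vec holds `len` bytes, the
// capacity doubles to 2 * len, and only `remaining` of the new spare
// bytes are actually overwritten by the read.
fn wasted_zeroing(len: usize, remaining: usize) -> usize {
    let spare = len; // after doubling, spare capacity equals the old length
    spare.saturating_sub(remaining)
}

fn main() {
    // 8 MiB buffered, 1 byte left in the file: ~8 MiB zeroed for one byte.
    assert_eq!(wasted_zeroing(8 * 1024 * 1024, 1), 8 * 1024 * 1024 - 1);
}
```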
@bors r+ |
📌 Commit 6b1a3bc has been approved by |