Optimize `read_to_end`. #46050

Merged
merged 1 commit into rust-lang:master from sunfishcode:read_to_end on Nov 21, 2017
Conversation

sunfishcode
Member

This patch makes `read_to_end` use Vec's memory-growth pattern rather
than using a custom pattern.

This has some interesting effects:

 - If memory is reserved up front, `read_to_end` can be faster, as it
   starts reading at the buffer size, rather than always starting at 32
   bytes. This speeds up file reading by 2x in one of my use cases.

 - It can reduce the number of syscalls when reading large files.
   Previously, `read_to_end` would settle into a sequence of 8192-byte
   reads. With this patch, the read size follows Vec's allocation
   pattern. For example, on a 16MiB file, it can do 21 read syscalls
   instead of 2057. In simple benchmarks of large files, though, overall
   speed is still dominated by the actual I/O.

 - A downside is that `Read` implementations that don't implement
   `initializer()` may see increased memory zeroing overhead.

I benchmarked this on a variety of data sizes, with and without
preallocated buffers. Most benchmarks see no difference, but reading
a small/medium file with a pre-allocated buffer is faster.
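
(Illustrative aside, not part of the patch: the following is a minimal sketch of a read-to-end loop that lets `Vec`'s own growth strategy pick the read size, the idea described above. Names and details are my own; the real libstd code differs, and the `initializer()` optimization is only noted in a comment.)

```rust
use std::io::{self, Read};

// Sketch only: each `read` fills whatever spare capacity `reserve`
// produced, so the read size follows Vec's growth pattern instead of a
// fixed 32-byte start and 8192-byte plateau.
fn read_to_end_sketch<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start_len = buf.len();
    loop {
        if buf.len() == buf.capacity() {
            // Let Vec decide how much to grow; for a full buffer this is
            // roughly a doubling.
            buf.reserve(32);
        }
        let old_len = buf.len();
        let spare = buf.capacity() - old_len;
        // Zero the spare capacity so it can safely be handed to `read`.
        // (libstd skips this when the reader's `initializer()` promises it
        // never reads from the buffer it is given.)
        buf.resize(old_len + spare, 0);
        match r.read(&mut buf[old_len..]) {
            Ok(0) => {
                // EOF: drop the unused zeroed tail and report bytes read.
                buf.truncate(old_len);
                return Ok(old_len - start_len);
            }
            Ok(n) => buf.truncate(old_len + n),
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => buf.truncate(old_len),
            Err(e) => {
                buf.truncate(old_len);
                return Err(e);
            }
        }
    }
}
```

With a buffer created via `Vec::with_capacity`, the very first read is already buffer-sized, which is where the pre-allocated-buffer speedup in the benchmarks comes from.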

@sfackler
Member

> For example, on a 16MiB file, it can do 21 read syscalls
> instead of 2057.

It can, or it does? This is going to depend on how large the kernel-side buffers are, right?

@sunfishcode
Member Author

> It can, or it does? This is going to depend on how large the kernel-side buffers are, right?

The number of syscalls doesn't depend on kernel buffer sizes, but it does depend on whether you're reading from a `File` or something else.

@sunfishcode
Member Author

Actually, to correct that: a syscall can return fewer bytes than requested, which could depend on kernel buffer sizes. So on an OS that limits how many bytes a single syscall can return, you'd be back to the number of syscalls being proportional to the file size.
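
(Illustrative aside, not from this thread: one way to observe the call count is a tiny counting wrapper like the one below; the file path is hypothetical. For a plain `File`, each forwarded `read` is one syscall, though, per the note above, a call may return fewer bytes than the buffer it was given.)

```rust
use std::fs::File;
use std::io::{self, Read};

// Counts how many `read` calls a consumer such as `read_to_end` makes.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() -> io::Result<()> {
    // Hypothetical 16 MiB input file, for illustration only.
    let file = File::open("large-input.bin")?;
    let mut reader = CountingReader { inner: file, calls: 0 };
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    println!("{} bytes in {} read calls", buf.len(), reader.calls);
    Ok(())
}
```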

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 17, 2017
@arthurprs arthurprs left a comment
Contributor

Looks strictly better to me.

@sfackler
Member

It will do more zeroing than before for large reads with readers that require buffer initialization, but I'd guess that cost would be dwarfed by the overall read time.

LGTY @alexcrichton?

@alexcrichton
Member

FWIW I think the original logic here came from #23820, right? It is interesting that the amount of extra zeroing here is, I think, much larger than before. For example, if the vector has N bytes and the file has N+1 bytes, then we'll zero out N extra bytes, right? I'm sure we could manufacture a test case where N is big enough that the zeroing outweighs the cost of a syscall, but I think this is fine to be considered the "slow path" anyway, as the fastest reads won't be zeroing at all.

r+ from me
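
(Illustrative aside, not from this thread: a small program showing the worst case described above. With the vector full at N bytes, the next `reserve` lets Vec roughly double the capacity, and a reader without a no-op `initializer()` pays to zero all of that spare space even if only one more byte arrives.)

```rust
fn main() {
    // Suppose N = 8 MiB has already been read into a full Vec.
    let n = 8 * 1024 * 1024;
    let mut buf: Vec<u8> = vec![0u8; n];
    println!("before: len = {}, capacity = {}", buf.len(), buf.capacity());

    // What the growth-following loop does once the buffer is full;
    // Vec's amortized growth typically doubles the capacity here.
    buf.reserve(32);

    // A zeroing reader initializes all of this spare capacity before the
    // next read, even if the source only has one byte left.
    println!("spare bytes to zero before the final short read: {}",
             buf.capacity() - buf.len());
}
```

Under the old fixed-size probing, the extra zeroing per step was bounded by the 8192-byte read size, which is why the worst case grows with this change.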

@sfackler
Member

Yeah, this logic hasn't been touched other than conditionally zeroing since #23820.

@sfackler
Member

@bors r+

@bors
Contributor

bors commented Nov 20, 2017

📌 Commit 6b1a3bc has been approved by sfackler

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2017
kennytm added a commit to kennytm/rust that referenced this pull request Nov 21, 2017
Optimize `read_to_end`.

bors added a commit that referenced this pull request Nov 21, 2017
Rollup of 11 pull requests

- Successful merges: #45987, #46031, #46050, #46052, #46103, #46120, #46134, #46141, #46148, #46155, #46157
- Failed merges:
@bors bors merged commit 6b1a3bc into rust-lang:master Nov 21, 2017
@sunfishcode sunfishcode deleted the read_to_end branch November 21, 2017 20:58