-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add size hint for byte iterator over file #81044
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
The job Click to see the possible cause of the failure (guessed by this bot)
|
The job Click to see the possible cause of the failure (guessed by this bot)
|
The job Click to see the possible cause of the failure (guessed by this bot)
|
The job Click to see the possible cause of the failure (guessed by this bot)
|
The job Click to see the possible cause of the failure (guessed by this bot)
|
This suffers from TOCTOU as files can be changed between any call to metadata and reads, something that generally can't happen to other iterators due to ownership. Additionally it is inefficient as it would perform a syscall every time If you're trying to deal with performance issues of a |
Do you think it would be worthwhile to implement it for
I would argue that the estimation provided by the current implementation is still correct. Also, regarding:
In my (admittedly limited) experience, is usually used ahead of iteration to reserve space for something. It is not a function that I would expect to be called repeatedly. For good measure, I could add a comment in the docs to mention that repeatedly calling it would be slow? |
For a
I think the standard library should do better than providing an implementation that qualifies as "buggy" and "violation of the trait's protocol" under its own definitions.
A comment wouldn't help because generic code doesn't look at comments, it just consumes iterators and will apply the same logic to all iterators. Here's an example where size_hint() is called inside a loop. rust/library/alloc/src/vec/mod.rs Lines 2157 to 2168 in 18ec4a9
|
match self.inner.metadata() { | ||
Ok(metadata) => { | ||
let file_length = metadata.len() as usize; | ||
(0, Some(file_length)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation will never actually help, because in practice only the .0
of the size_hint()
is ever used: https://internals.rust-lang.org/t/is-size-hint-1-ever-used/8187?u=scottmcm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a shame. Then this implementation would basically just add system call overhead while providing very little benefit :(
This one sounds potentially interesting. It'd at least help the first reallocation of a vector when collecting, and should be pretty cheap. |
I agree that this is much better. Should I change the implementation here or open a new PR with the change (since it is different)? |
I think a new PR makes sense so others don't have to read the unrelated discussion. |
For this PR (and the new one I will make), would I mark the code as |
Trait implementations are insta-stable. |
Thanks for the guidance! |
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
…ramertj Improved IO Bytes Size Hint After trying to implement better `size_hint()` return values for `File` in [this PR](rust-lang#81044) and changing to implementing it for `BufReader` in [this PR](rust-lang#81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`. Unfortunately, for `BufReader`, the size_hint only improves after calling `fill_buffer` due to it using the contents of the buffer for the hint. Nevertheless, the the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator. Closes rust-lang#81052
I noticed that the
Bytes
iterator over the bytes of aFile
returned the default value when callingsize_hint
. Since files are one of the cases where the number of items to iterate over is fixed, I felt that it made sense to take advantage of the available information.Now, the
Bytes
Iterator
returned by callingbytes()
on aFile
will provide more accurate bounds with its size_hint.