-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new test case for possible bug in BufReader #58913
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
This comment has been minimized.
This comment has been minimized.
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test case makes sense; if the buffer still exists, seeking to a location within it seems reasonable.
I don't think it makes sense to throw away the buffer until it gets overwritten. But once it does get overwritten, it needs invalidating.
Do you have a test case in which you can read data from the buffer that you shouldn't be able to? (Or am I misunderstanding what you're trying to demonstrate here?)
@joshtriplett Yes, I believe the test case that I presented already shows that it is possible to access data from the buffer when we shouldn't:
|
By the way, sorry for the mess. I'd like to squash those mistaken commits and force push them to keep it organized, but I don't have good internet access right now. I didn't even clone the Rust repository, I edited the file and commited on GitHub itself. |
On Mon, Mar 04, 2019 at 03:14:57PM -0800, André Vicente Milack wrote:
@joshtriplett Yes, I believe the test case that I presented already shows that it is possible to access data from the buffer when we shouldn't:
1. `[5, 6, 7]` is read and stored in the buffer;
1. `[0, 1, 2, 3, 4]` is read bypassing the buffer. The buffer should now be invalidated but it's not, the chunk `[5, 6, 7]` remains there. At this point, we're at the end of the stream.
1. We call seek(-2). The buffer is 3 bytes long, so BufReader understands that we're still in its range, and that it doesn't need to fetch data from the inner reader.
1. We read again, but instead of getting `[3, 4]` the test fails and we get `[6, 7]`, which was incorrectly stored in the buffer.
Ah, I see! And does the test pass if you uncomment the commented lines?
If so, please do go ahead and uncomment it and document the reason for
it, and I'd be happy to review that.
|
Yes, the test passes when we uncomment those lines. I'll do it as soon as possible. |
There are two moments when a BufRead object needs to empty it's internal buffer: - In a seek call; - In a read call when all data in the internal buffer had been already consumed and the output buffer has a greater or equal size than the internal buffer. In both cases, the buffer was not being properly emptied, but only marked as consumed (self.pos = self.cap). That should be no problem if the inner reader is only Read, but if it is Seek as well, then it's possible to access the data in the buffer by using the seek_relative method. In order to prevent this from happening, both self.pos and self.cap should be set to 0. Two test cases were added to detect that failure: - test_buffered_reader_invalidated_after_read - test_buffered_reader_invalidated_after_seek Both tests are very similar to each other. The inner reader contains the following data: [5, 6, 7, 0, 1, 2, 3, 4]. The buffer capacity is 3 bytes. - First, we call fill_buffer, which loads [5, 6, 7] into the internal buffer, and then consume those 3 bytes. - Then we either read the 5 remaining bytes in a single read call or we move to the end of the stream by calling seek. In both cases the buffer should be emptied to prevent the previous data [5, 6, 7] from being read. - We now call seek_relative(-2) and read two bytes, which should give us the last 2 bytes of the stream: [3, 4]. Before this commit, the the seek_relative method would consider that we're still in the range of the internal buffer, so instead of fetching data from the inner reader, it would return the two last bytes that were incorrectly still in the buffer: [6, 7]. Therefore, the test would fail. Now, when seek_relative is called the buffer is empty. So the expected data [3, 4] is fetched from the inner reader and the test passes.
@joshtriplett I realized that this problem not only happened in the |
I feel like I need an extra pair of eyes here. I can see how this makes sense, but it feels like it might be invalidating more than it needs to, and I also wonder why it looks like seek's control flow can pass through both the invalidates you added in one call. @Diggsey, @sfackler, it looks like you've done a fair bit of work in this area, can you take a look, please? |
I had the same question when I first read that code. It seems to me that the invalidation inside the else is there because the seek just below it can fail and return without passing through the invalidation outside the else. |
src/libstd/io/buffered.rs
Outdated
self.pos = self.cap; // empty the buffer | ||
// Empty the buffer | ||
self.cap = 0; | ||
self.pos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to mark the buffer as empty here, when we do the same further down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never-mind just saw the conversation above!
What about |
Also, it might be worth pulling this out into a method, |
@Diggsey Good point, I didn't understand yet how |
It's pretty much the same as |
Does /// Invalidates all data in the internal buffer.
#[inline]
fn discard_buffer(&mut self) {
self.pos = 0;
self.cap = 0;
} |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I had tried to improve the readability of the seek method, but I ended up changing its behavior as well, so I just rolled back. |
ping from triage @joshtriplett any updates on this? |
@bors r+ |
📌 Commit c36d91c has been approved by |
…iplett Add new test case for possible bug in BufReader When reading a large chunk from a BufReader, if all the bytes from the buffer have been already consumed, the internal buffer is bypassed entirely. However, it is not invalidated, and it's possible to access its contents using the `seek_relative` method, because it tries to reuse the existing buffer.
…iplett Add new test case for possible bug in BufReader When reading a large chunk from a BufReader, if all the bytes from the buffer have been already consumed, the internal buffer is bypassed entirely. However, it is not invalidated, and it's possible to access its contents using the `seek_relative` method, because it tries to reuse the existing buffer.
Add new test case for possible bug in BufReader When reading a large chunk from a BufReader, if all the bytes from the buffer have been already consumed, the internal buffer is bypassed entirely. However, it is not invalidated, and it's possible to access its contents using the `seek_relative` method, because it tries to reuse the existing buffer.
☀️ Test successful - checks-travis, status-appveyor |
Tested on commit rust-lang/rust@15a5dfa. Direct link to PR: <rust-lang/rust#58913> 💔 rls on windows: test-pass → test-fail (cc @nrc @Xanewok, @rust-lang/infra).
When reading a large chunk from a BufReader, if all the bytes from the buffer have been already consumed, the internal buffer is bypassed entirely. However, it is not invalidated, and it's possible to access its contents using the
seek_relative
method, because it tries to reuse the existing buffer.