
RFC for read_all #980

Merged: 13 commits, Aug 7, 2015
text/0000-read-all.md (50 additions, 0 deletions)
@@ -0,0 +1,50 @@
- Feature Name: read_all
- Start Date: 2015-03-15
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Rust's Write trait has write_all, which attempts to write an entire
buffer. This proposal adds read_all, which attempts to read a fixed
number of bytes into a given buffer.

# Motivation

The new read_all method will allow programs to read from disk without
having to write their own read loops. Most Rust programs which need
to read from disk will prefer this to the plain read function. Many C
programs have the same need, and solve it the same way (e.g. git has
read_in_full). Here's one example of a Rust library doing this:
https://github.com/BurntSushi/byteorder/blob/master/src/new.rs#L184

# Detailed design

The read_all function will take a mutable, borrowed slice of u8 to
read into, and will attempt to fill that entire slice with data.

It will loop, calling read() once per iteration and attempting to read
the remaining amount of data. If read() returns an error of kind
Interrupted (EINTR), the loop will retry. If there are no more bytes to
read (as signalled by a return of Ok(0) from read()), a new error kind,
ErrorKind::ReadZero, will be returned. In the event of any other error,
that error will be returned. After a read() call returns having
successfully read some bytes, the running total of bytes read will be
updated. Once that total equals the size of the buffer, read_all will
return successfully.
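
As a rough, non-normative sketch, the loop could look like the
following, written here as a free function over any reader. The
proposed ErrorKind::ReadZero does not exist in std, so this sketch
substitutes ErrorKind::UnexpectedEof purely so that the example
compiles:

use std::io::{self, ErrorKind, Read};

// Sketch only: read_all is the proposed API, shown as a free function.
// ErrorKind::UnexpectedEof stands in for the RFC's proposed ReadZero kind.
fn read_all<R: Read + ?Sized>(reader: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // Ok(0): the reader ran out of data before the buffer was full.
            Ok(0) => return Err(io::Error::new(ErrorKind::UnexpectedEof,
                                               "failed to fill whole buffer")),
            // Some bytes were read: advance the running total.
            Ok(n) => filled += n,
            // The underlying call was interrupted by a signal (EINTR): retry.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            // Any other error is returned as-is.
            Err(e) => return Err(e),
        }
    }
    Ok(())
}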

What is the reasoning for introducing a new error kind instead of following the pattern of read()'s return value (io::Result<uint>)?

Contributor Author

  1. Consistency with write_all.
  2. io::Result is either Ok(uint) or Error. read_all has only one kind of ok result: "yep, we read the whole thing". If instead we returned Ok(number of bytes read), then every caller would have to check that the number of bytes read matched their expectations. In other words, we are optimizing for the common case of "either this file has the data, or we're doomed". And because the error on EOF is distinct from other errors, callers who care can choose to match on it and handle it separately. But I expect that most callers will not care.
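
To illustrate the calling convention, here is a hypothetical caller that reuses the read_all sketch from the detailed design above; read_all_counting is an invented io::Result<usize> variant used only to show what callers would otherwise have to write:

use std::io::{self, Read};

// With io::Result<()>, a short read surfaces through `?` like any other error.
fn read_header<R: Read>(file: &mut R) -> io::Result<[u8; 16]> {
    let mut header = [0u8; 16];
    read_all(file, &mut header)?;
    Ok(header)
}

// If read_all instead returned io::Result<usize> (say, read_all_counting),
// every caller would have to repeat this check:
//
//     let n = read_all_counting(file, &mut header)?;
//     if n != header.len() { /* handle the short read by hand */ }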

Contributor

In other words, we are optimizing for the common case of "either this file has the data, or we're doomed".

Who says that that's the common case? What about: "Fill this buffer as much as possible without me having to check for EINTR"?

Contributor Author

I think one could use take + read_to_end to do something like that (of course, it allocates a new buffer instead of re-using an old one). It seems likely to me that the "eof is a temporary condition rather than an error" case applies more to network streams, and re-using buffers in network stream situations is less safe: http://www.tedunangst.com/flak/post/heartbleed-in-rust
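
A rough sketch of that workaround (read_up_to is an illustrative name, not a proposed API):

use std::io::{self, Read};

// Read at most `len` bytes into a freshly allocated Vec. read_to_end already
// retries on ErrorKind::Interrupted, so EINTR is handled, but a new buffer is
// allocated on every call instead of re-using an existing one.
fn read_up_to<R: Read>(stream: &mut R, len: u64) -> io::Result<Vec<u8>> {
    let mut chunk = Vec::with_capacity(len as usize);
    stream.take(len).read_to_end(&mut chunk)?;
    Ok(chunk)
}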

I guess my only argument for read_all being the common case is that I've seen it twice (once in the code I am now writing, and once in the link above), compared to zero for the other one.

I could put this into the rationale section, if you like.

Contributor

of course, it allocates a new buffer instead of re-using an old one

Exactly.

It seems likely to me that the "eof is a temporary condition rather than an error"

I wasn't talking about EOF. I was talking about EINTR.

Member

Is it a four byte difference between i32 errno and u64 usize that we're talking here?

No, that's probably manageable. I just wanted to point out that the size of io::Error is a very real worry, and it can be affected by ErrorKind depending on how we implement it.

Making a separate error type for this method might be the most straightforward approach.

This is possible, but has drawbacks:

  • More API surface area to stabilize
  • Nonstandard, as the very similar write_all API would probably also want this treatment.
  • Can hinder understanding by returning a non-normal error type.
  • Operating generically over I/O functions may be hindered as this would stick out.

Contributor Author

I think write_all can't use a separate error type. The write_all fn should never return WriteZero on a POSIX system (unless a zero-length buffer is passed); instead, it will either succeed or return some other error. So in fact we would need to annotate all of the other io errors with the number of bytes written. I don't know how useful this would be, and it seems like an odd wart to have on io::Error given that it's only useful for write_all and read_all.

By contrast, read_all could just add the number of bytes read to ShortRead. That seems simpler, and I would be happy to adjust this proposal and the implementing pull request if that's the consensus.

I don't understand how operating generically would be hindered -- can you explain a bit? I'm not very familiar with the language.

Contributor

@alexcrichton, the second case (Some bytes were read, and then we saw Ok(0)) is the only case for which I would generally care about knowing how many bytes were read. For the I/O error case, the vast majority of the time I don't care about how many bytes were read, I just bail out and return the error. I'm much more likely to care in the Ok(0) case, such as if I'm reading fixed-size records from a file or the network, but the last record is allowed to be short.

Since the EOF condition doesn't exist for write_all, no changes would be needed there for consistency.

One way to implement this would be as follows:

  • Add ShortRead(usize) to Repr
  • Also add ShortRead(usize) to ErrorKind
  • Add Repr::ShortRead(n) => ErrorKind::ShortRead(n) to Error::kind (and also update description and detail)

This would not increase the size of io::Error. (io::Error already has Custom(Box<Custom>), which is the same size). It would increase the size of Custom, but that always gets boxed. It would also allow the return value to stay io::Result<()>, as desired, and would maintain usability by keeping the check for a short read out of the normal program flow and allowing a short read to be treated as a normal error value (via try!, et cetera) unless the user explicitly wants to handle it differently.
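
As a self-contained toy model of that shape (std's actual Repr and Custom are private, so the names and the Other placeholder here are illustrative only):

// Toy model of the change sketched above, not std's real definitions.
struct Custom {
    kind: ErrorKind,
    error: Box<dyn std::error::Error + Send + Sync>,
}

enum Repr {
    Os(i32),
    Custom(Box<Custom>),
    ShortRead(usize), // new: bytes read before the Ok(0)
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ErrorKind {
    Other,            // stand-in for the existing variants
    ShortRead(usize), // new: mirrors the Repr variant
}

struct Error {
    repr: Repr,
}

impl Error {
    fn kind(&self) -> ErrorKind {
        match self.repr {
            Repr::Os(_) => ErrorKind::Other, // the real impl decodes the OS code
            Repr::Custom(ref c) => c.kind,
            Repr::ShortRead(n) => ErrorKind::ShortRead(n), // new arm
        }
    }
}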

Member

@novalis

I don't understand how operating generically would be hindered -- can you explain a bit?

This may not be too too common, but if you are operating in a context that expects io::Result<T> to be returned via something like:

fn foo<T, F>(f: F) -> io::Result<T>
    where F: Fn() -> io::Result<T>
{
    // ...
}

You can't naturally use read_all with a different error type as you would be able to otherwise. You can probably get around it with try! and Ok, but it's somewhat unfortunate to have to do so.


@rkjnsn

Since the EOF condition doesn't exist for write_all, no changes would be needed there for consistency.

Technically Ok(0) is indeed special as cases like Write on &mut [u8] will end up generating this once you hit the end of the buffer.
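
A small, self-contained illustration of that existing behaviour:

use std::io::Write;

fn main() {
    let mut storage = [0u8; 4];
    let mut slot: &mut [u8] = &mut storage;
    assert_eq!(slot.write(b"abcd").unwrap(), 4); // fills the slice
    assert_eq!(slot.write(b"more").unwrap(), 0); // slice full: Ok(0), not an Err
}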

One way to implement this would be as follows:

Hm yes, that does seem plausible! I would still be a little wary of adding this error, but we do have precedent with WriteZero, so it may not be so bad.

Contributor

@alexcrichton
Technically Ok(0) is indeed special as cases like Write on &mut [u8] will end up generating this once you hit the end of the buffer.

Really? I would expect, and strongly argue, that that should generate an I/O error, even for a standard write call, and not return Ok(0). In my mind, this is analogous to a pipe or a socket with the remote end closed, where reading yields end of file while writing results in an error. As perhaps a closer analogy to the fixed buffer case: on Linux, if I have a fixed-size block device (say a small ramdisk) and I read to the end, subsequent reads will return 0 (end of file). If, however, I reach the end of the file and try to write more, I'll receive an error (ENOSPC). Similarly, writing to a file that has reached the maximum allowed size results in an error (EFBIG), not a successful write of zero size.

Is there any chance this can be changed?


# Drawbacks

The major weakness of this API (shared with write_all) is that in the
event of an error, there is no way to return the number of bytes that
were successfully read before the error. But since that is the design
of write_all, it makes sense to mimic that design decision for read_all.

# Alternatives

One alternative design would return some new kind of Result which
could report the number of bytes successfully read before an error.
This would be inconsistent with write_all, but arguably more correct.
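
As a non-normative sketch, such a result type might be shaped like
this (the names are invented purely for illustration):

// Illustrative only: the error side carries both the bytes read before
// the failure and the underlying cause.
pub struct PartialRead {
    pub bytes_read: usize,
    pub error: std::io::Error,
}

pub type ReadAllResult = Result<(), PartialRead>;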

Or we could leave this out, and let every Rust user write their own
read_all function -- like savages.
Member

This is unnecessary. Cargo makes us a highly developed distributed culture. Minimal investment is needed to contribute and make a difference.


This is unnecessary. Cargo makes us a highly developed distributed culture.

If you mean we don't need a standard library because we have Cargo, then I disagree strongly. Putting things off into crates was a strategy used to reach a stable 1.0 library in time, but the standard library should catch up eventually. If you do a thing, you should do it well. If the standard library does reading and writing, then it should cover such basics as filling an entire buffer in the face of interrupts.


@ArtemGr Completely agree

Member

I'm disagreeing with saying we're savages if it's not in libstd. And we don't have to write it ourselves -- it's easy to reuse code using cargo, very easy. Not saying that's a replacement but an augmentation.


Not savages, but babies. = )
It's 1.0 beta and there is space to grow.

Member

I don't understand why you do this, calling people names is beside the point of the RFC and doesn't help. I'll reply with some on-topic questions in the main discussion.

Member

Huuuuuge -1 for 'like savages' here.


calling people names is beside the point of the RFC and doesn't help

Sorry it didn't help, @bluss. I had to try. = }


So, ignoring the last line of this RFC: as has already been pointed out, there is a discrepancy between reading and writing in std::io as it currently stands. The comments here would also reinforce that it's already catching a number of us out. Simply deferring to Cargo for a single function will also create problems, not only in the cost of tracking down the correct crate (maybe byteorder in this case?) but also the cost of pulling in all of its dependencies, etc.

Member

The crate can be designed for that use case though: small and without dependencies. Using cargo is super easy IMO. Since crates may even re-export other crates, you can even ask authors to split out features to sub crates (when it's possible).