io::Stdout should use block buffering when appropriate #60673

BurntSushi · 2019-05-09T16:39:57Z

I feel like a pretty common pitfall for beginning Rust programmers is to try writing a program that uses println! to print a lot of lines, compare its performance to a similar program written in Python, and be (rightly) baffled at the fact that Python is substantially faster. This occurred most recently here: https://www.reddit.com/r/rust/comments/bl7j7j/hey_rustaceans_got_an_easy_question_ask_here/emx3bhm/

This happens because io::Stdout unconditionally uses line buffering, regardless of whether it's being used interactively (e.g., printing to a console or a tty) or whether it's printing to a file. So if you print a lot of lines, you end up calling the write syscall for every line, which is quite expensive. In contrast, Python uses line buffering when printing interactively, and standard block buffering otherwise. You can see more details on this here and here.
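To make the cost difference concrete, here is a minimal sketch that counts how often the underlying writer is hit under each strategy. `CountingSink` and the two helper functions are hypothetical names made up for this illustration; each `write` call on the sink stands in for one `write` syscall on a real file descriptor.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that discards data but counts calls to `write`.
/// Each call stands in for one `write` syscall on a real descriptor.
#[derive(Default)]
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` short lines through a line-buffered writer and
/// returns how many times the sink was written to.
fn line_buffered_calls(lines: usize) -> io::Result<usize> {
    let mut out = LineWriter::new(CountingSink::default());
    for i in 0..lines {
        writeln!(out, "line {}", i)?;
    }
    out.flush()?;
    Ok(out.get_ref().calls)
}

/// Same workload through a block-buffered writer whose buffer is
/// large enough to hold all of the output.
fn block_buffered_calls(lines: usize) -> io::Result<usize> {
    let mut out = BufWriter::with_capacity(64 * 1024, CountingSink::default());
    for i in 0..lines {
        writeln!(out, "line {}", i)?;
    }
    out.flush()?;
    Ok(out.get_ref().calls)
}

fn main() -> io::Result<()> {
    println!("line-buffered:  {} writes", line_buffered_calls(1000)?);
    println!("block-buffered: {} writes", block_buffered_calls(1000)?);
    Ok(())
}
```

The line-buffered path touches the sink at least once per line, while the block-buffered path touches it only when the buffer fills or is flushed.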

In my opinion, Rust should adopt the same policy as Python. Indeed, there is even a FIXME item for this in the code:

rust/src/libstd/io/stdio.rs

Lines 401 to 404 in ef01f29

    
    // FIXME: this should be LineWriter or BufWriter depending on the state of
    //        stdout (tty or not). Note that if this is not line buffered it
    //        should also flush-on-panic or some form of flush-on-abort.
    inner: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>,

I think this would potentially solve a fairly large stumbling block that folks run into. The CLI working group even calls it out as a performance footgun, and it is called out here as well. Additionally, ripgrep rolls its own handling for this.

I can't think of too many appreciable downsides to doing this. It is a change in behavior. For example, if you wrote a Rust program today that printed to io::Stdout, and the user redirected the output to a file, then the user could (for example) tail that output and see it updated as each line was printed. If we made io::Stdout use block buffering when printing to a file like this, then that behavior would change. (This is the reasoning for flags like --line-buffered on grep.)

cc @rust-lang/libs


BurntSushi · 2019-05-09T16:40:29Z

cc @killercup @kbknapp as other folks that might have opinions here.

sfackler · 2019-05-09T17:00:50Z

If we're worried about regressing people that are depending on it being line buffered, we could minimally have methods on Stdout/Stderr to switch it between line and block buffering.

alexcrichton · 2019-05-09T18:35:47Z

FWIW I personally continue to feel that we can do this at any time (change libstd's buffering strategy on non-TTY stdout/stderr streams) and I agree with @sfackler that if breakage arises we can work around it with methods and such.

kbknapp · 2019-05-09T18:40:04Z

I would be very much in favor of at least the minimal route of giving Stdout/Stderr the option to switch between line and block buffering.

Another slightly less minimalist approach is to use block buffering by default on print!("..") and to prominently display the characteristics of both macros in the docs. The downside is that changing println!("..") calls to print!("..\n") is a multi-cursor movement. A different approach is to add a Pythonesque opt-in version of println! (ignoring the exact syntax, purely as an example: println!(buf=true, "..")), which I believe could be done in a backwards-compatible way and isn't a multi-cursor movement.

In general I'd like to switch wholesale, as it's one of the very common footguns I see.

Lonami · 2019-05-11T21:50:36Z

If we're worried about regressing people that are depending on it being line buffered […]

I wouldn't worry about this unless the documentation explicitly states the current behaviour (e.g. always line-buffered). If it's not documented, it's like relying on implementation details (which are subject to change).

BurntSushi · 2019-05-12T01:19:05Z

I don't think we specify the behavior. But even if we don't, and we want to make this change (it sounds like folks agree we should), we should go into it while being considerate of behavioral changes to existing code. The letter of the law is important, but so is the spirit.

canadaduane · 2019-07-20T03:23:22Z

Just to document further agreement with @BurntSushi that this is a common pitfall--here I am, a new user, doing it today, and asking around for help :)

https://users.rust-lang.org/t/why-is-this-rust-loop-3x-slower-when-writing-to-disk/30489

This matches glibc behavior. This is determined using the `isatty` function on Unixes, and not attempted at all for other operating systems. Fixes rust-lang#60673.
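As a rough user-level illustration of that policy (not the libstd change itself), the sketch below picks a buffering mode from a terminal check. It assumes the `std::io::IsTerminal` trait, which was stabilized well after this comment and also works on non-Unix platforms; `Buffering` and `choose_buffering` are hypothetical names.

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Hypothetical policy type; not part of the standard library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Buffering {
    Line,
    Block,
}

/// Line-buffer interactive output, block-buffer everything else.
fn choose_buffering(is_tty: bool) -> Buffering {
    if is_tty {
        Buffering::Line
    } else {
        Buffering::Block
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mode = choose_buffering(stdout.is_terminal());
    eprintln!("stdout buffering: {:?}", mode);

    // `Stdout` is already line-buffered, so only the block-buffered
    // case needs an extra wrapper.
    let mut out: Box<dyn Write> = match mode {
        Buffering::Line => Box::new(stdout.lock()),
        Buffering::Block => Box::new(BufWriter::new(stdout.lock())),
    };
    for i in 0..3 {
        writeln!(out, "line {}", i)?;
    }
    // Block-buffered output only reaches stdout once it is flushed.
    out.flush()
}
```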

Lokathor · 2020-01-12T06:43:19Z

I'd be most in favor of simply a method to switch to block buffer mode. It's something that keeps the default case simple, and if you notice performance is bad you can opt into specific behavior. The same idea as being able to lock stdout manually to avoid repeatedly locking it.

I would be against trying to auto-detect the program mode and then using that to decide. Particularly, I absolutely want my interactive programs to be able to use block output and manual flushing.

Lucretiel · 2020-05-28T04:59:46Z

@Lokathor what about doing both? I think it's a common (and reasonable) default behavior of many other languages to do line-buffered on a terminal, and block-buffered otherwise. We could do the same, but then also add something like:

impl Stdout {
    // These functions do not cause any flushes or i/o interaction of any kind;
    // they simply set a flag that is consulted on each call to `write`. So,
    // transitioning to line_buffered wouldn't try to flush existing unflushed
    // lines until more writes come in (or a manual flush(), obviously).
    fn force_line_buffered(&mut self);
    fn force_block_buffered(&mut self);
}
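For illustration only, the flag-based idea could look roughly like the sketch below, written against a generic writer rather than the real `Stdout`. `Switchable`, `Mode`, `need_flush`, and `demo` are hypothetical; the deferred-flush flag is an assumption borrowed from how the older `LineWriter` avoided reporting a flush error after a successful write.

```rust
use std::io::{self, BufWriter, Write};

#[derive(Clone, Copy, PartialEq, Eq)]
enum Mode {
    Line,
    Block,
}

/// Hypothetical stand-in for a `Stdout` with switchable buffering.
struct Switchable<W: Write> {
    inner: BufWriter<W>,
    mode: Mode,
    /// Set when a line-mode flush failed after the data was accepted.
    need_flush: bool,
}

impl<W: Write> Switchable<W> {
    fn new(inner: W) -> Self {
        Switchable { inner: BufWriter::new(inner), mode: Mode::Block, need_flush: false }
    }

    // Both setters only flip the flag; they perform no I/O.
    fn force_line_buffered(&mut self) {
        self.mode = Mode::Line;
    }

    fn force_block_buffered(&mut self) {
        self.mode = Mode::Block;
    }
}

impl<W: Write> Write for Switchable<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Retry a flush that failed on an earlier call before taking new data.
        if self.need_flush {
            self.inner.flush()?;
            self.need_flush = false;
        }
        let n = self.inner.write(buf)?;
        // The flag is consulted on every write; in line mode a newline
        // in the accepted data triggers a flush.
        if self.mode == Mode::Line && buf[..n].contains(&b'\n') {
            self.need_flush = self.inner.flush().is_err();
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()?;
        self.need_flush = false;
        Ok(())
    }
}

/// Returns what has reached the sink after a block-mode write, after a
/// line-mode write, and after switching back to block mode.
fn demo() -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    let mut out = Switchable::new(Vec::new());
    out.write_all(b"a\n").unwrap();
    let after_block = out.inner.get_ref().clone();
    out.force_line_buffered();
    out.write_all(b"b\n").unwrap();
    let after_line = out.inner.get_ref().clone();
    out.force_block_buffered();
    out.write_all(b"c\n").unwrap();
    let after_block_again = out.inner.get_ref().clone();
    (after_block, after_line, after_block_again)
}

fn main() {
    let (a, b, c) = demo();
    println!("{:?} {:?} {:?}", a, b, c);
}
```

Note that switching to line mode does not push out the earlier `a\n` by itself; it goes out together with the next line-mode write, which matches the behavior described in the comments above.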

I'm interested in tackling an implementation for this; would a change like this be considered significant enough that I should write an RFC for it first, to hash out the specific details, or could I write a draft PR and have the discussion take place in there?

Lokathor · 2020-05-28T05:52:49Z

I'm not on any team, but asking on the Rust Zulip for T-Libs might be your best starting place.

Lucretiel · 2020-06-02T20:44:52Z

I've started an implementation of this; I'll tag it in the relevant Pull Requests as I file them.

…nieu Substantial refactor to the design of LineWriter

# Preamble

This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as when it is not connected to a `tty`, or when a user sets the mode with a function call). This pull request refactors the logic of `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`.

Additionally, fixes rust-lang#72721

## A note on flushing

Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink.

For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that it is delivered (via `Write::flush`) all the way to `File`.

# Goals

Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor:

- Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible.
- Minimize superfluous copying of data into `BufWriter`'s buffer.
- Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink).
- Uphold the "at-most 1 write of new data" convention of `Write::write`
- Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written)
- As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data).

# Design

There are two major & related parts of the design.

First, a new internal struct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484)

The second major part of the design is that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes use of the public interface (particularly `buffer()` and `get_mut()`), but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages:

- Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed).
- The conventional contract of `write` (that at-most 1 attempt to write new data is made) is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer.
- Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior.

## New(ish) line buffering logic

The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`:

- Does our input data contain a newline?
  - If no:
    - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is.
    - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write`. This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later.
  - If yes:
    - First, `flush_buf`
    - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this.
      - If it errors, return the error
      - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity).
      - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity).

While it was not my intention for this behavior to diverge from the existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There are essentially two main changes to observable behavior:

- `flush` is no longer used to unbuffer lines. They are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked.
- Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one.

None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized.

# Alternative designs and design rationale

The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft.

The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in.

The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut BufWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`.

The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals.

# Next steps

~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~

However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that is confusing or unclear and hopefully I can explain more of the rationale that led to it.

# Test updates

This design required some tests to be updated; I've researched the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here.

- `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to ensure that errors from `flush` aren't propagated when preceded by a successful `write`. Because this type of issue is no longer possible (`write` now calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`), I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases.
- `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior.
- `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written.
- `line_vectored_partial_and_errors`: Pretty similar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no longer occurs, so I've simplified the test while doing my best to retain its spirit.
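The `write` algorithm described in the pull request can be sketched outside of libstd roughly as follows. This is a simplified illustration, not the actual `LineWriterShim`: it owns a plain `Vec<u8>` instead of borrowing a `BufWriter`, and `LineBuffered`, its fields, and `demo` are hypothetical names. In the partial-write case it buffers the unwritten remainder up to the last newline only as far as capacity allows, without the extra newline search a real implementation would need.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical, simplified stand-in for the line-writing logic;
/// unlike `LineWriterShim`, it owns its buffer directly.
struct LineBuffered<W: Write> {
    inner: W,
    buf: Vec<u8>,
    cap: usize,
}

impl<W: Write> LineBuffered<W> {
    fn new(inner: W, cap: usize) -> Self {
        LineBuffered { inner, buf: Vec::with_capacity(cap), cap }
    }

    /// "Unbuffer": hand buffered bytes to `inner` via `write`,
    /// without calling `inner.flush()`.
    fn flush_buf(&mut self) -> io::Result<()> {
        let mut written = 0;
        let result = loop {
            if written == self.buf.len() {
                break Ok(());
            }
            match self.inner.write(&self.buf[written..]) {
                Ok(0) => break Err(io::Error::new(ErrorKind::WriteZero, "failed to write buffered data")),
                Ok(n) => written += n,
                Err(e) if e.kind() == ErrorKind::Interrupted => {}
                Err(e) => break Err(e),
            }
        };
        // Keep only what the inner writer has not accepted yet.
        self.buf.drain(..written);
        result
    }
}

impl<W: Write> Write for LineBuffered<W> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        match data.iter().rposition(|&b| b == b'\n') {
            None => {
                // A buffer ending in '\n' holds a completed line left over
                // from an earlier partial write; retry it first.
                if self.buf.last() == Some(&b'\n') {
                    self.flush_buf()?;
                }
                // Ordinary block buffering for data without a newline.
                if self.buf.len() + data.len() > self.cap {
                    self.flush_buf()?;
                }
                if data.len() >= self.cap {
                    return self.inner.write(data);
                }
                self.buf.extend_from_slice(data);
                Ok(data.len())
            }
            Some(last_newline) => {
                self.flush_buf()?;
                // One attempt to write the completed lines directly.
                let lines = &data[..=last_newline];
                let n = self.inner.write(lines)?;
                if n == 0 {
                    return Ok(0);
                }
                // Full write: buffer the trailing partial line.
                // Partial write: buffer the rest of the completed lines.
                let rest = if n == lines.len() { &data[n..] } else { &lines[n..] };
                let take = rest.len().min(self.cap);
                self.buf.extend_from_slice(&rest[..take]);
                Ok(n + take)
            }
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flush_buf()?;
        self.inner.flush()
    }
}

/// Returns (bytes that reached the inner writer, bytes still buffered).
fn demo() -> (Vec<u8>, Vec<u8>) {
    let mut w = LineBuffered::new(Vec::new(), 1024);
    w.write_all(b"hello ").unwrap();
    w.write_all(b"world\nfoo").unwrap();
    (w.inner.clone(), w.buf.clone())
}

fn main() {
    let (inner, buffered) = demo();
    println!("written: {:?}", String::from_utf8_lossy(&inner));
    println!("buffered: {:?}", String::from_utf8_lossy(&buffered));
}
```

In the demo, the completed line reaches the inner writer as soon as its newline arrives, while the trailing `foo` stays buffered until a later write or an explicit `flush`.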

calebstewart · 2020-09-19T05:14:31Z

I'm not sure if this is the right place, but I stumbled on this issue when trying to find a way to disable line-buffering on stdout. I see there was a pull-request recently merged and some talk above about possibly adding a method to disable/enable line-buffering. I tried to go through the pull request but didn't completely follow what was added. Can anyone give me a run-down of if/how this was resolved? Currently, I've solved my issue by doing File::from_raw_fd(1) to get a non-line-buffered stdout stream, but this is platform dependent. Platform independence isn't a strict requirement of my project, so it's not the worst thing, but if there's now a way to disable line-buffering, I'd love to use a solution that doesn't depend on Unix conventions explicitly. Thanks! :)
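
For readers following along, the Unix-only workaround mentioned here might look roughly like the sketch below. The helper name `raw_stdout` is made up for illustration, and the `ManuallyDrop` wrapper is an assumption on my part: without it, dropping the `File` would close file descriptor 1 for the whole process.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::os::unix::io::FromRawFd;

/// Hypothetical helper: a handle to fd 1 that bypasses std's line buffer.
/// Unix only; fd 1 is assumed to be the process's stdout.
fn raw_stdout() -> ManuallyDrop<File> {
    // SAFETY: fd 1 is assumed to stay open for the life of the process.
    // ManuallyDrop ensures this File never closes it on drop.
    ManuallyDrop::new(unsafe { File::from_raw_fd(1) })
}

fn main() -> io::Result<()> {
    let mut out = raw_stdout();
    // Each write_all call goes to the descriptor without std's buffering.
    out.write_all(b"written without std's line buffer\n")
}
```

Anything still sitting in `io::stdout()`'s own buffer is not ordered relative to these writes, so mixing the two handles is best avoided.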

Lucretiel · 2020-09-19T16:26:55Z

I've been working on resolving this over most of the summer. If you're referring to #72808, that PR is entirely preliminary; it refactors the design of LineWriter to allow for a future implementation of switchable buffering behavior.

Right now, there's no way to fully disable buffering on stdout. However, if you want to use block buffering, you can still wrap the Stdout or StdoutLock object in a BufWriter, which, when flushed, will send all the buffered data to the stdout device at once.
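
A minimal sketch of the `BufWriter` wrapping described above; the function name `write_lines` and the line contents are only for illustration:

```rust
use std::io::{self, BufWriter, Write};

/// Writes `n` lines through a block buffer and flushes once at the end.
fn write_lines<W: Write>(out: W, n: usize) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for i in 0..n {
        // These small writes land in BufWriter's in-memory buffer.
        writeln!(out, "line {i}")?;
    }
    // Flush explicitly so that errors are reported; the flush that
    // happens on drop ignores them.
    out.flush()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Locking once also avoids re-acquiring the stdout lock per write.
    write_lines(stdout.lock(), 1000)
}
```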

It's worth noting that, unless you are manually sending byte slices to stdout, you almost certainly don't want unbuffered stdout. print! and all the other formatted write utilities work by performing numerous tiny writes (several for each component of the formatted content); if these are performed directly on an I/O device, your performance will seriously suffer.
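
To make the "numerous tiny writes" point concrete, here is a small sketch that counts `write` calls reaching the underlying writer. `CountingWriter` and `format_point` are hypothetical names used only for this demonstration; the exact number of calls made by the formatting machinery is an implementation detail, so the sketch only relies on it being more than one.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for an I/O device: records how often it is written to.
#[derive(Default)]
struct CountingWriter {
    calls: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// One formatted line with two runtime arguments.
fn format_point<W: Write>(out: &mut W, x: i32, y: i32) -> io::Result<()> {
    writeln!(out, "x = {}, y = {}", x, y)
}

fn main() -> io::Result<()> {
    // Unbuffered: every piece of the formatted output is a separate write.
    let mut direct = CountingWriter::default();
    format_point(&mut direct, 1, 2)?;

    // Buffered: the pieces are collected and written once on flush.
    let mut buffered = BufWriter::new(CountingWriter::default());
    format_point(&mut buffered, 1, 2)?;
    buffered.flush()?;

    println!(
        "direct: {} write calls, buffered: {} write calls",
        direct.calls,
        buffered.get_ref().calls
    );
    Ok(())
}
```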

calebstewart · 2020-09-19T18:22:57Z

I appreciate your response. I assume this is a heavy lift, so not trying to be annoying or rush anyone, haha. What you explained makes sense, and I'll continue to track the progress moving forward. Thanks for all the hard work!

I am in fact in that very small edge case of writing byte slices to stdout, but I appreciate the heads up! It's a very good point and true in 99.99% of cases.

Refactor io/buffered.rs into submodules

This pull request splits `BufWriter`, `BufReader`, `LineWriter`, and `LineWriterShim` (along with their associated tests) into separate submodules. It contains no functional changes. This change is being made in anticipation of adding another type of buffered writer which can be switched between line- and block-buffering mode. Part of a series of pull requests resolving rust-lang#60673.

Lucretiel · 2020-10-29T04:36:36Z

Proposed implementation of switchable stdout buffering: #78515. This should be the second-to-last PR towards the fulfillment of this issue; after that, it's only a matter of actually adding code to detect stdout's environment (tty or not) and correctly init the buffer mode.

the8472 · 2023-08-02T20:36:35Z

for reference, #78515 (comment) has the API team's last position on how this issue should be approached.

RalfJung · 2023-08-27T13:53:57Z

I'm a bit confused by the current status here. The issue is still open, and hence I assume it is unresolved. But we also have this comment in the standard library:

rust/library/std/src/io/buffered/linewritershim.rs

Lines 10 to 12 in e7ef5d8

    
    /// implementation details of BufWriter. This also allows existing
    /// `BufWriters` to be temporarily given line-buffering logic; this is what
    /// enables Stdout to be alternately in line-buffered or block-buffered mode.

That to me sounds like there is some logic that would make "Stdout [...] be alternately in line-buffered or block-buffered mode". But there appears to be no such logic. Am I misunderstanding the comment?

the8472 · 2023-08-27T14:01:59Z

Lucretiel was refactoring the buffering logic with the goal to enable that. But the work stalled, see the comment linked above. So now a good chunk of the capability is in std but it's not exposed.

Lucretiel · 2023-08-28T05:13:14Z

Correct. LineWriterShim enables all of the necessary logic to allow a BufWriter to temporarily opt in to line-buffering behavior (using the buffer it already has). Remaining work involves figuring out the exact shape that should take (my own proposal was an internal SwitchWriter type, which contains a BufWriter and a mode), along with what (if any) new public APIs should be added for controlling the mode.
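
As a rough illustration of the "a BufWriter and a mode" shape, here is a user-space sketch. It is not the code from #78515 and does not use `LineWriterShim` (which is private to std); the names `BufferMode`, `set_mode` and `get_ref` are assumptions, and the line-mode logic is deliberately simplified, with none of the partial-write or error-retry handling the real shim has.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical mode flag; not a std type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BufferMode {
    Line,
    Block,
}

/// Hypothetical sketch of a writer that owns one buffer and a mode.
struct SwitchWriter<W: Write> {
    inner: BufWriter<W>,
    mode: BufferMode,
}

impl<W: Write> SwitchWriter<W> {
    fn new(inner: W, mode: BufferMode) -> Self {
        SwitchWriter { inner: BufWriter::new(inner), mode }
    }

    fn set_mode(&mut self, mode: BufferMode) {
        self.mode = mode;
    }

    fn get_ref(&self) -> &W {
        self.inner.get_ref()
    }
}

impl<W: Write> Write for SwitchWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self.mode {
            // Block mode: plain BufWriter behavior.
            BufferMode::Block => self.inner.write(buf),
            BufferMode::Line => match buf.iter().rposition(|&b| b == b'\n') {
                // No newline: keep buffering.
                None => self.inner.write(buf),
                // Flush everything up to the last newline, buffer the tail.
                Some(i) => {
                    let (lines, tail) = buf.split_at(i + 1);
                    self.inner.write_all(lines)?;
                    self.inner.flush()?;
                    self.inner.write_all(tail)?;
                    Ok(buf.len())
                }
            },
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    let mut out = SwitchWriter::new(io::stdout().lock(), BufferMode::Line);
    writeln!(out, "visible as soon as the line is complete")?;
    out.set_mode(BufferMode::Block);
    writeln!(out, "held in the buffer until flush")?;
    out.flush()
}
```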

RalfJung · 2023-08-28T06:17:30Z

Okay I see, thanks. The comment is misleading then since it strongly implies that Stdout does something that it doesn't do.

the8472 · 2023-09-26T23:38:14Z

@m-ou-se, from Zulip:

> i have an actual rust program running here that prints the energy consumption of my home every 10 seconds, which is tee'd into a log file and to a terminal. that program would work fine when run directly on the terminal, and suddenly seem to fully hang when piped through tee or similar.

Afaict such an invocation depends on unspecified behavior, it's not documented anywhere that stdout is line-buffered. The docs say it's buffered, but not which buffering strategy.
And, as the opening comment says, people often are surprised that stdout is not performant, which is more subtle to notice than output not appearing.

Since the behavior can't be perfect for all uses and it's unspecified, it's mostly a matter of

- expectations
- defaults
- providing ways for the user to override if the defaults don't work for them

expectations

Here we can check prior art:

- glibc's FILE* stdout is line-buffered for ttys, block- or unbuffered otherwise
- Python's stdout and stderr switch depending on isatty. But there's a global override and print also has an optional flush argument which makes it easy to get data out immediately when needed.
- Java's System.out is always line-buffered, FileDescriptor.out can be used to obtain unbuffered or block-buffered behavior.
- Go's os.Stdout is unbuffered since it's a simple file descriptor wrapper
- nodejs is just wild
- uutils' cat tries to write blocks to stdout but when it can't splice this still goes through LineWriter, which scans each written slice for newline bytes

defaults

- Keep status quo, stdout is always line-buffered at process startup. If we decide that this is important enough to keep because people rely on it then perhaps we should guarantee it?
- initialize Stdout based on IsTerminal
  - as above, but add more heuristics such as seeing if a process has a controlling terminal (and thus a user that might be looking at things) or is part of a foreground process group
- keep Stdout linebuffering but make StdoutLock block-buffered
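
The "initialize based on IsTerminal" option can be approximated in user code today. The sketch below is only that, an approximation: `AutoWriter` is a hypothetical name, and since `io::Stdout` already line-buffers internally, the `Line` arm is redundant when wrapping real stdout. It is kept to mirror what the choice would look like inside std.

```rust
use std::io::{self, BufWriter, IsTerminal, LineWriter, Write};

/// Hypothetical wrapper (not a std type): line-buffered for terminals,
/// block-buffered for everything else.
enum AutoWriter<W: Write> {
    Line(LineWriter<W>),
    Block(BufWriter<W>),
}

impl<W: Write> AutoWriter<W> {
    fn new(inner: W, is_terminal: bool) -> Self {
        if is_terminal {
            AutoWriter::Line(LineWriter::new(inner))
        } else {
            AutoWriter::Block(BufWriter::new(inner))
        }
    }

    fn get_ref(&self) -> &W {
        match self {
            AutoWriter::Line(w) => w.get_ref(),
            AutoWriter::Block(w) => w.get_ref(),
        }
    }
}

impl<W: Write> Write for AutoWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            AutoWriter::Line(w) => w.write(buf),
            AutoWriter::Block(w) => w.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            AutoWriter::Line(w) => w.flush(),
            AutoWriter::Block(w) => w.flush(),
        }
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Decide once, at startup, whether a terminal is attached.
    let is_terminal = stdout.is_terminal();
    let mut out = AutoWriter::new(stdout.lock(), is_terminal);
    for i in 0..5 {
        writeln!(out, "line {i}")?;
    }
    out.flush()
}
```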

Some other approaches of varying seriousness that also came to mind while writing this:

- switch to a more selective line-buffering strategy based on tty-ness: only flush if the last byte of a write() or write_all() call is a \n. If the caller doesn't care about doing things line-by-line then we don't care either. This avoids splitting up binary data that happens to contain some 0x0A in random places.
- never line-buffer stdout by default but make println! call flush() instead, writeln! could be the non-flushing alternative.
- ~~implement Nagle's algorithm~~
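
A sketch of the first idea (flush only when a write call ends in `\n`), leaving out the tty-ness part. `TrailingNewlineWriter` is a hypothetical name; nothing like it exists in std.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical writer: flushes only when a single `write` call ends
/// with a newline, instead of scanning the data for embedded newlines.
struct TrailingNewlineWriter<W: Write> {
    inner: BufWriter<W>,
}

impl<W: Write> TrailingNewlineWriter<W> {
    fn new(inner: W) -> Self {
        TrailingNewlineWriter { inner: BufWriter::new(inner) }
    }

    fn get_ref(&self) -> &W {
        self.inner.get_ref()
    }
}

impl<W: Write> Write for TrailingNewlineWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Only the final byte is inspected; a 0x0A in the middle of a
        // binary blob does not trigger a flush.
        if n == buf.len() && buf.last() == Some(&b'\n') {
            self.inner.flush()?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    let mut out = TrailingNewlineWriter::new(io::stdout().lock());
    out.write_all(b"embedded\nnewline, still buffered; ")?;
    out.write_all(b"flushed now\n")
}
```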

overrides

This one is simple on the surface, we only need two configurables:

- line-buffered on/off
- buffer size (0 = unbuffered)

Though that'd still leave the question of where to put those APIs: in Stdout, LineWriter or BufWriter.
Refer to #78515 (comment).

BurntSushi added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 9, 2019

Thomasdezeeuw mentioned this issue May 12, 2019

Benchmark direct writing vs using io::Stdout Thomasdezeeuw/std-logger#17

Closed

Arcterus mentioned this issue May 17, 2019

libstd: use block buffering when stdout is not a TTY #60904

Closed

zackw mentioned this issue Sep 12, 2019

io::Stderr should be line-buffered by default, not unbuffered #64413

Open

tbu- mentioned this issue Sep 28, 2019

Change to non-line buffered output if output is not a TTY #64861

Closed

This was referenced May 29, 2020

LineWriter should not fully flush its lines #72721

Closed

Substantial refactor to the design of LineWriter #72808

Merged

Lucretiel mentioned this issue Aug 29, 2020

Refactor io/buffered.rs into submodules #76084

Merged

jonas-schievink added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Sep 13, 2020

sstadick mentioned this issue Sep 28, 2020

stdout buffering ezrosent/frawk#18

Closed

Lucretiel mentioned this issue Oct 29, 2020

Switchable buffering for Stdout #78515

Closed

9 tasks

jonhoo mentioned this issue Jan 6, 2021

Use BufWriter when STDOUT is not a TTY jonhoo/inferno#206

Merged

tavianator mentioned this issue Feb 2, 2021

--color=always ~10x slower sharkdp/fd#720

Closed

siebenHeaven mentioned this issue Apr 25, 2021

ls: reduce write syscalls & cleanup uutils/coreutils#2115

Merged

riquito added a commit to riquito/tuc that referenced this issue Jul 22, 2021

Use grep-cli for faster output

d31f487

Rust io::Stdout support only line buffering, no block buffering, so we use a third party library to get it. See rust-lang/rust#60673

sstadick mentioned this issue Jul 28, 2021

Speed up Rust renatoathaydes/prechelt-phone-number-encoding#1

Open

UE2020 mentioned this issue Aug 6, 2021

Performance concern regarding print function Alonely0/Voila#2

Closed

tavianator mentioned this issue Aug 11, 2021

Added buffering to stdout when its not a terminal sharkdp/fd#736

Closed

SUPERCILEX mentioned this issue Dec 16, 2022

Expose raw std{in,out,err} rust-lang/libs-team#148

Open

Amanieu mentioned this issue Mar 31, 2023

Tracking Issue for IsTerminal / is_terminal #98070

Closed

3 tasks

the8472 mentioned this issue Aug 2, 2023

io::copy does not use copy_file_range consistently #114341

Closed

ChrisDenton added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Aug 10, 2023

the8472 mentioned this issue Sep 8, 2023

Add buffering to stdout #115652

Closed

NobodyXu mentioned this issue Dec 9, 2023

Project Stalled BartMassey/rust-nonstdio#1

Open
