-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr #83342
Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr #83342
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should stdout use STDERR handle here?
Cut&Paste mistake, will fix it right away. Thanks for spotting it! |
…als" feature gate and use it in sys::windows::stdio instead of reimplementing it there.
cb3fc85
to
fb1fa97
Compare
This comment has been minimized.
This comment has been minimized.
Nominating for team discussion. See #83342 (comment) for an open question. Apart from that question, we just need to decide if this is the behavior we want. |
We discussed this in today's @rust-lang/libs-api meeting. We generally agreed that this was the right approach, including the behavior of flush (don't flush the partial-character buffer. Using rfcbot to confirm consensus; when we see consensus we'll go ahead and r+ without waiting 10 days. @rfcbot merge |
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
@bors r+ |
📌 Commit fbfde7e has been approved by |
…utf8, r=m-ou-se Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr # Problem Writes of just an incomplete UTF-8 byte sequence (e.g. `b"\xC3"` or `b"\xF0\x9F"`) to stdout/stderr with a Windows console attached error with `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` even though further writes could complete the codepoint. This is currently a rare occurence since the [linewritershim](https://github.com/rust-lang/rust/blob/2c56ea38b045624dc8b42ec948fc169eaff1206a/library/std/src/io/buffered/linewritershim.rs) implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in rust-lang#83258. The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (rust-lang#78515) get merged. # Patch description If there is at least one valid UTF-8 codepoint all valid UTF-8 is passed through to the extracted `write_valid_utf8_to_console()` fn. The new code only comes into play if `write()` is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in the `IncompleteUtf8` struct associated with `Stdout` / `Stderr`. The bytes are accepted one at a time. As soon as an error can be detected `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` is returned. Once a complete UTF-8 codepoint is received it is passed to the `write_valid_utf8_to_console()` and the buffer length is set to zero. Calling `flush()` will neither error nor write anything if an incomplete codepoint is present in the buffer. # Tests Currently there are no Windows-specific tests for console writing code at all. Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console and suddenly popping up another console window might be surprising to developers running the testsuite and it might not work at all in CI builds. To just test the new functionality in unit tests the code would need to be refactored. Some guidance on how to proceed would be appreciated. # Public API changes * `std::str::verifications::utf8_char_width()` would be exposed as `std::str::utf8_char_width()` behind the "str_internals" feature gate. # Related issues * Fixes rust-lang#83258. * PR rust-lang#78515 will exacerbate the problem. # Open questions * Add tests? * Squash into one commit with better commit message?
☀️ Test successful - checks-actions |
Pkgsrc changes: * Remove one now-longer-applicable patch, adjust a few others * Bump bootstrap requirements to 1.55.0. Upstream changes: Version 1.56.0 (2021-10-21) ======================== Language -------- - [The 2021 Edition is now stable.][rust#88100] See [the edition guide][rust-2021-edition-guide] for more details. - [The pattern in `binding @ pattern` can now also introduce new bindings.] [rust#85305] - [Union field access is permitted in `const fn`.][rust#85769] [rust-2021-edition-guide]: https://doc.rust-lang.org/nightly/edition-guide/rust-2021/index.html Compiler -------- - [Upgrade to LLVM 13.][rust#87570] - [Support memory, address, and thread sanitizers on aarch64-unknown-freebsd.][rust#88023] - [Allow specifying a deployment target version for all iOS targets][rust#87699] - [Warnings can be forced on with `--force-warn`.][rust#87472] This feature is primarily intended for usage by `cargo fix`, rather than end users. - [Promote `aarch64-apple-ios-sim` to Tier 2\*.][rust#87760] - [Add `powerpc-unknown-freebsd` at Tier 3\*.][rust#87370] - [Add `riscv32imc-esp-espidf` at Tier 3\*.][rust#87666] \* Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Allow writing of incomplete UTF-8 sequences via stdout/stderr on Windows.] [rust#83342] The Windows console still requires valid Unicode, but this change allows splitting a UTF-8 character across multiple write calls. This allows, for instance, programs that just read and write data buffers (e.g. copying a file to stdout) without regard for Unicode or character boundaries. - [Prefer `AtomicU{64,128}` over Mutex for Instant backsliding protection.] [rust#83093] For this use case, atomics scale much better under contention. - [Implement `Extend<(A, B)>` for `(Extend<A>, Extend<B>)`][rust#85835] - [impl Default, Copy, Clone for std::io::Sink and std::io::Empty][rust#86744] - [`impl From<[(K, V); N]>` for all collections.][rust#84111] - [Remove `P: Unpin` bound on impl Future for Pin.][rust#81363] - [Treat invalid environment variable names as non-existent.][rust#86183] Previously, the environment functions would panic if given a variable name with an internal null character or equal sign (`=`). Now, these functions will just treat such names as non-existent variables, since the OS cannot represent the existence of a variable with such a name. Stabilised APIs --------------- - [`std::os::unix::fs::chroot`] - [`UnsafeCell::raw_get`] - [`BufWriter::into_parts`] - [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`] These APIs were previously stable in `std`, but are now also available in `core`. - [`Vec::shrink_to`] - [`String::shrink_to`] - [`OsString::shrink_to`] - [`PathBuf::shrink_to`] - [`BinaryHeap::shrink_to`] - [`VecDeque::shrink_to`] - [`HashMap::shrink_to`] - [`HashSet::shrink_to`] These APIs are now usable in const contexts: - [`std::mem::transmute`] - [`[T]::first`][`slice::first`] - [`[T]::split_first`][`slice::split_first`] - [`[T]::last`][`slice::last`] - [`[T]::split_last`][`slice::split_last`] Cargo ----- - [Cargo supports specifying a minimum supported Rust version in Cargo.toml.] [`rust-version`] This has no effect at present on dependency version selection. We encourage crates to specify their minimum supported Rust version, and we encourage CI systems that support Rust code to include a crate's specified minimum version in the text matrix for that crate by default. Compatibility notes ------------------- - [Update to new argument parsing rules on Windows.][rust#87580] This adjusts Rust's standard library to match the behavior of the standard libraries for C/C++. The rules have changed slightly over time, and this PR brings us to the latest set of rules (changed in 2008). - [Disallow the aapcs calling convention on aarch64][rust#88399] This was already not supported by LLVM; this change surfaces this lack of support with a better error message. - [Make `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` warn by default][rust#87385] - [Warn when an escaped newline skips multiple lines.][rust#87671] - [Calls to `libc::getpid` / `std::process::id` from `Command::pre_exec` may return different values on glibc <= 2.24.][rust#81825] Rust now invokes the `clone3` system call directly, when available, to use new functionality available via that system call. Older versions of glibc cache the result of `getpid`, and only update that cache when calling glibc's clone/fork functions, so a direct system call bypasses that cache update. glibc 2.25 and newer no longer cache `getpid` for exactly this reason. Internal changes ---------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of rustc and related tools. - [LLVM is compiled with PGO in published x86_64-unknown-linux-gnu artifacts.][rust#88069] This improves the performance of most Rust builds. - [Unify representation of macros in internal data structures.][rust#88019] This change fixes a host of bugs with the handling of macros by the compiler, as well as rustdoc. [`std::os::unix::fs::chroot`]: https://doc.rust-lang.org/stable/std/os/unix/fs/fn.chroot.html [`Iterator::intersperse`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse [`Iterator::intersperse_with`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse [`UnsafeCell::raw_get`]: https://doc.rust-lang.org/stable/std/cell/struct.UnsafeCell.html#method.raw_get [`BufWriter::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.into_parts [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]: rust-lang/rust#84662 [`Vec::shrink_to`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.shrink_to [`String::shrink_to`]: https://doc.rust-lang.org/stable/std/string/struct.String.html#method.shrink_to [`OsString::shrink_to`]: https://doc.rust-lang.org/stable/std/ffi/struct.OsString.html#method.shrink_to [`PathBuf::shrink_to`]: https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#method.shrink_to [`BinaryHeap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.BinaryHeap.html#method.shrink_to [`VecDeque::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.shrink_to [`HashMap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_map/struct.HashMap.html#method.shrink_to [`HashSet::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_set/struct.HashSet.html#method.shrink_to [`std::mem::transmute`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html [`slice::first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first [`slice::split_first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first [`slice::last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last [`slice::split_last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last [`rust-version`]: https://doc.rust-lang.org/nightly/cargo/reference/manifest.html#the-rust-version-field [rust#87671]: rust-lang/rust#87671 [rust#86183]: rust-lang/rust#86183 [rust#87385]: rust-lang/rust#87385 [rust#88100]: rust-lang/rust#88100 [rust#86860]: rust-lang/rust#86860 [rust#84039]: rust-lang/rust#84039 [rust#86492]: rust-lang/rust#86492 [rust#88363]: rust-lang/rust#88363 [rust#85305]: rust-lang/rust#85305 [rust#87832]: rust-lang/rust#87832 [rust#88069]: rust-lang/rust#88069 [rust#87472]: rust-lang/rust#87472 [rust#87699]: rust-lang/rust#87699 [rust#87570]: rust-lang/rust#87570 [rust#88023]: rust-lang/rust#88023 [rust#87760]: rust-lang/rust#87760 [rust#87370]: rust-lang/rust#87370 [rust#87580]: rust-lang/rust#87580 [rust#83342]: rust-lang/rust#83342 [rust#83093]: rust-lang/rust#83093 [rust#88177]: rust-lang/rust#88177 [rust#88548]: rust-lang/rust#88548 [rust#88551]: rust-lang/rust#88551 [rust#88299]: rust-lang/rust#88299 [rust#88220]: rust-lang/rust#88220 [rust#85835]: rust-lang/rust#85835 [rust#86879]: rust-lang/rust#86879 [rust#86744]: rust-lang/rust#86744 [rust#84662]: rust-lang/rust#84662 [rust#86593]: rust-lang/rust#86593 [rust#81050]: rust-lang/rust#81050 [rust#81363]: rust-lang/rust#81363 [rust#84111]: rust-lang/rust#84111 [rust#85769]: rust-lang/rust#85769 (comment) [rust#88490]: rust-lang/rust#88490 [rust#88269]: rust-lang/rust#88269 [rust#84176]: rust-lang/rust#84176 [rust#88399]: rust-lang/rust#88399 [rust#88227]: rust-lang/rust#88227 [rust#88200]: rust-lang/rust#88200 [rust#82776]: rust-lang/rust#82776 [rust#88077]: rust-lang/rust#88077 [rust#87728]: rust-lang/rust#87728 [rust#87050]: rust-lang/rust#87050 [rust#87619]: rust-lang/rust#87619 [rust#81825]: rust-lang/rust#81825 (comment) [rust#88019]: rust-lang/rust#88019 [rust#87666]: rust-lang/rust#87666
Pkgsrc changes: * Bump bootstrap kit version to 1.55.0. * Adjust patches as needed, some no longer apply (so removed) * Update checksum adjustments. * Avoid rust-llvm on SunOS * Optionally build docs * Remove reference to closed/old PR#54621 Upstream changes: Version 1.56.1 (2021-11-01) =========================== - New lints to detect the presence of bidirectional-override Unicode codepoints in the compiled source code ([CVE-2021-42574]) [CVE-2021-42574]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574 Version 1.56.0 (2021-10-21) ======================== Language -------- - [The 2021 Edition is now stable.][rust#88100] See [the edition guide][rust-2021-edition-guide] for more details. - [The pattern in `binding @ pattern` can now also introduce new bindings.] [rust#85305] - [Union field access is permitted in `const fn`.][rust#85769] [rust-2021-edition-guide]: https://doc.rust-lang.org/nightly/edition-guide/rust-2021/index.html Compiler -------- - [Upgrade to LLVM 13.][rust#87570] - [Support memory, address, and thread sanitizers on aarch64-unknown-freebsd.] [rust#88023] - [Allow specifying a deployment target version for all iOS targets][rust#87699] - [Warnings can be forced on with `--force-warn`.][rust#87472] This feature is primarily intended for usage by `cargo fix`, rather than end users. - [Promote `aarch64-apple-ios-sim` to Tier 2\*.][rust#87760] - [Add `powerpc-unknown-freebsd` at Tier 3\*.][rust#87370] - [Add `riscv32imc-esp-espidf` at Tier 3\*.][rust#87666] \* Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Allow writing of incomplete UTF-8 sequences via stdout/stderr on Windows.] [rust#83342] The Windows console still requires valid Unicode, but this change allows splitting a UTF-8 character across multiple write calls. This allows, for instance, programs that just read and write data buffers (e.g. copying a file to stdout) without regard for Unicode or character boundaries. - [Prefer `AtomicU{64,128}` over Mutex for Instant backsliding protection.] [rust#83093] For this use case, atomics scale much better under contention. - [Implement `Extend<(A, B)>` for `(Extend<A>, Extend<B>)`][rust#85835] - [impl Default, Copy, Clone for std::io::Sink and std::io::Empty][rust#86744] - [`impl From<[(K, V); N]>` for all collections.][rust#84111] - [Remove `P: Unpin` bound on impl Future for Pin.][rust#81363] - [Treat invalid environment variable names as non-existent.][rust#86183] Previously, the environment functions would panic if given a variable name with an internal null character or equal sign (`=`). Now, these functions will just treat such names as non-existent variables, since the OS cannot represent the existence of a variable with such a name. Stabilised APIs --------------- - [`std::os::unix::fs::chroot`] - [`UnsafeCell::raw_get`] - [`BufWriter::into_parts`] - [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`] These APIs were previously stable in `std`, but are now also available in `core`. - [`Vec::shrink_to`] - [`String::shrink_to`] - [`OsString::shrink_to`] - [`PathBuf::shrink_to`] - [`BinaryHeap::shrink_to`] - [`VecDeque::shrink_to`] - [`HashMap::shrink_to`] - [`HashSet::shrink_to`] These APIs are now usable in const contexts: - [`std::mem::transmute`] - [`[T]::first`][`slice::first`] - [`[T]::split_first`][`slice::split_first`] - [`[T]::last`][`slice::last`] - [`[T]::split_last`][`slice::split_last`] Cargo ----- - [Cargo supports specifying a minimum supported Rust version in Cargo.toml.] [`rust-version`] This has no effect at present on dependency version selection. We encourage crates to specify their minimum supported Rust version, and we encourage CI systems that support Rust code to include a crate's specified minimum version in the text matrix for that crate by default. Compatibility notes ------------------- - [Update to new argument parsing rules on Windows.][rust#87580] This adjusts Rust's standard library to match the behavior of the standard libraries for C/C++. The rules have changed slightly over time, and this PR brings us to the latest set of rules (changed in 2008). - [Disallow the aapcs calling convention on aarch64][rust#88399] This was already not supported by LLVM; this change surfaces this lack of support with a better error message. - [Make `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` warn by default][rust#87385] - [Warn when an escaped newline skips multiple lines.][rust#87671] - [Calls to `libc::getpid` / `std::process::id` from `Command::pre_exec` may return different values on glibc <= 2.24.][rust#81825] Rust now invokes the `clone3` system call directly, when available, to use new functionality available via that system call. Older versions of glibc cache the result of `getpid`, and only update that cache when calling glibc's clone/fork functions, so a direct system call bypasses that cache update. glibc 2.25 and newer no longer cache `getpid` for exactly this reason. Internal changes ---------------- These changes provide no direct user facing benefits, but represent significant improvements to the internals and overall performance of rustc and related tools. - [LLVM is compiled with PGO in published x86_64-unknown-linux-gnu artifacts.] [rust#88069] This improves the performance of most Rust builds. - [Unify representation of macros in internal data structures.][rust#88019] This change fixes a host of bugs with the handling of macros by the compiler, as well as rustdoc. [`std::os::unix::fs::chroot`]: https://doc.rust-lang.org/stable/std/os/unix/fs/fn.chroot.html [`Iterator::intersperse`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse [`Iterator::intersperse_with`]: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.intersperse [`UnsafeCell::raw_get`]: https://doc.rust-lang.org/stable/std/cell/struct.UnsafeCell.html#method.raw_get [`BufWriter::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.into_parts [`core::panic::{UnwindSafe, RefUnwindSafe, AssertUnwindSafe}`]: rust-lang/rust#84662 [`Vec::shrink_to`]: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.shrink_to [`String::shrink_to`]: https://doc.rust-lang.org/stable/std/string/struct.String.html#method.shrink_to [`OsString::shrink_to`]: https://doc.rust-lang.org/stable/std/ffi/struct.OsString.html#method.shrink_to [`PathBuf::shrink_to`]: https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#method.shrink_to [`BinaryHeap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.BinaryHeap.html#method.shrink_to [`VecDeque::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.shrink_to [`HashMap::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_map/struct.HashMap.html#method.shrink_to [`HashSet::shrink_to`]: https://doc.rust-lang.org/stable/std/collections/hash_set/struct.HashSet.html#method.shrink_to [`std::mem::transmute`]: https://doc.rust-lang.org/stable/std/mem/fn.transmute.html [`slice::first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first [`slice::split_first`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first [`slice::last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last [`slice::split_last`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last [`rust-version`]: https://doc.rust-lang.org/nightly/cargo/reference/manifest.html#the-rust-version-field [rust#87671]: rust-lang/rust#87671 [rust#86183]: rust-lang/rust#86183 [rust#87385]: rust-lang/rust#87385 [rust#88100]: rust-lang/rust#88100 [rust#86860]: rust-lang/rust#86860 [rust#84039]: rust-lang/rust#84039 [rust#86492]: rust-lang/rust#86492 [rust#88363]: rust-lang/rust#88363 [rust#85305]: rust-lang/rust#85305 [rust#87832]: rust-lang/rust#87832 [rust#88069]: rust-lang/rust#88069 [rust#87472]: rust-lang/rust#87472 [rust#87699]: rust-lang/rust#87699 [rust#87570]: rust-lang/rust#87570 [rust#88023]: rust-lang/rust#88023 [rust#87760]: rust-lang/rust#87760 [rust#87370]: rust-lang/rust#87370 [rust#87580]: rust-lang/rust#87580 [rust#83342]: rust-lang/rust#83342 [rust#83093]: rust-lang/rust#83093 [rust#88177]: rust-lang/rust#88177 [rust#88548]: rust-lang/rust#88548 [rust#88551]: rust-lang/rust#88551 [rust#88299]: rust-lang/rust#88299 [rust#88220]: rust-lang/rust#88220 [rust#85835]: rust-lang/rust#85835 [rust#86879]: rust-lang/rust#86879 [rust#86744]: rust-lang/rust#86744 [rust#84662]: rust-lang/rust#84662 [rust#86593]: rust-lang/rust#86593 [rust#81050]: rust-lang/rust#81050 [rust#81363]: rust-lang/rust#81363 [rust#84111]: rust-lang/rust#84111 [rust#85769]: rust-lang/rust#85769 (comment) [rust#88490]: rust-lang/rust#88490 [rust#88269]: rust-lang/rust#88269 [rust#84176]: rust-lang/rust#84176 [rust#88399]: rust-lang/rust#88399 [rust#88227]: rust-lang/rust#88227 [rust#88200]: rust-lang/rust#88200 [rust#82776]: rust-lang/rust#82776 [rust#88077]: rust-lang/rust#88077 [rust#87728]: rust-lang/rust#87728 [rust#87050]: rust-lang/rust#87050 [rust#87619]: rust-lang/rust#87619 [rust#81825]: rust-lang/rust#81825 (comment) [rust#88019]: rust-lang/rust#88019 [rust#87666]: rust-lang/rust#87666 Version 1.55.0 (2021-09-09) ============================ Language -------- - [You can now write open "from" range patterns (`X..`), which will start at `X` and will end at the maximum value of the integer.][83918] - [You can now explicitly import the prelude of different editions through `std::prelude` (e.g. `use std::prelude::rust_2021::*;`).][86294] Compiler -------- - [Added tier 3\* support for `powerpc64le-unknown-freebsd`.][83572] \* Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Updated std's float parsing to use the Eisel-Lemire algorithm.][86761] These improvements should in general provide faster string parsing of floats, no longer reject certain valid floating point values, and reduce the produced code size for non-stripped artifacts. - [`string::Drain` now implements `AsRef<str>` and `AsRef<[u8]>`.][86858] Stabilised APIs --------------- - [`Bound::cloned`] - [`Drain::as_str`] - [`IntoInnerError::into_error`] - [`IntoInnerError::into_parts`] - [`MaybeUninit::assume_init_mut`] - [`MaybeUninit::assume_init_ref`] - [`MaybeUninit::write`] - [`array::map`] - [`ops::ControlFlow`] - [`x86::_bittest`] - [`x86::_bittestandcomplement`] - [`x86::_bittestandreset`] - [`x86::_bittestandset`] - [`x86_64::_bittest64`] - [`x86_64::_bittestandcomplement64`] - [`x86_64::_bittestandreset64`] - [`x86_64::_bittestandset64`] The following previously stable functions are now `const`. - [`str::from_utf8_unchecked`] Cargo ----- - [Cargo will now deduplicate compiler diagnostics to the terminal when invoking rustc in parallel such as when using `cargo test`.][cargo/9675] - [The package definition in `cargo metadata` now includes the `"default_run"` field from the manifest.][cargo/9550] - [Added `cargo d` as an alias for `cargo doc`.][cargo/9680] - [Added `{lib}` as formatting option for `cargo tree` to print the `"lib_name"` of packages.][cargo/9663] Rustdoc ------- - [Added "Go to item on exact match" search option.][85876] - [The "Implementors" section on traits no longer shows redundant method definitions.][85970] - [Trait implementations are toggled open by default.][86260] This should make the implementations more searchable by tools like `CTRL+F` in your browser. - [Intra-doc links should now correctly resolve associated items (e.g. methods) through type aliases.][86334] - [Traits which are marked with `#[doc(hidden)]` will no longer appear in the "Trait Implementations" section.][86513] Compatibility Notes ------------------- - [std functions that return an `io::Error` will no longer use the `ErrorKind::Other` variant.][85746] This is to better reflect that these kinds of errors could be categorised [into newer more specific `ErrorKind` variants][79965], and that they do not represent a user error. - [Using environment variable names with `process::Command` on Windows now behaves as expected.][85270] Previously using envionment variables with `Command` would cause them to be ASCII-uppercased. - [Rustdoc will now warn on using rustdoc lints that aren't prefixed with `rustdoc::`][86849] [86849]: rust-lang/rust#86849 [86513]: rust-lang/rust#86513 [86334]: rust-lang/rust#86334 [86260]: rust-lang/rust#86260 [85970]: rust-lang/rust#85970 [85876]: rust-lang/rust#85876 [83572]: rust-lang/rust#83572 [86294]: rust-lang/rust#86294 [86858]: rust-lang/rust#86858 [86761]: rust-lang/rust#86761 [85769]: rust-lang/rust#85769 [85746]: rust-lang/rust#85746 [85305]: rust-lang/rust#85305 [85270]: rust-lang/rust#85270 [84111]: rust-lang/rust#84111 [83918]: rust-lang/rust#83918 [79965]: rust-lang/rust#79965 [87370]: rust-lang/rust#87370 [87298]: rust-lang/rust#87298 [cargo/9663]: rust-lang/cargo#9663 [cargo/9675]: rust-lang/cargo#9675 [cargo/9550]: rust-lang/cargo#9550 [cargo/9680]: rust-lang/cargo#9680 [cargo/9663]: rust-lang/cargo#9663 [`array::map`]: https://doc.rust-lang.org/stable/std/primitive.array.html#method.map [`Bound::cloned`]: https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.cloned [`Drain::as_str`]: https://doc.rust-lang.org/stable/std/string/struct.Drain.html#method.as_str [`IntoInnerError::into_error`]: https://doc.rust-lang.org/stable/std/io/struct.IntoInnerError.html#method.into_error [`IntoInnerError::into_parts`]: https://doc.rust-lang.org/stable/std/io/struct.IntoInnerError.html#method.into_parts [`MaybeUninit::assume_init_mut`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init_mut [`MaybeUninit::assume_init_ref`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.assume_init_ref [`MaybeUninit::write`]: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#method.write [`Seek::rewind`]: https://doc.rust-lang.org/stable/std/io/trait.Seek.html#method.rewind [`ops::ControlFlow`]: https://doc.rust-lang.org/stable/std/ops/enum.ControlFlow.html [`str::from_utf8_unchecked`]: https://doc.rust-lang.org/stable/std/str/fn.from_utf8_unchecked.html [`x86::_bittest`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittest.html [`x86::_bittestandcomplement`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandcomplement.html [`x86::_bittestandreset`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandreset.html [`x86::_bittestandset`]: https://doc.rust-lang.org/stable/core/arch/x86/fn._bittestandset.html [`x86_64::_bittest64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittest64.html [`x86_64::_bittestandcomplement64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandcomplement64.html [`x86_64::_bittestandreset64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandreset64.html [`x86_64::_bittestandset64`]: https://doc.rust-lang.org/stable/core/arch/x86_64/fn._bittestandset64.html
Problem
Writes of just an incomplete UTF-8 byte sequence (e.g.
b"\xC3"
orb"\xF0\x9F"
) to stdout/stderr with a Windows console attached error withio::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"
even though further writes could complete the codepoint. This is currently a rare occurence since the linewritershim implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in #83258.The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (#78515) get merged.
Patch description
If there is at least one valid UTF-8 codepoint all valid UTF-8 is passed through to the extracted
write_valid_utf8_to_console()
fn. The new code only comes into play ifwrite()
is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in theIncompleteUtf8
struct associated withStdout
/Stderr
. The bytes are accepted one at a time. As soon as an error can be detectedio::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"
is returned. Once a complete UTF-8 codepoint is received it is passed to thewrite_valid_utf8_to_console()
and the buffer length is set to zero.Calling
flush()
will neither error nor write anything if an incomplete codepoint is present in the buffer.Tests
Currently there are no Windows-specific tests for console writing code at all. Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console and suddenly popping up another console window might be surprising to developers running the testsuite and it might not work at all in CI builds. To just test the new functionality in unit tests the code would need to be refactored. Some guidance on how to proceed would be appreciated.
Public API changes
std::str::verifications::utf8_char_width()
would be exposed asstd::str::utf8_char_width()
behind the "str_internals" feature gate.Related issues
Open questions