Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce formatting width and precision to 16 bits #136932

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Feb 12, 2025

This is part of #99012

This is reduces the width and precision fields in format strings to 16 bits. They are currently full usizes, but it's a bit nonsensical that we need to support the case where someone wants to pad their value to eighteen quintillion spaces and/or have eighteen quintillion digits of precision.

By reducing these fields to 16 bit, we can reduce FormattingOptions to 64 bits (see #136974) and improve the in memory representation of format_args!(). (See additional context below.)

This also fixes a bug where the width or precision is silently truncated when cross-compiling to a target with a smaller usize. By reducing the width and precision fields to the minimum guaranteed size of usize, 16 bits, this bug is eliminated.

This is a breaking change, but affects almost no existing code.


Details of this change:

There are three ways to set a width or precision today:

  1. Directly a formatting string, e.g. println!("{a:1234}")
  2. Indirectly in a formatting string, e.g. println!("{a:width$}", width=1234)
  3. Through the unstable FormattingOptions::width method.

This PR:

  • Adds a compiler error for 1. (println!("{a:9999999}") no longer compiles and gives a clear error.)
  • Adds a runtime check for 2. (println!("{a:width$}, width=9999999) will panic.)
  • Changes the signatures of the (unstable) FormattingOptions::[get_]width methods to use a u16 instead.

Additional context for improving FormattingOptions and fmt::Arguments:

All the formatting flags and options are currently:

  • The + flag (1 bit)
  • The - flag (1 bit)
  • The # flag (1 bit)
  • The 0 flag (1 bit)
  • The x? flag (1 bit)
  • The X? flag (1 bit)
  • The alignment (2 bits)
  • The fill character (21 bits)
  • Whether a width is specified (1 bit)
  • Whether a precision is specified (1 bit)
  • If used, the width (a full usize)
  • If used, the precision (a full usize)

Everything except the last two can simply fit in a u32 (those add up to 31 bits in total).

If we can accept a max width and precision of u16::MAX, we can make a FormattingOptions that is exactly 64 bits in size; the same size as a thin reference on most platforms.

If, additionally, we also limit the number of formatting arguments, we can also reduce the size of fmt::Arguments (that is, of a format_args!() expression).

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `core::fmt` S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Feb 12, 2025
@m-ou-se m-ou-se self-assigned this Feb 12, 2025
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 12, 2025
@m-ou-se m-ou-se removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 12, 2025
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 12, 2025

@bors try

@bors
Copy link
Contributor

bors commented Feb 12, 2025

⌛ Trying commit 7355738 with merge 7af7790...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 12, 2025
…try>

Reduce formatting `width` and `precision` to 16 bits

This is reduces the `width` and `precision` fields in format strings to 16 bits. They are currently full `usize`s, but it's a bit nonsensical that we need to support the case where someone wants to pad their value to eighteen quintillion spaces and/or have eighteen quintillion digits of precision.

This is a breaking change, but probably affects virtually no code. Let's do a crater run to find out. Marking this as experiment for now.

---

There are three ways to set a width or precision today:

1. Directly a formatting string, e.g. `println!("{a:1234}")`
2. Indirectly in a formatting string, e.g. `println!("{a:width$}", width=1234)`
3. Through the unstable `FormattingOptions::width` method.

This PR:

- Adds a compiler error for 1. (`println!("{a:9999999}")` no longer compiles and gives a clear error.)
- Adds a runtime check for 2. (`println!("{a:width$}, width=9999999)` will panic.)
- Changes the signature of `FormattingOptions::width` to take a `u16` instead.

---

Additional context:

All the formatting flags and options are currently:

- The `+` flag (1 bit)
- The `-` flag (1 bit)
- The `#` flag (1 bit)
- The `0` flag (1 bit)
- The `x?` flag (1 bit)
- The `X?` flag (1 bit)
- The alignment (2 bits)
- The fill character (21 bits)
- Whether a width is specified (1 bit)
- Whether a precision is specified (1 bit)
- If used, the width (a full usize)
- If used, the precision (a full usize)

Everything except the last two can simply fit in a `u32` (those add up to 31 bits in total).

If we can accept a max width and precision of u16::MAX, we can make a `FormattingOptions` that is exactly 64 bits in size; the same size as a thin reference on most platforms.

If, additionally, we also limit the number of formatting arguments to u16::MAX, we can also reduce the size of `fmt::Arguments` (that is, of a `format_args!()` expression).
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 12, 2025

☀️ Try build successful - checks-actions
Build commit: 7af7790 (7af779037716ae4125ceabb429791b4cf5dd0a43)

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 12, 2025

@craterbot run mode=build-and-test

@craterbot
Copy link
Collaborator

👌 Experiment pr-136932 created and queued.
🤖 Automatically detected try build 7af7790
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Feb 12, 2025
@craterbot
Copy link
Collaborator

🚧 Experiment pr-136932 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@m-ou-se m-ou-se force-pushed the fmt-width-precision-u16 branch from 7355738 to ccb9429 Compare February 12, 2025 20:05
@oxalica
Copy link
Contributor

oxalica commented Feb 13, 2025

This is a breaking change, but probably affects virtually no code. Let's do a crater run to find out. Marking this as experiment for now.

As a data point, I do use usize width for indentation generation of pretty-printer in some of my crates.
It's used as a shortcut for " ".repeat(width) without allocation (at least I think so?). I can accept to enforce u16 here because nobody actually use 65536 indentation levels, but usize is definitely a more natural choice for "length" semantic than u16.

write!(self.w, "\n{:1$}]", "", self.cur_indent)?;
write!(f, "{:len$}", "", len = (stride - 1) * 4)?;

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 13, 2025

The type will stay a usize; changing that would break a lot of things. This PR adds a runtime check (panic) that the usize you give it isn't above u16::MAX. (And then stores everything internally as a u16.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 13, 2025
Reduce FormattingOptions to 64 bits

This reduces FormattingOptions from 6-7 machine words (384 bits on 64-bit platforms, 224 bits on 32-bit platforms) to just 64 bits (a single register on 64-bit platforms).

This PR includes rust-lang#136932, which reduces the width and precision options to 16 bits, to make it all fit.

Before:

```rust
pub struct FormattingOptions {
    flags: u32, // only 6 bits used
    fill: char,
    align: Option<Alignment>,
    width: Option<usize>,
    precision: Option<usize>,
}
```

After:

```rust
pub struct FormattingOptions {
    /// Bits:
    ///  - 0: `+` flag [rt::Flag::SignPlus]
    ///  - 1: `-` flag [rt::Flag::SignMinus]
    ///  - 2: `#` flag [rt::Flag::Alternate]
    ///  - 3: `0` flag [rt::Flag::SignAwareZeroPad]
    ///  - 4: `x?` flag [rt::Flag::DebugLowerHex]
    ///  - 5: `X?` flag [rt::Flag::DebugUpperHex]
    ///  - 6-7: Alignment (0: Left, 1: Right, 2: Center, 3: Unknown)
    ///  - 8: Width flag (if set, the width field below is used)
    ///  - 9: Precision flag (if set, the precision field below is used)
    ///  - 10: unused
    ///  - 11-31: fill character (21 bits, a full `char`)
    flags: u32,
    /// Width if width flag above is set. Otherwise, always 0.
    width: u16,
    /// Precision if precision flag above is set. Otherwise, always 0.
    precision: u16,
}
```
@craterbot
Copy link
Collaborator

🎉 Experiment pr-136932 is completed!
📊 172 regressed and 113 fixed (582049 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the denylist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Feb 13, 2025
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 14, 2025

Crater results: Only two regressions are caused by an error or panic from this change.

  1. gh/vasekp/stream-rust - Runtime panic - Using a value close to usize::MAX as precision - Update: fixed
  2. gh/uutils/coreutils - Compiler error in a unit test - Update: fixed

@m-ou-se m-ou-se added I-libs-nominated Nominated for discussion during a libs team meeting. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. labels Feb 14, 2025
@m-ou-se m-ou-se marked this pull request as ready for review February 14, 2025 08:45
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 14, 2025

This reduces the width and precision flags from a usize to a u16, with the goal to reduce FormattingOptions to 64 bits.

This is technically a breaking change, in two ways:

  1. format!("{a:9999999}") will become a compiler error.
  2. format!("{a:width$}") with let width=999999; will become a runtime panic.

This is expected to have virtually no impact on real world code. See crater run results above: #136932 (comment)

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Feb 14, 2025

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Feb 14, 2025
@m-ou-se m-ou-se added the relnotes Marks issues that should be documented in the release notes of the next release. label Feb 14, 2025
@m-ou-se m-ou-se removed I-libs-nominated Nominated for discussion during a libs team meeting. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. labels Feb 14, 2025
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 14, 2025

Fun fact: This also fixes a previously unknown bug that can occur when you run rustc on a 64-bit platform but target a 32-bit platform.

$ cat src/main.rs 
fn main() {
    println!("[{:4294967306}]", "hello");
}
$ cargo run --target=i686-unknown-linux-gnu 
   Compiling scratchpad v0.1.0
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.26s
     Running `target/i686-unknown-linux-gnu/debug/scratchpad`
[hello     ]

A width of 4294967306 was accepted because it fits in the compiler's usize, but it is then silently truncated to 10 when casted to the target usize later, resulting in much less padding than requested. :)

@joshtriplett
Copy link
Member

@m-ou-se I updated the release notes based on that bug, which I think makes this change even more well-justified.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Feb 15, 2025
@rfcbot
Copy link

rfcbot commented Feb 15, 2025

🔔 This is now entering its final comment period, as per the review above. 🔔

@Amanieu Amanieu removed the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Feb 18, 2025
vasekp added a commit to vasekp/stream-rust that referenced this pull request Feb 19, 2025
@m-ou-se m-ou-se force-pushed the fmt-width-precision-u16 branch from ccb9429 to b61458a Compare February 20, 2025 16:33
@matthieu-m
Copy link
Contributor

Has any thought been given to reducing the maximum width/precision to u16::MAX - 1 rather than u16::MAX?

In terms of breakage, I'd expect it'd be mostly similar: 65534 or 65535 is about one and the same.

On the other hand, from a "compaction" point of view, this allows using NonMaxU16 instead of u16 + presence bit, thereby shaving off two bits.

Even if not changing to NonMaxU16 internally, it may be worth just setting the maximum value to u16::MAX - 1, to reverse the possibility to switch the internal representation later on without further impacting the API.

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 20, 2025

Has any thought been given to reducing the maximum width/precision to u16::MAX - 1 rather than u16::MAX?

Yes, I've considered that.

I agree that it'd also work fine, but here are the reasons I didn't go for that option:

  • We have three bits left over if we pack all the flags/alignment/fill in 32 bits, so we're not really saving any space.
  • We don't actually have a NonMaxU16 type. So the FormattingOptions methods would still need to take/return a regular u16 and have a runtime check. If we accept all u16, then FormattingOptions::width/precision won't need any runtime checks.
  • With those two bits part of the 32-bit flags field, one can check if a FormattingOptions is fully set to defaults by only checking that one 32-bit field, ignoring the width and precision fields. I don't know if that's very useful, but it seems it might be a nice property.
  • It might be more surprising/unexpected if the limit is one less than u16::MAX. u16::MAX makes sense becaues it's the lowest guaranteed usize::MAX. But u16::MAX-1 feels more arbitrary, although I agree that it's unlikely to ever matter in real world code.
  • There are a few cases where an unset width and a width of zero result in the same behaviour. In those cases, it could be nice if you can just ignore the 'width set' flag and just use the 'width' field directly (which will store zero when unused). (Although that might only save one or two instructions. 🤷‍♀️)
  • Initializing the width and precision fields to zero rather than 0xFFFF might also save an instruciton or two. (The 32-bit flags field will be nonzero anyway, if it includes the ' ' fill character.) With the width and precision fields set to zero, an entire default FormattingOptions can be loaded into a 64-bit register in a single ARM64 instruction.

All of these arguments are quite weak. But unless there are clear benefits for u16::MAX - 1, I think it's simpler to stick to the full u16 range.

@tspiteri
Copy link
Contributor

tspiteri commented Feb 22, 2025

While printing primitives doesn't need large width and precision, they can sometimes be used when dealing with arbitrary precision. Would this code to get 100,000 digits of pi break?

    let pi = rug::Float::with_val(332200, rug::float::Constant::Pi);
    let pi_s = format!("{pi:.100000}");

Edit: for what it's worth, Rug does have an alternative method, so that code could be changed to let pi_s = pi.to_string_radix(10, Some(100_000));, but this change would still make the Rust formatting system less suitable for arbitrary precision numbers.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Feb 25, 2025
@rfcbot
Copy link

rfcbot commented Feb 25, 2025

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 25, 2025

While printing primitives doesn't need large width and precision, they can sometimes be used when dealing with arbitrary precision. Would this code to get 100,000 digits of pi break?

    let pi = rug::Float::with_val(332200, rug::float::Constant::Pi);
    let pi_s = format!("{pi:.100000}");

That code will give a compiler error:

error: invalid format string: integer `100000` does not fit into the type `u16` whose range is `0..=65535`
  |
  | let pi_s = format!("{pi:.100000}");
  |                          ^^^^^^ integer out of range for `u16` in format string

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Area: `core::fmt` disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. to-announce Announce this issue on triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.