Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce formatting width and precision to 16 bits #136932

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Feb 12, 2025

This is part of #99012

This is reduces the width and precision fields in format strings to 16 bits. They are currently full usizes, but it's a bit nonsensical that we need to support the case where someone wants to pad their value to eighteen quintillion spaces and/or have eighteen quintillion digits of precision.

By reducing these fields to 16 bit, we can reduce FormattingOptions to 64 bits (see #136974) and improve the in memory representation of format_args!(). (See additional context below.)

This also fixes a bug where the width or precision is silently truncated when cross-compiling to a target with a smaller usize. By reducing the width and precision fields to the minimum guaranteed size of usize, 16 bits, this bug is eliminated.

This is a breaking change, but affects almost no existing code.


Details of this change:

There are three ways to set a width or precision today:

  1. Directly a formatting string, e.g. println!("{a:1234}")
  2. Indirectly in a formatting string, e.g. println!("{a:width$}", width=1234)
  3. Through the unstable FormattingOptions::width method.

This PR:

  • Adds a compiler error for 1. (println!("{a:9999999}") no longer compiles and gives a clear error.)
  • Adds a runtime check for 2. (println!("{a:width$}, width=9999999) will panic.)
  • Changes the signatures of the (unstable) FormattingOptions::[get_]width methods to use a u16 instead.

Additional context for improving FormattingOptions and fmt::Arguments:

All the formatting flags and options are currently:

  • The + flag (1 bit)
  • The - flag (1 bit)
  • The # flag (1 bit)
  • The 0 flag (1 bit)
  • The x? flag (1 bit)
  • The X? flag (1 bit)
  • The alignment (2 bits)
  • The fill character (21 bits)
  • Whether a width is specified (1 bit)
  • Whether a precision is specified (1 bit)
  • If used, the width (a full usize)
  • If used, the precision (a full usize)

Everything except the last two can simply fit in a u32 (those add up to 31 bits in total).

If we can accept a max width and precision of u16::MAX, we can make a FormattingOptions that is exactly 64 bits in size; the same size as a thin reference on most platforms.

If, additionally, we also limit the number of formatting arguments, we can also reduce the size of fmt::Arguments (that is, of a format_args!() expression).

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `core::fmt` S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Feb 12, 2025
@m-ou-se m-ou-se self-assigned this Feb 12, 2025
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 12, 2025
@m-ou-se m-ou-se removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 12, 2025
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 12, 2025

@bors try

@bors
Copy link
Contributor

bors commented Feb 12, 2025

⌛ Trying commit 7355738 with merge 7af7790...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 12, 2025
…try>

Reduce formatting `width` and `precision` to 16 bits

This is reduces the `width` and `precision` fields in format strings to 16 bits. They are currently full `usize`s, but it's a bit nonsensical that we need to support the case where someone wants to pad their value to eighteen quintillion spaces and/or have eighteen quintillion digits of precision.

This is a breaking change, but probably affects virtually no code. Let's do a crater run to find out. Marking this as experiment for now.

---

There are three ways to set a width or precision today:

1. Directly a formatting string, e.g. `println!("{a:1234}")`
2. Indirectly in a formatting string, e.g. `println!("{a:width$}", width=1234)`
3. Through the unstable `FormattingOptions::width` method.

This PR:

- Adds a compiler error for 1. (`println!("{a:9999999}")` no longer compiles and gives a clear error.)
- Adds a runtime check for 2. (`println!("{a:width$}, width=9999999)` will panic.)
- Changes the signature of `FormattingOptions::width` to take a `u16` instead.

---

Additional context:

All the formatting flags and options are currently:

- The `+` flag (1 bit)
- The `-` flag (1 bit)
- The `#` flag (1 bit)
- The `0` flag (1 bit)
- The `x?` flag (1 bit)
- The `X?` flag (1 bit)
- The alignment (2 bits)
- The fill character (21 bits)
- Whether a width is specified (1 bit)
- Whether a precision is specified (1 bit)
- If used, the width (a full usize)
- If used, the precision (a full usize)

Everything except the last two can simply fit in a `u32` (those add up to 31 bits in total).

If we can accept a max width and precision of u16::MAX, we can make a `FormattingOptions` that is exactly 64 bits in size; the same size as a thin reference on most platforms.

If, additionally, we also limit the number of formatting arguments to u16::MAX, we can also reduce the size of `fmt::Arguments` (that is, of a `format_args!()` expression).
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 12, 2025

☀️ Try build successful - checks-actions
Build commit: 7af7790 (7af779037716ae4125ceabb429791b4cf5dd0a43)

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 12, 2025

@craterbot run mode=build-and-test

@craterbot
Copy link
Collaborator

👌 Experiment pr-136932 created and queued.
🤖 Automatically detected try build 7af7790
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Feb 12, 2025
@craterbot
Copy link
Collaborator

🚧 Experiment pr-136932 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@m-ou-se m-ou-se force-pushed the fmt-width-precision-u16 branch from 7355738 to ccb9429 Compare February 12, 2025 20:05
@oxalica
Copy link
Contributor

oxalica commented Feb 13, 2025

This is a breaking change, but probably affects virtually no code. Let's do a crater run to find out. Marking this as experiment for now.

As a data point, I do use usize width for indentation generation of pretty-printer in some of my crates.
It's used as a shortcut for " ".repeat(width) without allocation (at least I think so?). I can accept to enforce u16 here because nobody actually use 65536 indentation levels, but usize is definitely a more natural choice for "length" semantic than u16.

write!(self.w, "\n{:1$}]", "", self.cur_indent)?;
write!(f, "{:len$}", "", len = (stride - 1) * 4)?;

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 13, 2025

The type will stay a usize; changing that would break a lot of things. This PR adds a runtime check (panic) that the usize you give it isn't above u16::MAX. (And then stores everything internally as a u16.)

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 13, 2025
Reduce FormattingOptions to 64 bits

This reduces FormattingOptions from 6-7 machine words (384 bits on 64-bit platforms, 224 bits on 32-bit platforms) to just 64 bits (a single register on 64-bit platforms).

This PR includes rust-lang#136932, which reduces the width and precision options to 16 bits, to make it all fit.

Before:

```rust
pub struct FormattingOptions {
    flags: u32, // only 6 bits used
    fill: char,
    align: Option<Alignment>,
    width: Option<usize>,
    precision: Option<usize>,
}
```

After:

```rust
pub struct FormattingOptions {
    /// Bits:
    ///  - 0: `+` flag [rt::Flag::SignPlus]
    ///  - 1: `-` flag [rt::Flag::SignMinus]
    ///  - 2: `#` flag [rt::Flag::Alternate]
    ///  - 3: `0` flag [rt::Flag::SignAwareZeroPad]
    ///  - 4: `x?` flag [rt::Flag::DebugLowerHex]
    ///  - 5: `X?` flag [rt::Flag::DebugUpperHex]
    ///  - 6-7: Alignment (0: Left, 1: Right, 2: Center, 3: Unknown)
    ///  - 8: Width flag (if set, the width field below is used)
    ///  - 9: Precision flag (if set, the precision field below is used)
    ///  - 10: unused
    ///  - 11-31: fill character (21 bits, a full `char`)
    flags: u32,
    /// Width if width flag above is set. Otherwise, always 0.
    width: u16,
    /// Precision if precision flag above is set. Otherwise, always 0.
    precision: u16,
}
```
@craterbot
Copy link
Collaborator

🎉 Experiment pr-136932 is completed!
📊 172 regressed and 113 fixed (582049 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the denylist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Feb 13, 2025
@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 14, 2025

Crater results: Only two regressions are caused by an error or panic from this change.

  1. gh/vasekp/stream-rust - Runtime panic - Using a value close to usize::MAX as precision - Update: fixed
  2. gh/uutils/coreutils - Compiler error in a unit test - Update: fixed

@m-ou-se m-ou-se added I-libs-nominated Nominated for discussion during a libs team meeting. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. labels Feb 14, 2025
@m-ou-se m-ou-se marked this pull request as ready for review February 14, 2025 08:45
@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Feb 15, 2025
@rfcbot
Copy link

rfcbot commented Feb 15, 2025

🔔 This is now entering its final comment period, as per the review above. 🔔

@Amanieu Amanieu removed the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Feb 18, 2025
vasekp added a commit to vasekp/stream-rust that referenced this pull request Feb 19, 2025
@m-ou-se m-ou-se force-pushed the fmt-width-precision-u16 branch from ccb9429 to b61458a Compare February 20, 2025 16:33
@matthieu-m
Copy link
Contributor

Has any thought been given to reducing the maximum width/precision to u16::MAX - 1 rather than u16::MAX?

In terms of breakage, I'd expect it'd be mostly similar: 65534 or 65535 is about one and the same.

On the other hand, from a "compaction" point of view, this allows using NonMaxU16 instead of u16 + presence bit, thereby shaving off two bits.

Even if not changing to NonMaxU16 internally, it may be worth just setting the maximum value to u16::MAX - 1, to reverse the possibility to switch the internal representation later on without further impacting the API.

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 20, 2025

Has any thought been given to reducing the maximum width/precision to u16::MAX - 1 rather than u16::MAX?

Yes, I've considered that.

I agree that it'd also work fine, but here are the reasons I didn't go for that option:

  • We have three bits left over if we pack all the flags/alignment/fill in 32 bits, so we're not really saving any space.
  • We don't actually have a NonMaxU16 type. So the FormattingOptions methods would still need to take/return a regular u16 and have a runtime check. If we accept all u16, then FormattingOptions::width/precision won't need any runtime checks.
  • With those two bits part of the 32-bit flags field, one can check if a FormattingOptions is fully set to defaults by only checking that one 32-bit field, ignoring the width and precision fields. I don't know if that's very useful, but it seems it might be a nice property.
  • It might be more surprising/unexpected if the limit is one less than u16::MAX. u16::MAX makes sense becaues it's the lowest guaranteed usize::MAX. But u16::MAX-1 feels more arbitrary, although I agree that it's unlikely to ever matter in real world code.
  • There are a few cases where an unset width and a width of zero result in the same behaviour. In those cases, it could be nice if you can just ignore the 'width set' flag and just use the 'width' field directly (which will store zero when unused). (Although that might only save one or two instructions. 🤷‍♀️)
  • Initializing the width and precision fields to zero rather than 0xFFFF might also save an instruciton or two. (The 32-bit flags field will be nonzero anyway, if it includes the ' ' fill character.) With the width and precision fields set to zero, an entire default FormattingOptions can be loaded into a 64-bit register in a single ARM64 instruction.

All of these arguments are quite weak. But unless there are clear benefits for u16::MAX - 1, I think it's simpler to stick to the full u16 range.

@tspiteri
Copy link
Contributor

tspiteri commented Feb 22, 2025

While printing primitives doesn't need large width and precision, they can sometimes be used when dealing with arbitrary precision. Would this code to get 100,000 digits of pi break?

    let pi = rug::Float::with_val(332200, rug::float::Constant::Pi);
    let pi_s = format!("{pi:.100000}");

Edit: for what it's worth, Rug does have an alternative method, so that code could be changed to let pi_s = pi.to_string_radix(10, Some(100_000));, but this change would still make the Rust formatting system less suitable for arbitrary precision numbers.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Feb 25, 2025
@rfcbot
Copy link

rfcbot commented Feb 25, 2025

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 25, 2025

While printing primitives doesn't need large width and precision, they can sometimes be used when dealing with arbitrary precision. Would this code to get 100,000 digits of pi break?

    let pi = rug::Float::with_val(332200, rug::float::Constant::Pi);
    let pi_s = format!("{pi:.100000}");

That code will give a compiler error:

error: invalid format string: integer `100000` does not fit into the type `u16` whose range is `0..=65535`
  |
  | let pi_s = format!("{pi:.100000}");
  |                          ^^^^^^ integer out of range for `u16` in format string

@m-ou-se
Copy link
Member Author

m-ou-se commented Feb 26, 2025

r? libs

@rustbot rustbot assigned cuviper and unassigned m-ou-se Feb 26, 2025
@m-ou-se m-ou-se requested a review from scottmcm February 27, 2025 10:28
Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code overall looks good to me. One small cleanup request for expr_u*, then r=me.

span: sp,
node: ast::LitKind::Int(
u128::from(value).into(),
ast::LitIntType::Unsigned(ast::UintTy::U16),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like expr_usize, expr_u32, and expr_u16 are all nearly identical.

Now that there's three copies, can you dedup their implementations to share code? Maybe expr_unsigned that takes a ast::UintTy or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that they take a u16 or u32 as argument. I think that's a useful distinction.

Copy link
Member

@scottmcm scottmcm Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean to delete the type-specific methods, just to have them share an implementation.

So it'd be something like

    pub(super) fn expr_usize(&mut self, sp: Span, value: u64) -> hir::Expr<'hir> {
        self.expr_uint(sp, value, ast::UintTy::Usize)
    }
    pub(super) fn expr_u32(&mut self, sp: Span, value: u32) -> hir::Expr<'hir> {
        self.expr_uint(sp, value, ast::UintTy::U32)
    }
    pub(super) fn expr_u16(&mut self, sp: Span, value: u16) -> hir::Expr<'hir> {
        self.expr_uint(sp, value, ast::UintTy::U16)
    }

etc, rather than repeating all the arena alloc and LitKind and such in all three of them

@@ -294,8 +294,8 @@ pub struct FormattingOptions {
flags: u32,
fill: char,
align: Option<Alignment>,
width: Option<usize>,
precision: Option<usize>,
width: Option<u16>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pondering: is width: Some(0) ever meaningful? Could this reasonably be Option<NonZeroU16>, or I guess just width: u16 treating width: 0 as the default?

(I guess that's not really this PR's problem, though.)

Copy link
Member Author

@m-ou-se m-ou-se Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A width of 0 can be different than a no width. (It behaves the same for Display for str and friends, but can be different for other impls.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, in the next PR (#136974), it becomes a regular u16 field width the 'presence bit' moved to the flags field.

@klensy
Copy link
Contributor

klensy commented Mar 2, 2025

Should be neutral, but maybe perf run?

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 4, 2025

@klensy #136974 has a perf run, which is the follow up to this PR.

pub const fn from_usize(x: &usize) -> Argument<'_> {
Argument { ty: ArgumentType::Count(*x) }
if *x > u16::MAX as usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should special-case usize::MAX and treat it as "unlimited". This is what was done in stream-rust, apparently. ISTM that that wasn't wrong - indeed, it's convenient. We shouldn't break it.

@bors
Copy link
Contributor

bors commented Mar 5, 2025

☔ The latest upstream changes (presumably #138058) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-fmt Area: `core::fmt` disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. to-announce Announce this issue on triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.