Fix parenthesization of subexprs containing statement boundary #119105

dtolnay · 2023-12-19T07:18:51Z

This PR fixes a multitude of false negatives and false positives in the AST pretty printer's parenthesis insertion related to statement boundaries — statements which terminate unexpectedly early if there aren't parentheses.

Without this fix, the AST pretty printer (including both stringify! and rustc -Zunpretty=expanded) is prone to producing output which is not syntactically valid Rust. Invalid output is problematic because it means Rustfmt is unable to parse the output of cargo expand, for example, causing friction by forcing someone trying to debug a macro into reading poorly formatted code.

I believe the set of bugs fixed in this PR accounts for the most prevalent reason that cargo expand produces invalid output in real-world usage.

Fixes #98790.

False negatives

The following is a correct program — cargo check succeeds.

macro_rules! m {
    ($e:expr) => {
        match () { _ => $e }
    };
}

fn main() {
    m!({ 1 } - 1);
}

But rustc -Zunpretty=expanded main.rs produces output that is invalid Rust syntax, because parenthesization is needed and not being done by the pretty printer.

fn main() { match () { _ => { 1 } - 1, }; }

Piping this expanded code to rustfmt fails with a parse error.

error: unexpected `,` in pattern
 --> <stdin>:1:38
  |
1 | fn main() { match () { _ => { 1 } - 1, }; }
  |                                      ^
  |
help: try adding parentheses to match on a tuple...
  |
1 | fn main() { match () { _ => { 1 } (- 1,) }; }
  |                                   +    +
help: ...or a vertical bar to match on multiple alternatives
  |
1 | fn main() { match () { _ => { 1 } - 1 | }; }
  |                                   ~~~~~

Fixed output after this PR:

fn main() { match () { _ => ({ 1 }) - 1, }; }
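The same statement boundary can be observed in handwritten code. The sketch below is illustrative only: the function names are made up, and the unparenthesized variant uses a unit-typed block so that it type-checks. It shows that a block at the start of a statement ends the statement, so a following `- 1` is read as a new expression (unary negation) rather than as a subtraction.

```rust
// Hypothetical helper names, for illustration only.

// Parentheses keep the block as the left operand of the binary `-`.
#[allow(unused_parens, unused_braces)]
fn parenthesized() -> i32 {
    ({ 1 }) - 1
}

// Without parentheses the block is a complete statement on its own,
// and `- 1` starts a new expression: the unary negation of 1.
fn statement_boundary() -> i32 {
    { let _ = 1; } - 1
}
```

Calling `parenthesized()` evaluates a subtraction, while `statement_boundary()` evaluates only the trailing `- 1`, which is why the printer has to emit the parentheses to preserve the meaning of the original syntax tree.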

False positives

Less problematic, but worth fixing (just like #118726).

fn main() {
    let _ = match () { _ => 1 } - 1;
}

Output of rustc -Zunpretty=expanded lib.rs before this PR. There is no reason parentheses would need to be inserted there.

fn main() { let _ = (match () { _ => 1, }) - 1; }

After this PR:

fn main() { let _ = match () { _ => 1, } - 1; }
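As a small check of why the parentheses are unnecessary here, the following sketch (with a made-up function name) puts the same `match` in a `let` initializer. Because the `match` is not at the start of a statement, the parser keeps going and treats `- 1` as a binary operator.

```rust
// Hypothetical helper name, for illustration only.
fn let_initializer() -> i32 {
    // The `match` is the left operand of `-`; no statement boundary
    // can occur in the middle of a `let` initializer.
    let x = match () { _ => 1 } - 1;
    x
}
```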

Alternatives considered

In this PR I opted to parenthesize only the leading subexpression causing the statement boundary, rather than the entire statement. Example:

macro_rules! m {
    ($e:expr) => {
        $e
    };
}

fn main() {
    m!(loop { break [1]; }[0] - 1);
}

This PR produces the following pretty-printed contents for fn main:

(loop { break [1]; })[0] - 1;

A different equally correct output would be:

(loop { break [1]; }[0] - 1);
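Both forms can be written by hand and compared. This is a minimal sketch with made-up function names; each function uses one of the two parenthesizations as its tail expression.

```rust
// Hypothetical helper names, for illustration only.

// Parenthesize only the leading subexpression (the form this PR emits).
#[allow(unused_parens)]
fn smallest_subexpr() -> i32 {
    (loop { break [1]; })[0] - 1
}

// Parenthesize the whole expression instead.
#[allow(unused_parens)]
fn whole_statement() -> i32 {
    (loop { break [1]; }[0] - 1)
}
```

Both functions index the array produced by the `loop` and subtract 1, so they compute the same value; the choice between them is purely one of style.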

I chose the one I did because it is the only approach used by handwritten code in the standard library and compiler. There are 4 places where parenthesization is being used to prevent a statement boundary, and in all 4, the developer has chosen to parenthesize the smallest subexpression rather than the whole statement:

rust/compiler/rustc_codegen_cranelift/example/alloc_system.rs

Line 102 in b37d43e

(if ptr.is_null() { ptr } else { align_ptr(ptr, layout.align()) }) as *mut u8

rust/compiler/rustc_parse/src/errors.rs

Lines 1021 to 1029 in b37d43e

    
(match token_descr {
    Some(TokenDescription::ReservedIdentifier) => {
        ExpectedIdentifierFound::ReservedIdentifier
    }
    Some(TokenDescription::Keyword) => ExpectedIdentifierFound::Keyword,
    Some(TokenDescription::ReservedKeyword) => ExpectedIdentifierFound::ReservedKeyword,
    Some(TokenDescription::DocComment) => ExpectedIdentifierFound::DocComment,
    None => ExpectedIdentifierFound::Other,
})(span)

rust/library/core/src/future/poll_fn.rs

Line 151 in b37d43e

(unsafe { &mut self.get_unchecked_mut().f })(cx)

rust/library/core/src/ops/range.rs

Lines 824 to 828 in b37d43e

    
(match self.start_bound() {
    Included(start) => start <= item,
    Excluded(start) => start < item,
    Unbounded => true,
}) && (match self.end_bound() {

rustbot · 2023-12-19T07:19:00Z

r? @WaffleLapkin

(rustbot has picked a reviewer for you, use r? to override)

WaffleLapkin · 2023-12-27T06:50:54Z

I'm surprised that stringify is expected to produce valid Rust, as opposed to "what's literally written in the file", but given that it can observe macro expansion and there is precedent for this, I'm inclined to say that it's fine.

Will review the code a bit later.

dtolnay · 2023-12-27T09:05:39Z

For stringify!, I agree it is not obvious that we need it to produce valid Rust syntax. I do not know of any use cases that involve parsing the output of stringify! other than visually by a human (for which valid syntax is nice but not a hard requirement).

But there is only one AST pretty-printer, and stringify! and rustc -Zunpretty=expanded use the same one. I care 100× more about the latter producing valid syntax, because cargo expand is based on it and relies on parsing the output not just for piping to rustfmt / prettyplease, but also for cargo expand path::to::module where it filters the output down to just a single item by path, which I use commonly.
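One way to look at the shared printer from the `stringify!` side is sketched below. It reuses the macro from the PR description but wraps the arm in `stringify!`; the function name `rendered` is made up. The exact text produced depends on the toolchain version, so no particular output is claimed here.

```rust
macro_rules! m {
    ($e:expr) => {
        // Render the macro-generated `match` as a string instead of
        // evaluating it.
        stringify!(match () { _ => $e })
    };
}

// Hypothetical helper name, for illustration only.
fn rendered() -> &'static str {
    m!({ 1 } - 1)
}
```

Printing the result with `println!("{}", rendered());` on toolchains before and after this change is a quick way to compare how the captured expression is parenthesized.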

WaffleLapkin · 2023-12-27T16:28:25Z

@bors r+

bors · 2023-12-27T16:28:28Z

📌 Commit 17239d9 has been approved by WaffleLapkin

It is now in the queue for this repository.

bors · 2023-12-27T21:27:29Z

⌛ Testing commit 17239d9 with merge 89e2160...

bors · 2023-12-27T23:25:44Z

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 89e2160 to master...

rust-timer · 2023-12-28T00:46:29Z

Finished benchmarking commit (89e2160): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[1.0%, 1.0%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[-0.1%, 1.0%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.6%, 0.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[0.6%, 0.6%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 672.366s -> 670.933s (-0.21%)
Artifact size: 312.33 MiB -> 312.35 MiB (0.00%)

Fix, document, and test parser and pretty-printer edge cases related to braced macro calls

_Review note: this is a deceptively small PR because it comes with 145 lines of docs and 196 lines of tests, and only 25 lines of compiler code changed. However, I recommend reviewing it 1 commit at a time because much of the effect of the code changes is non-local i.e. affecting code that is not visible in the final state of the PR. I have paid attention that reviewing the PR one commit at a time is as easy as I can make it. All of the code you need to know about is touched in those commits, even if some of those changes disappear by the end of the stack._

This is a follow-up to rust-lang#119105. One case that is not relevant to `-Zunpretty=expanded`, but which came up as I'm porting rust-lang#119105 and rust-lang#118726 into `syn`'s printer and `prettyplease`'s printer where it **is** relevant, and is also relevant to rustc's `stringify!`, is statement boundaries in the vicinity of braced macro calls.

Rustc's AST pretty-printer produces invalid syntax for statements that begin with a braced macro call:

```rust
macro_rules! stringify_item {
    ($i:item) => {
        stringify!($i)
    };
}

macro_rules! repro {
    ($e:expr) => {
        stringify_item!(fn main() { $e + 1; })
    };
}

fn main() {
    println!("{}", repro!(m! {}));
}
```

**Before this PR:** output is not valid Rust syntax.

```console
fn main() { m! {} + 1; }
```

```console
error: leading `+` is not supported
 --> <anon>:1:19
  |
1 | fn main() { m! {} + 1; }
  |                   ^ unexpected `+`
  |
help: try removing the `+`
  |
1 - fn main() { m! {} + 1; }
1 + fn main() { m! {} 1; }
  |
```

**After this PR:** valid syntax.

```console
fn main() { (m! {}) + 1; }
```
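The braced-macro case can also be reproduced in handwritten code. The sketch below uses a made-up macro `one!` and made-up function names; it shows the parenthesized form that the printer emits at the start of a statement, next to a `let` initializer where the same macro call needs no parentheses because it is not in statement position.

```rust
// Hypothetical macro and helper names, for illustration only.
macro_rules! one {
    () => {
        1
    };
}

// At the start of a statement, parentheses keep the braced macro call
// as the left operand of `+`.
#[allow(unused_parens)]
fn parenthesized() -> i32 {
    (one! {}) + 1
}

// Inside a `let` initializer there is no statement boundary after the
// braced macro call, so `+ 1` continues the same expression.
fn let_initializer() -> i32 {
    let x = one! {} + 1;
    x
}
```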

Inline ExprPrecedence::order into Expr::precedence

The representation of expression precedence in rustc_ast has been an obstacle to further improvements in the pretty-printer (continuing from rust-lang#119105 and rust-lang#119427).

Previously the operation of *"does this expression have lower precedence than that one"* (relevant for parenthesis insertion in macro-generated syntax trees) consisted of 3 steps:

1. Convert `Expr` to `ExprPrecedence` using `.precedence()`
2. Convert `ExprPrecedence` to `i8` using `.order()`
3. Compare using `<`

As far as I can guess, the reason for the separation between `precedence()` and `order()` was so that both `rustc_ast::Expr` and `rustc_hir::Expr` could convert as straightforwardly as possible to the same `ExprPrecedence` enum, and then the more finicky logic performed by `order` could be present just once.

The mapping between `Expr` and `ExprPrecedence` was intended to be as straightforward as possible:

```rust
match self.kind {
    ExprKind::Closure(..) => ExprPrecedence::Closure,
    ...
}
```

although there were exceptions of both many-to-one, and one-to-many:

```rust
ExprKind::Underscore => ExprPrecedence::Path,
ExprKind::Path(..) => ExprPrecedence::Path,
...
ExprKind::Match(_, _, MatchKind::Prefix) => ExprPrecedence::Match,
ExprKind::Match(_, _, MatchKind::Postfix) => ExprPrecedence::PostfixMatch,
```

Where the nature of `ExprPrecedence` becomes problematic is when a single expression kind might be associated with multiple different precedence levels depending on context (outside the expression) and contents (inside the expression).

For example consider what is the precedence of an ExprKind::Closure `$closure`. Well, on the left-hand side of a binary operator it would need parentheses in order to avoid the trailing binary operator being absorbed into the closure body: `($closure) + Rhs`, so the precedence is something lower than that of `+`. But on the right-hand side of a binary operator, a closure is just a straightforward prefix expression like a unary op, which is a relatively high precedence level, higher than binops but lower than method calls: `Lhs + $closure` is fine without parens but `($closure).method()` needs them. But as a third case, if the closure contains an explicit return type, then the precedence is an even higher level than that, never needing parenthesization even in a binop left-hand side or method call: `|| -> bool { false } + Rhs` or `|| -> bool { false }.method()`.

You can see that trying to capture all of this resolution about expressions into `ExprPrecedence` violates the intention of `ExprPrecedence` being a straightforward one-to-one correspondence from each AST and HIR `ExprKind` variant. It would be possible to attempt that by doing stuff like `ExprPrecedence::Closure(Side::Leading, ReturnType::No)`, but I don't foresee the original envisioned benefit of the `precedence()`/`order()` distinction being retained in this approach.

Instead I want to move toward a model that Syn has been using successfully. In Syn, there is a Precedence enum but it differs from rustc in the following ways:

- There are [relatively few variants](https://github.com/dtolnay/syn/blob/2.0.87/src/precedence.rs#L11-L47) compared to rustc's `ExprPrecedence`. For example there is no distinction at the precedence level between returns and closures, or between loops and method calls.
- We distinguish between [leading](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L293) and [trailing](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L309) precedence, taking into account an expression's context such as what token follows it (for various syntactic bail-outs in Rust's grammar, like ambiguities around break-with-value) and how it relates to operators from the surrounding syntax tree.
- There are no hardcoded mysterious integer quantities like rustc's `PREC_CLOSURE = -40`. All precedence comparisons are performed via PartialOrd on a C-like enum.

This PR is just a first step in these changes. As you can tell from Syn, I definitely think there is value in having a dedicated type to represent precedence, instead of what `order()` is doing with `i8`. But that is a whole separate adventure because rustc_ast doesn't even agree consistently on `i8` being the type for precedence order; `AssocOp::precedence` instead uses `usize` and there are casts in both directions. It is likely that a type called `ExprPrecedence` will re-appear, but it will look substantially different from the one that existed before this PR.
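The idea of comparing precedence through `PartialOrd` on a C-like enum can be sketched as follows. This is not rustc's or Syn's actual definition: the enum, its variants, and the `needs_parens` helper are all hypothetical and heavily simplified, and the sketch ignores associativity as well as the leading/trailing distinction described above.

```rust
// A hypothetical, simplified precedence enum. Variants are declared from
// loosest to tightest binding, so the derived ordering follows the
// declaration order and no integer constants are needed.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Precedence {
    Jump,        // return, break, closures
    Assign,      // =, +=, ...
    Sum,         // +, -
    Product,     // *, /, %
    Prefix,      // unary -, !, &
    Unambiguous, // literals, paths, method calls, parenthesized expressions
}

// Hypothetical helper: an operand needs parentheses when it binds more
// loosely than the operator position it is printed in.
fn needs_parens(operand: Precedence, position: Precedence) -> bool {
    operand < position
}
```

For instance, `needs_parens(Precedence::Sum, Precedence::Product)` is true, matching `(a + b) * c`, while `needs_parens(Precedence::Unambiguous, Precedence::Sum)` is false. A context-sensitive design would compute the operand's precedence differently for leading and trailing positions before making this comparison.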

Rollup merge of rust-lang#133140 - dtolnay:precedence, r=fmease


dtolnay added 2 commits December 18, 2023 22:40

Test parenthesization of leftmost subexprs containing stmt boundaries

527e2ea

Fix parenthesization of subexprs containing statement boundary

17239d9

rustbot assigned WaffleLapkin Dec 19, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 19, 2023

WaffleLapkin approved these changes Dec 27, 2023


bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 27, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 27, 2023

bors merged commit 89e2160 into rust-lang:master Dec 27, 2023
12 checks passed

rustbot added this to the 1.77.0 milestone Dec 27, 2023

dtolnay deleted the paren branch December 28, 2023 00:16

dtolnay added the A-pretty Area: Pretty printing (including `-Z unpretty`) label Dec 28, 2023

This was referenced Dec 28, 2023

pretty printer fails to include necessary parens around some block expressions #22450

Closed

Fix, document, and test parser and pretty-printer edge cases related to braced macro calls #119427

Merged

dtolnay mentioned this pull request May 16, 2024

Automatically insert necessary parentheses in ToTokens for Expr dtolnay/syn#1659

Merged

dtolnay mentioned this pull request Nov 17, 2024

Inline ExprPrecedence::order into Expr::precedence #133140

Merged

dtolnay mentioned this pull request Dec 20, 2024

Precedence improvements: closures and jumps #133782

Merged
