-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parenthesization of subexprs containing statement boundary #119105
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
I'm surprised that Will review the code a bit later. |
For But there is only one AST pretty-printer, and |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (89e2160): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 672.366s -> 670.933s (-0.21%) |
Fix, document, and test parser and pretty-printer edge cases related to braced macro calls _Review note: this is a deceptively small PR because it comes with 145 lines of docs and 196 lines of tests, and only 25 lines of compiler code changed. However, I recommend reviewing it 1 commit at a time because much of the effect of the code changes is non-local i.e. affecting code that is not visible in the final state of the PR. I have paid attention that reviewing the PR one commit at a time is as easy as I can make it. All of the code you need to know about is touched in those commits, even if some of those changes disappear by the end of the stack._ This is a follow-up to rust-lang#119105. One case that is not relevant to `-Zunpretty=expanded`, but which came up as I'm porting rust-lang#119105 and rust-lang#118726 into `syn`'s printer and `prettyplease`'s printer where it **is** relevant, and is also relevant to rustc's `stringify!`, is statement boundaries in the vicinity of braced macro calls. Rustc's AST pretty-printer produces invalid syntax for statements that begin with a braced macro call: ```rust macro_rules! stringify_item { ($i:item) => { stringify!($i) }; } macro_rules! repro { ($e:expr) => { stringify_item!(fn main() { $e + 1; }) }; } fn main() { println!("{}", repro!(m! {})); } ``` **Before this PR:** output is not valid Rust syntax. ```console fn main() { m! {} + 1; } ``` ```console error: leading `+` is not supported --> <anon>:1:19 | 1 | fn main() { m! {} + 1; } | ^ unexpected `+` | help: try removing the `+` | 1 - fn main() { m! {} + 1; } 1 + fn main() { m! {} 1; } | ``` **After this PR:** valid syntax. ```console fn main() { (m! {}) + 1; } ```
Inline ExprPrecedence::order into Expr::precedence The representation of expression precedence in rustc_ast has been an obstacle to further improvements in the pretty-printer (continuing from rust-lang#119105 and rust-lang#119427). Previously the operation of *"does this expression have lower precedence than that one"* (relevant for parenthesis insertion in macro-generated syntax trees) consisted of 3 steps: 1. Convert `Expr` to `ExprPrecedence` using `.precedence()` 2. Convert `ExprPrecedence` to `i8` using `.order()` 3. Compare using `<` As far as I can guess, the reason for the separation between `precedence()` and `order()` was so that both `rustc_ast::Expr` and `rustc_hir::Expr` could convert as straightforwardly as possible to the same `ExprPrecedence` enum, and then the more finicky logic performed by `order` could be present just once. The mapping between `Expr` and `ExprPrecedence` was intended to be as straightforward as possible: ```rust match self.kind { ExprKind::Closure(..) => ExprPrecedence::Closure, ... } ``` although there were exceptions of both many-to-one, and one-to-many: ```rust ExprKind::Underscore => ExprPrecedence::Path, ExprKind::Path(..) => ExprPrecedence::Path, ... ExprKind::Match(_, _, MatchKind::Prefix) => ExprPrecedence::Match, ExprKind::Match(_, _, MatchKind::Postfix) => ExprPrecedence::PostfixMatch, ``` Where the nature of `ExprPrecedence` becomes problematic is when a single expression kind might be associated with multiple different precedence levels depending on context (outside the expression) and contents (inside the expression). For example consider what is the precedence of an ExprKind::Closure `$closure`. Well, on the left-hand side of a binary operator it would need parentheses in order to avoid the trailing binary operator being absorbed into the closure body: `($closure) + Rhs`, so the precedence is something lower than that of `+`. But on the right-hand side of a binary operator, a closure is just a straightforward prefix expression like a unary op, which is a relatively high precedence level, higher than binops but lower than method calls: `Lhs + $closure` is fine without parens but `($closure).method()` needs them. But as a third case, if the closure contains an explicit return type, then the precedence is an even higher level than that, never needing parenthesization even in a binop left-hand side or method call: `|| -> bool { false } + Rhs` or `|| -> bool { false }.method()`. You can see that trying to capture all of this resolution about expressions into `ExprPrecedence` violates the intention of `ExprPrecedence` being a straightforward one-to-one correspondence from each AST and HIR `ExprKind` variant. It would be possible to attempt that by doing stuff like `ExprPrecedence::Closure(Side::Leading, ReturnType::No)`, but I don't foresee the original envisioned benefit of the `precedence()`/`order()` distinction being retained in this approach. Instead I want to move toward a model that Syn has been using successfully. In Syn, there is a Precedence enum but it differs from rustc in the following ways: - There are [relatively few variants](https://github.com/dtolnay/syn/blob/2.0.87/src/precedence.rs#L11-L47) compared to rustc's `ExprPrecedence`. For example there is no distinction at the precedence level between returns and closures, or between loops and method calls. - We distinguish between [leading](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L293) and [trailing](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L309) precedence, taking into account an expression's context such as what token follows it (for various syntactic bail-outs in Rust's grammar, like ambiguities around break-with-value) and how it relates to operators from the surrounding syntax tree. - There are no hardcoded mysterious integer quantities like rustc's `PREC_CLOSURE = -40`. All precedence comparisons are performed via PartialOrd on a C-like enum. This PR is just a first step in these changes. As you can tell from Syn, I definitely think there is value in having a dedicated type to represent precedence, instead of what `order()` is doing with `i8`. But that is a whole separate adventure because rustc_ast doesn't even agree consistently on `i8` being the type for precedence order; `AssocOp::precedence` instead uses `usize` and there are casts in both directions. It is likely that a type called `ExprPrecedence` will re-appear, but it will look substantially different from the one that existed before this PR.
Inline ExprPrecedence::order into Expr::precedence The representation of expression precedence in rustc_ast has been an obstacle to further improvements in the pretty-printer (continuing from rust-lang#119105 and rust-lang#119427). Previously the operation of *"does this expression have lower precedence than that one"* (relevant for parenthesis insertion in macro-generated syntax trees) consisted of 3 steps: 1. Convert `Expr` to `ExprPrecedence` using `.precedence()` 2. Convert `ExprPrecedence` to `i8` using `.order()` 3. Compare using `<` As far as I can guess, the reason for the separation between `precedence()` and `order()` was so that both `rustc_ast::Expr` and `rustc_hir::Expr` could convert as straightforwardly as possible to the same `ExprPrecedence` enum, and then the more finicky logic performed by `order` could be present just once. The mapping between `Expr` and `ExprPrecedence` was intended to be as straightforward as possible: ```rust match self.kind { ExprKind::Closure(..) => ExprPrecedence::Closure, ... } ``` although there were exceptions of both many-to-one, and one-to-many: ```rust ExprKind::Underscore => ExprPrecedence::Path, ExprKind::Path(..) => ExprPrecedence::Path, ... ExprKind::Match(_, _, MatchKind::Prefix) => ExprPrecedence::Match, ExprKind::Match(_, _, MatchKind::Postfix) => ExprPrecedence::PostfixMatch, ``` Where the nature of `ExprPrecedence` becomes problematic is when a single expression kind might be associated with multiple different precedence levels depending on context (outside the expression) and contents (inside the expression). For example consider what is the precedence of an ExprKind::Closure `$closure`. Well, on the left-hand side of a binary operator it would need parentheses in order to avoid the trailing binary operator being absorbed into the closure body: `($closure) + Rhs`, so the precedence is something lower than that of `+`. But on the right-hand side of a binary operator, a closure is just a straightforward prefix expression like a unary op, which is a relatively high precedence level, higher than binops but lower than method calls: `Lhs + $closure` is fine without parens but `($closure).method()` needs them. But as a third case, if the closure contains an explicit return type, then the precedence is an even higher level than that, never needing parenthesization even in a binop left-hand side or method call: `|| -> bool { false } + Rhs` or `|| -> bool { false }.method()`. You can see that trying to capture all of this resolution about expressions into `ExprPrecedence` violates the intention of `ExprPrecedence` being a straightforward one-to-one correspondence from each AST and HIR `ExprKind` variant. It would be possible to attempt that by doing stuff like `ExprPrecedence::Closure(Side::Leading, ReturnType::No)`, but I don't foresee the original envisioned benefit of the `precedence()`/`order()` distinction being retained in this approach. Instead I want to move toward a model that Syn has been using successfully. In Syn, there is a Precedence enum but it differs from rustc in the following ways: - There are [relatively few variants](https://github.com/dtolnay/syn/blob/2.0.87/src/precedence.rs#L11-L47) compared to rustc's `ExprPrecedence`. For example there is no distinction at the precedence level between returns and closures, or between loops and method calls. - We distinguish between [leading](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L293) and [trailing](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L309) precedence, taking into account an expression's context such as what token follows it (for various syntactic bail-outs in Rust's grammar, like ambiguities around break-with-value) and how it relates to operators from the surrounding syntax tree. - There are no hardcoded mysterious integer quantities like rustc's `PREC_CLOSURE = -40`. All precedence comparisons are performed via PartialOrd on a C-like enum. This PR is just a first step in these changes. As you can tell from Syn, I definitely think there is value in having a dedicated type to represent precedence, instead of what `order()` is doing with `i8`. But that is a whole separate adventure because rustc_ast doesn't even agree consistently on `i8` being the type for precedence order; `AssocOp::precedence` instead uses `usize` and there are casts in both directions. It is likely that a type called `ExprPrecedence` will re-appear, but it will look substantially different from the one that existed before this PR.
Rollup merge of rust-lang#133140 - dtolnay:precedence, r=fmease Inline ExprPrecedence::order into Expr::precedence The representation of expression precedence in rustc_ast has been an obstacle to further improvements in the pretty-printer (continuing from rust-lang#119105 and rust-lang#119427). Previously the operation of *"does this expression have lower precedence than that one"* (relevant for parenthesis insertion in macro-generated syntax trees) consisted of 3 steps: 1. Convert `Expr` to `ExprPrecedence` using `.precedence()` 2. Convert `ExprPrecedence` to `i8` using `.order()` 3. Compare using `<` As far as I can guess, the reason for the separation between `precedence()` and `order()` was so that both `rustc_ast::Expr` and `rustc_hir::Expr` could convert as straightforwardly as possible to the same `ExprPrecedence` enum, and then the more finicky logic performed by `order` could be present just once. The mapping between `Expr` and `ExprPrecedence` was intended to be as straightforward as possible: ```rust match self.kind { ExprKind::Closure(..) => ExprPrecedence::Closure, ... } ``` although there were exceptions of both many-to-one, and one-to-many: ```rust ExprKind::Underscore => ExprPrecedence::Path, ExprKind::Path(..) => ExprPrecedence::Path, ... ExprKind::Match(_, _, MatchKind::Prefix) => ExprPrecedence::Match, ExprKind::Match(_, _, MatchKind::Postfix) => ExprPrecedence::PostfixMatch, ``` Where the nature of `ExprPrecedence` becomes problematic is when a single expression kind might be associated with multiple different precedence levels depending on context (outside the expression) and contents (inside the expression). For example consider what is the precedence of an ExprKind::Closure `$closure`. Well, on the left-hand side of a binary operator it would need parentheses in order to avoid the trailing binary operator being absorbed into the closure body: `($closure) + Rhs`, so the precedence is something lower than that of `+`. But on the right-hand side of a binary operator, a closure is just a straightforward prefix expression like a unary op, which is a relatively high precedence level, higher than binops but lower than method calls: `Lhs + $closure` is fine without parens but `($closure).method()` needs them. But as a third case, if the closure contains an explicit return type, then the precedence is an even higher level than that, never needing parenthesization even in a binop left-hand side or method call: `|| -> bool { false } + Rhs` or `|| -> bool { false }.method()`. You can see that trying to capture all of this resolution about expressions into `ExprPrecedence` violates the intention of `ExprPrecedence` being a straightforward one-to-one correspondence from each AST and HIR `ExprKind` variant. It would be possible to attempt that by doing stuff like `ExprPrecedence::Closure(Side::Leading, ReturnType::No)`, but I don't foresee the original envisioned benefit of the `precedence()`/`order()` distinction being retained in this approach. Instead I want to move toward a model that Syn has been using successfully. In Syn, there is a Precedence enum but it differs from rustc in the following ways: - There are [relatively few variants](https://github.com/dtolnay/syn/blob/2.0.87/src/precedence.rs#L11-L47) compared to rustc's `ExprPrecedence`. For example there is no distinction at the precedence level between returns and closures, or between loops and method calls. - We distinguish between [leading](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L293) and [trailing](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L309) precedence, taking into account an expression's context such as what token follows it (for various syntactic bail-outs in Rust's grammar, like ambiguities around break-with-value) and how it relates to operators from the surrounding syntax tree. - There are no hardcoded mysterious integer quantities like rustc's `PREC_CLOSURE = -40`. All precedence comparisons are performed via PartialOrd on a C-like enum. This PR is just a first step in these changes. As you can tell from Syn, I definitely think there is value in having a dedicated type to represent precedence, instead of what `order()` is doing with `i8`. But that is a whole separate adventure because rustc_ast doesn't even agree consistently on `i8` being the type for precedence order; `AssocOp::precedence` instead uses `usize` and there are casts in both directions. It is likely that a type called `ExprPrecedence` will re-appear, but it will look substantially different from the one that existed before this PR.
Inline ExprPrecedence::order into Expr::precedence The representation of expression precedence in rustc_ast has been an obstacle to further improvements in the pretty-printer (continuing from rust-lang#119105 and rust-lang#119427). Previously the operation of *"does this expression have lower precedence than that one"* (relevant for parenthesis insertion in macro-generated syntax trees) consisted of 3 steps: 1. Convert `Expr` to `ExprPrecedence` using `.precedence()` 2. Convert `ExprPrecedence` to `i8` using `.order()` 3. Compare using `<` As far as I can guess, the reason for the separation between `precedence()` and `order()` was so that both `rustc_ast::Expr` and `rustc_hir::Expr` could convert as straightforwardly as possible to the same `ExprPrecedence` enum, and then the more finicky logic performed by `order` could be present just once. The mapping between `Expr` and `ExprPrecedence` was intended to be as straightforward as possible: ```rust match self.kind { ExprKind::Closure(..) => ExprPrecedence::Closure, ... } ``` although there were exceptions of both many-to-one, and one-to-many: ```rust ExprKind::Underscore => ExprPrecedence::Path, ExprKind::Path(..) => ExprPrecedence::Path, ... ExprKind::Match(_, _, MatchKind::Prefix) => ExprPrecedence::Match, ExprKind::Match(_, _, MatchKind::Postfix) => ExprPrecedence::PostfixMatch, ``` Where the nature of `ExprPrecedence` becomes problematic is when a single expression kind might be associated with multiple different precedence levels depending on context (outside the expression) and contents (inside the expression). For example consider what is the precedence of an ExprKind::Closure `$closure`. Well, on the left-hand side of a binary operator it would need parentheses in order to avoid the trailing binary operator being absorbed into the closure body: `($closure) + Rhs`, so the precedence is something lower than that of `+`. But on the right-hand side of a binary operator, a closure is just a straightforward prefix expression like a unary op, which is a relatively high precedence level, higher than binops but lower than method calls: `Lhs + $closure` is fine without parens but `($closure).method()` needs them. But as a third case, if the closure contains an explicit return type, then the precedence is an even higher level than that, never needing parenthesization even in a binop left-hand side or method call: `|| -> bool { false } + Rhs` or `|| -> bool { false }.method()`. You can see that trying to capture all of this resolution about expressions into `ExprPrecedence` violates the intention of `ExprPrecedence` being a straightforward one-to-one correspondence from each AST and HIR `ExprKind` variant. It would be possible to attempt that by doing stuff like `ExprPrecedence::Closure(Side::Leading, ReturnType::No)`, but I don't foresee the original envisioned benefit of the `precedence()`/`order()` distinction being retained in this approach. Instead I want to move toward a model that Syn has been using successfully. In Syn, there is a Precedence enum but it differs from rustc in the following ways: - There are [relatively few variants](https://github.com/dtolnay/syn/blob/2.0.87/src/precedence.rs#L11-L47) compared to rustc's `ExprPrecedence`. For example there is no distinction at the precedence level between returns and closures, or between loops and method calls. - We distinguish between [leading](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L293) and [trailing](https://github.com/dtolnay/syn/blob/2.0.87/src/fixup.rs#L309) precedence, taking into account an expression's context such as what token follows it (for various syntactic bail-outs in Rust's grammar, like ambiguities around break-with-value) and how it relates to operators from the surrounding syntax tree. - There are no hardcoded mysterious integer quantities like rustc's `PREC_CLOSURE = -40`. All precedence comparisons are performed via PartialOrd on a C-like enum. This PR is just a first step in these changes. As you can tell from Syn, I definitely think there is value in having a dedicated type to represent precedence, instead of what `order()` is doing with `i8`. But that is a whole separate adventure because rustc_ast doesn't even agree consistently on `i8` being the type for precedence order; `AssocOp::precedence` instead uses `usize` and there are casts in both directions. It is likely that a type called `ExprPrecedence` will re-appear, but it will look substantially different from the one that existed before this PR.
This PR fixes a multitude of false negatives and false positives in the AST pretty printer's parenthesis insertion related to statement boundaries — statements which terminate unexpectedly early if there aren't parentheses.
Without this fix, the AST pretty printer (including both
stringify!
andrustc -Zunpretty=expanded
) is prone to producing output which is not syntactically valid Rust. Invalid output is problematic because it means Rustfmt is unable to parse the output ofcargo expand
, for example, causing friction by forcing someone trying to debug a macro into reading poorly formatted code.I believe the set of bugs fixed in this PR account for the most prevalent reason that
cargo expand
produces invalid output in real-world usage.Fixes #98790.
False negatives
The following is a correct program —
cargo check
succeeds.But
rustc -Zunpretty=expanded main.rs
produces output that is invalid Rust syntax, because parenthesization is needed and not being done by the pretty printer.Piping this expanded code to rustfmt, it fails to parse.
Fixed output after this PR:
False positives
Less problematic, but worth fixing (just like #118726).
Output of
rustc -Zunpretty=expanded lib.rs
before this PR. There is no reason parentheses would need to be inserted there.After this PR:
Alternatives considered
In this PR I opted to parenthesize only the leading subexpression causing the statement boundary, rather than the entire statement. Example:
This PR produces the following pretty-printed contents for fn main:
A different equally correct output would be:
I chose the one I did because it is the only approach used by handwritten code in the standard library and compiler. There are 4 places where parenthesization is being used to prevent a statement boundary, and in all 4, the developer has chosen to parenthesize the smallest subexpression rather than the whole statement:
rust/compiler/rustc_codegen_cranelift/example/alloc_system.rs
Line 102 in b37d43e
rust/compiler/rustc_parse/src/errors.rs
Lines 1021 to 1029 in b37d43e
rust/library/core/src/future/poll_fn.rs
Line 151 in b37d43e
rust/library/core/src/ops/range.rs
Lines 824 to 828 in b37d43e