Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce SuiteStatement #5326

Closed
Tracked by #4798
MichaReiser opened this issue Jun 23, 2023 · 4 comments
Closed
Tracked by #4798

Introduce SuiteStatement #5326

MichaReiser opened this issue Jun 23, 2023 · 4 comments
Assignees

Comments

@MichaReiser
Copy link
Member

MichaReiser commented Jun 23, 2023

RustPython's AST structure for the following two programs is identical (the ranges differ)

# Program 1
if True:
    pass
elif False:
    pass


# Program 2
if True:
    pass
else:
    if False:
        pass
If(
    StmtIf {
        range: 0..38,
        test: Constant(
            ExprConstant {
                range: 3..7,
                value: Bool(
                    true,
                ),
                kind: None,
            },
        ),
        body: [
            Pass(
                StmtPass {
                    range: 13..17,
                },
            ),
        ],
        orelse: [
            If(
                StmtIf {
                    range: 18..38,
                    test: Constant(
                        ExprConstant {
                            range: 23..28,
                            value: Bool(
                                false,
                            ),
                            kind: None,
                        },
                    ),
                    body: [
                        Pass(
                            StmtPass {
                                range: 34..38,
                            },
                        ),
                    ],
                    orelse: [],
                },
            ),
        ],
    },
),

This is problematic because the formatter incorrectly assumes that the nested if in the second program is an elif and collapses the nested if. This isn't something a formatter should do. This representation does make sense for an interpreter. It's actually a neat little optimisation that the parser performs.

The solution for this is to change Suite from a Vec<Statement> to its own SuiteStatement AST node and change the orelse type (and any other field that represents a body) from Suite to Option<Stmt>.

This would change the representation of the first program to:

If(
    StmtIf {
        range: 0..38,
        test: Constant(
            ExprConstant {
                range: 3..7,
                value: Bool(
                    true,
                ),
                kind: None,
            },
        ),
        body: [
            Pass(
                StmtPass {
                    range: 13..17,
                },
            ),
        ],
        orelse: Some(If(
              StmtIf {
                  range: 18..38,
                  test: Constant(
                      ExprConstant {
                          range: 23..28,
                          value: Bool(
                              false,
                          ),
                          kind: None,
                      },
                  ),
                  body: [
                      Pass(
                          StmtPass {
                              range: 34..38,
                          },
                      ),
                  ],
                  orelse: None
              },
          ),
    }),
),

and of the second program:

If(
    StmtIf {
        range: 0..38,
        test: Constant(
            ExprConstant {
                range: 3..7,
                value: Bool(
                    true,
                ),
                kind: None,
            },
        ),
        body: [
            Pass(
                StmtPass {
                    range: 13..17,
                },
            ),
        ],
        orelse: Some(SuiteStatement(SuiteStatement([
            If(
                StmtIf {
                    range: 18..38,
                    test: Constant(
                        ExprConstant {
                            range: 23..28,
                            value: Bool(
                                false,
                            ),
                            kind: None,
                        },
                    ),
                    body: [
                        Pass(
                            StmtPass {
                                range: 34..38,
                            },
                        ),
                    ],
                    orelse: None,
                },
            ),
        ])),
    },
),

which is unambiguous

This requires changes to:

  • RustPython's parser
  • Ruff
  • The visitor implementations
  • The formatter

Alternatives

  • Extract the indent from the source to distinguish the two cases. That's what we currently do when extracting comments but it is so easy to get wrong. We would need to do the same in the unparse of the linter
@MichaReiser
Copy link
Member Author

@konstin Sorry for the misunderstanding. I'm excited that you want to take this on!

These are the Python.asdl changes that I already made and encode the AST changes that I had in mind

Index: ast/Python.asdl
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/ast/Python.asdl b/ast/Python.asdl
--- a/ast/Python.asdl	(revision caf6ebcefa80ce835aa689f392000e5cfd18f93d)
+++ b/ast/Python.asdl	(date 1687530270508)
@@ -9,16 +9,16 @@
         | FunctionType(expr* argtypes, expr returns)
 
     stmt = FunctionDef(identifier name, arguments args,
-                       stmt* body, decorator* decorator_list, expr? returns,
+                       stmt body, decorator* decorator_list, expr? returns,
                        string? type_comment)
           | AsyncFunctionDef(identifier name, arguments args,
-                             stmt* body, decorator* decorator_list, expr? returns,
+                             stmt body, decorator* decorator_list, expr? returns,
                              string? type_comment)
 
           | ClassDef(identifier name,
              expr* bases,
              keyword* keywords,
-             stmt* body,
+             stmt body,
              decorator* decorator_list)
           | Return(expr? value)
 
@@ -29,18 +29,18 @@
           | AnnAssign(expr target, expr annotation, expr? value, int simple)
 
           -- use 'orelse' because else is a keyword in target languages
-          | For(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment)
-          | AsyncFor(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment)
-          | While(expr test, stmt* body, stmt* orelse)
-          | If(expr test, stmt* body, stmt* orelse)
-          | With(withitem* items, stmt* body, string? type_comment)
-          | AsyncWith(withitem* items, stmt* body, string? type_comment)
+          | For(expr target, expr iter, stmt body, stmt? orelse, string? type_comment)
+          | AsyncFor(expr target, expr iter, stmt body, stmt? orelse, string? type_comment)
+          | While(expr test, stmt body, stmt? orelse)
+          | If(expr test, stmt body, stmt? orelse)
+          | With(withitem* items, stmt body, string? type_comment)
+          | AsyncWith(withitem* items, stmt body, string? type_comment)
 
           | Match(expr subject, match_case* cases)
 
           | Raise(expr? exc, expr? cause)
-          | Try(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody)
-          | TryStar(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody)
+          | Try(stmt body, excepthandler* handlers, stmt? orelse, stmt? finalbody)
+          | TryStar(stmt body, excepthandler* handlers, stmt? orelse, stmt? finalbody)
           | Assert(expr test, expr? msg)
 
           | Import(alias* names)
@@ -50,6 +50,7 @@
           | Nonlocal(identifier* names)
           | Expr(expr value)
           | Pass | Break | Continue
+          | Suite(stmt* statements)
 
           -- col_offset is the byte offset in the utf8 string the parser uses
           attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
@@ -106,7 +107,7 @@
 
     comprehension = (expr target, expr iter, expr* ifs, int is_async)
 
-    excepthandler = ExceptHandler(expr? type, identifier? name, stmt* body)
+    excepthandler = ExceptHandler(expr? type, identifier? name, stmt body)
                     attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
 
     arguments = (arg* posonlyargs, arg* args, arg? vararg, arg* kwonlyargs,
@@ -125,7 +126,7 @@
 
     withitem = (expr context_expr, expr? optional_vars)
 
-    match_case = (pattern pattern, expr? guard, stmt* body)
+    match_case = (pattern pattern, expr? guard, stmt body)
 
     pattern = MatchValue(expr value)
             | MatchSingleton(constant value)
  • To update the AST, run ./scripts/update_asdl.sh
  • To update the generated parser, run ~/.cargo/bin/lalrpop parser/src/python.lalrpop

You may want to base your work on top of astral-sh/RustPython-Parser#15

@MichaReiser
Copy link
Member Author

This would be nice to have but isn't something that we need to ship the formatter. Closing for now

@konstin
Copy link
Member

konstin commented Jul 31, 2023

I'd still like to have this, but as a general not-linter-limited improvement that could remove so many .unwrap() in both the linter and the formatter.

@MichaReiser
Copy link
Member Author

I'd still like to have this, but as a general not-linter-limited improvement that could remove so many .unwrap() in both the linter and the formatter.

I'm now tracking this in #6183 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants