Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve AST with respect to estree #2854

Closed
6 of 12 tasks
Boshen opened this issue Mar 29, 2024 · 21 comments
Closed
6 of 12 tasks

Improve AST with respect to estree #2854

Boshen opened this issue Mar 29, 2024 · 21 comments
Assignees
Labels
A-ast Area - AST P-high Priority - High
Milestone

Comments

@Boshen
Copy link
Member

Boshen commented Mar 29, 2024

This is not exhaustive but should cover all 'types' of differences to resolve:

  • Part of the AST is still based on the "import assertion" spec and should be updated. I want to tackle this I think it's a good next for me to do.
  • setProp(node, "typeArguments", node.typeParameters) is not a mistake, it's because TSESLint deprecated typeParameters in multiple nodes for typeArguments but keep both for now for backward compatibility. I'm not sure what is the right move for OXC here. For ref: fix: rename typeParameters to typeArguments where needed typescript-eslint/typescript-eslint#5384
  • Sometimes node are missing boolean, like prefix: true on UnaryExpression. I don't yet know how to do that in serde
  • StaticMemberExpression, ComputedMemberExpression & PrivateFieldExpression need to be serialized to MemberExpression with few changes
  • StringLiteral, BooleanLiteral, ... need to be serialized as Literal with raw value set. For BigInt, because it's not in the JSON spec, it would still requires some work on the JS side but this can be done as part of the reviver function when desrializing.
  • modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.
  • Some types that are nullable in OXC default to empty array in ESTree, like TSInterfaceDeclaration.extends, ClassExpression.implements
  • FunctionBody is a BlockStatement in ESTree
  • FormalParameters is closer to ESTree now, but should be inlined directly into <node>.params in multiple nodes
  • TSMappedTypeModifierOperator is boolean | '-' | '+' | undefined in TSESTree, I think my conversion is incomplete at the moment
  • UsingDeclaration is a VariableDeclaration (Not done in my wrapper)
  • Not sure yet, but the structure of hashbang, comments & directives needs some updates

Some changes could be seen as rename only, but as soon we touch the name of the nodes the type generation became complex to handle that why I didn't push more PRs for that for now.

For info the TSUnionType hack is really prettier specific.

But the biggest blocker to make my repo public is the utf16 conversion. The time to print some large files goes in seconds with the dump implementation, and because I saw you plan to solve that I don't want to optimize for it.

Some good news is that my repo successfully gave the same output of prettier on ~500 TSX files, ~8k dts files & 10k files inside my node_modules. This is obviously a very biased dataset, but still promising!

Originally posted by @ArnaudBarre in #2463 (comment)

@Boshen Boshen changed the title Improve AST in respect to estree Improve AST with respect to estree Mar 29, 2024
@Boshen
Copy link
Member Author

Boshen commented Mar 29, 2024

modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.

The "modifiers" concept was ported from a contributor from a long time ago. We may need to reevaluate this decision and make larger changes. The underlying problem is how recoverable our parsing should be.

Boshen pushed a commit that referenced this issue Apr 22, 2024
Don't include `trailing_comma` fields in JSON AST, for compat with
ESTree (#2854).
@sxzz

This comment was marked as resolved.

@sapphi-red
Copy link
Member

sapphi-red commented Jan 16, 2025

@overlookmotel
Copy link
Contributor

overlookmotel commented Jan 17, 2025

To surface all the obscure differences between our JS-side AST and ESTree, I think we ideally need conformance tests. Initially we'll have tons of failures, but we can chip away at them until there are none. I suggest:

  • Parse all Test262 fixtures with Acorn and write the resulting ASTs as JSON files.
  • Run Oxc parser on same fixtures and serialize to JSON with serde.
  • Conformance runner compare the two, and generate snapshot files showing which fixtures fail.

The Acorn AST JSON files will take time to compute and will be many many files. We could:

  • Create a new repo with the code to run Acorn.
  • Check in the resulting JSON files to that repo.
  • Link that repo into main Oxc repo as a git submodule (same as we do with Test262, Babel, etc fixtures).

Then the conformance tests could be written purely in Rust, and we only need to run Acorn once, not every time conformance runs.

That'll only cover us for JS syntax. I guess we could do similar for TS ASTs using fixtures from TypeScript repo and using typescript-eslint's parser to generate the JSON files. But personally I think we should concentrate on the JS AST first.

Does that sound like a good idea? If so, does anyone have an interest in building the infrastructure?

@ArnaudBarre
Copy link
Contributor

ArnaudBarre commented Jan 17, 2025

I think that a tool like this would be useful for interfacing with other tools that manipulate source code (my initial goal that I still want to pursue at some point is having a fast prettier parser). And for me source code is TS code and I don't know anyone still thinking that people write mainly JS (expect ESLint). So I think the infra should target TS code in priority, but I don't have the bandwidth to do it so maybe Test262 is better than nothing.

@overlookmotel
Copy link
Contributor

Just to clarify: The target is very much TS. But as TS is a superset of JS, in my opinion it'd be easier if we tackle JS first, and then go on to TS.

I am also keen to make sure that there aren't any places where JS and TS ASTs diverge from each other (aside from TS AST adding new types, and new properties to existing JS types). i.e. I want to make sure that we can satisfy both ESTree and TS AST compatibility with a single AST shape. If we can't, that will inform how the serializer needs to be designed (which is a work in progress - #6347).

@Boshen Boshen added P-high Priority - High A-ast Area - AST labels Jan 20, 2025
@Boshen Boshen added this to the estree milestone Jan 21, 2025
overlookmotel added a commit that referenced this issue Feb 5, 2025
- part of #2854

This PR attempts to handle estree ast incompatibility of
`Function.params: FormalParameters` as mentioned in the above issue:

> `FormalParameters` is closer to ESTree now, but should be inlined
directly into `<node>.params` in multiple nodes.

Estree spec has `Function.params: Pattern[]`
https://github.com/estree/estree/blob/master/es5.md#functions, but oxc
already has `interface Pattern`, so I named it to `Function.params:
ParamPattern[]` for now.

Also I'm not sure about the testing (I suppose that's a part of
#8630), so I snapshoted one
example code. For comparison, here is acorn's output
https://astexplorer.net/#/gist/25138c0605f82dcfc1a8fd363dc2a681/5ad30d36c9f276519063e6fd2e340c113d8c85b0

---------

Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
overlookmotel added a commit that referenced this issue Feb 6, 2025
Part of #2854

I added a bit weird macro `add_entry(computed = true)` to add constant.
Likely this is not the best way. I would appreciate any suggestion for a
better approach 🙏

---------

Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
overlookmotel pushed a commit that referenced this issue Feb 6, 2025
part of #2854
ref: https://github.com/estree/estree/blob/master/es2015.md#programs

I tried to move `via` customization on `struct SourceType`, but couldn't
find a way maybe because struct is defined in `oxc_span`. For now, I
made this as `via` on `Program::source_type` field.
@Boshen Boshen assigned overlookmotel and unassigned Boshen Feb 9, 2025
graphite-app bot pushed a commit that referenced this issue Feb 9, 2025
Part of #2854

diff of estree_test262.snap

```diff
-Positive Passed: 6233/44293 (14.07%)
+Positive Passed: 13042/44293 (29.44%)
```
graphite-app bot pushed a commit that referenced this issue Feb 9, 2025
Part of #2854

difff os estree_test262.snap

```diff
-Positive Passed: 13042/44293 (29.44%)
+Positive Passed: 16040/44293 (36.21%)
```
graphite-app bot pushed a commit that referenced this issue Feb 9, 2025
Part of #2854

```diff
-Positive Passed: 16040/44293 (36.21%)
+Positive Passed: 28924/44293 (65.30%)
```
graphite-app bot pushed a commit that referenced this issue Feb 9, 2025
Part of #2854

```diff
-Positive Passed: 28924/44293 (65.30%)
+Positive Passed: 30446/44293 (68.74%)
```
Dunqing pushed a commit that referenced this issue Feb 10, 2025
Part of #2854

diff of estree_test262.snap

```diff
-Positive Passed: 6233/44293 (14.07%)
+Positive Passed: 13042/44293 (29.44%)
```
Dunqing pushed a commit that referenced this issue Feb 10, 2025
Part of #2854

difff os estree_test262.snap

```diff
-Positive Passed: 13042/44293 (29.44%)
+Positive Passed: 16040/44293 (36.21%)
```
Dunqing pushed a commit that referenced this issue Feb 10, 2025
Part of #2854

```diff
-Positive Passed: 16040/44293 (36.21%)
+Positive Passed: 28924/44293 (65.30%)
```
Dunqing pushed a commit that referenced this issue Feb 10, 2025
Part of #2854

```diff
-Positive Passed: 28924/44293 (65.30%)
+Positive Passed: 30446/44293 (68.74%)
```
graphite-app bot pushed a commit that referenced this issue Feb 10, 2025
Part of #2854
Related: #8972 (comment)

I updated `append_to` macro to support concatenating two `Vec`. The logic is very similar so I reused it, but maybe it's better to introduce a new macro or maybe rename `append_to`?
overlookmotel pushed a commit that referenced this issue Feb 10, 2025
…ression` (#8988)

Part of #2854

Serialization of `ArrowFunctionExpression.body` requires knowing
`ArrowFunctionExpression.expression: boolean`, so I generalized
`ESTreeStructField::via` macro to cover this use case.

This alone doesn't improve conformance since
`ArrowFunctionExpression.generator/id` needs to be fixed first by
#8980.
@overlookmotel
Copy link
Contributor

overlookmotel commented Feb 11, 2025

We are now at 99.88% on the conformance tests vs Acorn for JS syntax. What's left is:

Two oddities:

And 2 groups of edge cases which we currently need to accept failure on:

With apologies, I do intend to destroy these fine results shortly by making the conformance test runner stricter - not ignoring extraneous fields, and requiring fields to be in the right order as well in the JSON. But all the test failures we'll get from those changes should be easy to fix.

Main next step, once the last few tests listed above are fixed, will be to go on to TS and JSX.

@overlookmotel
Copy link
Contributor

Looks like language/expressions/template-literal/tv-line-terminator-sequence.js fail is a bug in the lexer. I'll look into it tomorrow.

@Boshen
Copy link
Member Author

Boshen commented Feb 12, 2025

Consider this issue done when acorn estree compat is complete. We'll create another tracking issue for typescript eslint estree, but the core team will not spend time on it for the time being.

@overlookmotel
Copy link
Contributor

All JS tests now passing except the small number which fail due to #3526 and #9029. #9073 filters out those test cases, which leaves 100% pass!

Next step: Make the tester stricter to make sure no extraneous fields in our AST which aren't related to TS, and to ensure fields are in the right order (important for usability and consistent object "shapes" for JS engines). This will no doubt cause many of the passing tests to fail again, but they should be fairly quick to fix.

I'm going to check what Rollup (rollup/parseAst) does for TS and JSX - whether it guarantees compatibility with any particular standard. If it doesn't then, as Boshen says, we can leave TS/JSX compatibility until later.

@overlookmotel
Copy link
Contributor

overlookmotel commented Feb 12, 2025

Looks like Rollup's parseAst does aim to conform to typescript-eslint:

For encoded non-JavaScript AST nodes like TypeScript or JSX, we try to follow
the format of typescript-eslint, which can be derived from their playground
https://typescript-eslint.io/play/#showAST=es&fileType=.tsx
For JSX, see also https://github.com/facebook/jsx/blob/main/AST.md

https://github.com/rollup/rollup/blob/4b8745922d37d8325197d5a6613ffbf231163c7d/scripts/ast-types.js#L23-L26

@Boshen I assume this means that we do need to make our JS-side AST conform to typescript-eslint after all?

On the positive side, given how swiftly we've been able to achieve ESTree compatibility, and that our TS types are broadly similar to typescript-eslint's AST anyway, I doubt this would be a huge task.

@hi-ogawa
Copy link
Contributor

I don't think rollup/parseAst supports typescript. The code comment you referenced is likely stale or meant for something different. SWC syntax seems mostly fixed here https://github.com/rollup/rollup/blob/4b8745922d37d8325197d5a6613ffbf231163c7d/rust/parse_ast/src/lib.rs#L23-L30 and I confirmed ts syntax gives a parse error https://stackblitz.com/edit/vitejs-vite-ohgd7eku?file=repro.js

For jsx support, that's fairly recent addition in rollup https://github.com/rollup/rollup/releases/tag/v4.24.0, so I'd expect the downstream usage is very small or maybe none. For Vite ecosystem specifically, estree without jsx and typescript should suffice as far as I know.

@Boshen Boshen closed this as completed Feb 13, 2025
@overlookmotel
Copy link
Contributor

overlookmotel commented Feb 13, 2025

OK great!

I do think we should support JSX, since Rollup does. Thankfully it'll be pretty simple as there's very few JSX types.

I'll open a separate issue for that.

@overlookmotel
Copy link
Contributor

Re-opening this issue for the remaining work on JS syntax - correct property order, and ensuring no extraneous properties in the AST (except those related to TS).

@overlookmotel overlookmotel reopened this Feb 13, 2025
@overlookmotel
Copy link
Contributor

overlookmotel commented Feb 22, 2025

#9285 ticks off those last 2. After that's merged, JS-side AST is 100% ESTree compatible. The new stricter conformance tester flushed out a few bugs, which are now fixed. And the new JSON serializer is 25% faster than serde.

All that remains is to decide how to plumb it into the NAPI/WASM bindings.

@Boshen
Copy link
Member Author

Boshen commented Feb 26, 2025

Oxc v0.53.0 now returns an estree for direct consumption in JavaScript through napi and wasm APIs, no more ugly work arounds.

@Boshen Boshen closed this as completed Feb 26, 2025
@cernymatej
Copy link

@Boshen is there a way to access the information about whether an identifier is a reference one or not, like it was possible before? (in javascript oxc-parser) Or is it only estree from now on?

@Boshen
Copy link
Member Author

Boshen commented Feb 26, 2025

@Boshen is there a way to access the information about whether an identifier is a reference one or not, like it was possible before? (in javascript oxc-parser) Or is it only estree from now on?

Nope, it's estree, people wants it, it has less information than our Rust AST, sadly we can't go back now.

@cernymatej
Copy link

I see. Thank you!
Would you consider adding additional OXC-specific information to the nodes so that they maintain ESTree compatibility while preserving useful details? Or do you think that wouldn’t be worth exploring?

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 5, 2025

@cernymatej Sorry for belated response. We may consider in the future providing a way to access this information on JS side. But I'm afraid that's likely a way off. ESTree compatibility was the primary goal, as it's what's needed for interop with other tools.

@cernymatej
Copy link

Great, thank you for the info! 🙂
At least the identifier type (reference, binding,…) would be a big improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST P-high Priority - High
Projects
None yet
Development

No branches or pull requests

7 participants