MassaLabs: Implement an Intermediate Representation to improve the compilation process #359

Leo-Besancon · 2024-10-17T14:25:50Z

The goal of this PR is to introduce a Middle Intermediate Representation (MIR), so that optimizations are no longer performed directly on the AST, while keeping enough information to handle type checking and optimization in each pass.

See the initial issue and design discussion for additional context:

Opening this as a draft now that we have:

- Basic MIR graph structure. We started from the AIR and added the operations needed to represent the same expressions as the AST.
- Lowering from the AST. The tests checking that various AirScript programs compile through the pipeline Parse to AST > Constant propagation > Lowering to MIR all pass, but the resulting MIR is not checked yet. We may need to improve testing on this side.
- Start of the inlining pass from MIR to MIR. For now, we only inline function bodies at their call sites; we do not unroll vectors, for instance. We feel it may be better to do this in a separate unrolling pass if possible.
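As an illustration of the inlining step described above, here is a minimal sketch on a toy expression tree rather than the MIR graph. Each call site is replaced by a copy of the callee body in which parameter references are swapped for the call's arguments. `Expr`, `Function`, `substitute` and `inline_calls` are hypothetical names, not types from this PR. Vectors are deliberately left alone, matching the choice to defer unrolling to a separate pass.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; not the MIR node types from this PR).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(u64),
    /// Reference to the i-th parameter of the enclosing function.
    Param(usize),
    Add(Box<Expr>, Box<Expr>),
    /// Call to a named function with argument expressions.
    Call(String, Vec<Expr>),
}

/// A function body written in terms of `Param(i)` references.
struct Function {
    body: Expr,
}

/// Clone `body`, replacing each `Param(i)` with `args[i]`.
fn substitute(body: &Expr, args: &[Expr]) -> Expr {
    match body {
        Expr::Const(c) => Expr::Const(*c),
        Expr::Param(i) => args[*i].clone(),
        Expr::Add(a, b) => Expr::Add(Box::new(substitute(a, args)), Box::new(substitute(b, args))),
        Expr::Call(name, call_args) => Expr::Call(
            name.clone(),
            call_args.iter().map(|a| substitute(a, args)).collect(),
        ),
    }
}

/// Replace every call site by the callee body with its arguments substituted.
/// Arguments are inlined first, then the substituted body is inlined again so
/// that nested calls disappear. Assumes an acyclic call graph (no recursion)
/// and that every callee exists in `funcs` (indexing panics otherwise).
fn inline_calls(expr: &Expr, funcs: &HashMap<String, Function>) -> Expr {
    match expr {
        Expr::Const(_) | Expr::Param(_) => expr.clone(),
        Expr::Add(a, b) => Expr::Add(Box::new(inline_calls(a, funcs)), Box::new(inline_calls(b, funcs))),
        Expr::Call(name, args) => {
            let args: Vec<Expr> = args.iter().map(|a| inline_calls(a, funcs)).collect();
            let callee = &funcs[name];
            inline_calls(&substitute(&callee.body, &args), funcs)
        }
    }
}
```

For example, with `double(x) = x + x` and `quad(x) = double(double(x))`, inlining `quad(1)` yields `(1 + 1) + (1 + 1)` with no calls left.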

Additionally, we have a partial pretty printer for the MIR to help with debugging (checking that the constructed graph is what we expect after each pass).
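A pretty printer for a graph IR can be as simple as one line per node. The sketch below is hypothetical. It assumes a flat node list where operands are indices of earlier nodes. It borrows two conventions from the commit notes further down: calls are displayed as SSA values, and constants as `Type(value)`. The `%n = ...` syntax is an assumption, not the PR's actual output format.

```rust
/// Hypothetical flat node list: operands refer to earlier node indices.
enum Node {
    Felt(u64),
    Add(usize, usize),
    Call(String, Vec<usize>),
}

/// Render one SSA-style line per node, e.g. `%2 = add %0, %1`.
fn pretty_print(nodes: &[Node]) -> String {
    let mut lines = Vec::new();
    for (i, node) in nodes.iter().enumerate() {
        let rhs = match node {
            // Constants are displayed as Type(value).
            Node::Felt(v) => format!("Felt({})", v),
            Node::Add(a, b) => format!("add %{}, %{}", a, b),
            // Calls are displayed as SSA values like any other operation.
            Node::Call(f, args) => {
                let args: Vec<String> = args.iter().map(|a| format!("%{}", a)).collect();
                format!("call {}({})", f, args.join(", "))
            }
        };
        lines.push(format!("%{} = {}", i, rhs));
    }
    // No leading or trailing newline.
    lines.join("\n")
}
```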

@bitwalker, don't hesitate to comment on things that should be handled differently.
For now, we haven't fully settled on the various nodes of the MIR graph, as we add or change things depending on the needs of our implementation. We will also add checks to ensure the proper diagnostics are raised (potentially after each pass, as discussed previously), but we will probably do this at a later stage.

Soulthym · 2024-10-25T15:39:07Z

@bitwalker, first of all, thank you for this detailed answer, it will surely help a lot!
We have left several comments trying to address all the points you have raised. We hope that helps clarify the design we proposed and how we plan to incorporate your feedback.

bitwalker · 2024-10-25T20:29:27Z

> @bitwalker, first of all, thank you for this detailed answer, it will surely help a lot!

You bet! Sorry if presenting a sketch of the RVSDG design gave you a scare - didn't want to stress you out worrying about needing to refactor things, hopefully it just provided a useful reference for past and future conversations.

> We have left several comments trying to address all the points you have raised. We hope that helps clarify the design we proposed and how we plan to incorporate your feedback.

Yes, I believe all of my questions are addressed, except maybe a few small fresh ones I've left as comments. Ping me once a few of the more complex tests are implemented, and I'll re-review ASAP.

Thanks for all the hard work!

Soulthym · 2024-10-28T16:21:53Z

> Sorry if presenting a sketch of the RVSDG design gave you a scare - didn't want to stress you out worrying about needing to refactor things, hopefully it just provided a useful reference for past and future conversations.

No problem! It will serve as a reference for further discussions, and as a target to work towards as we refactor our implementation based on what we need.

> Yes, I believe all of my questions are addressed, except maybe a few small fresh ones I've left as comments. Ping me once a few of the more complex tests are implemented, and I'll re-review ASAP.

I have left comments addressing these. For now, I don't think the remaining points require immediate attention; we already have plenty to do based on your feedback. I'll be sure to ping you when a review is needed.

Thanks again for all your valuable feedback!

Leo-Besancon · 2024-11-14T13:44:28Z

Hello @bitwalker !

We wanted to give you an update on what we've been working on, and to get your input on a few things.

Our main rework working branch is the following: https://github.com/massalabs/air-script/tree/thy-rework-ir

We've mainly worked on:

- A rework of the base structure of the Mir based on your proposal (without NodeIndex; operations directly reference their children/parents). @Soulthym will give details on this aspect below.
- Implemented a generic visitor pattern for the Mir -> Mir passes
- Improved inlining based on previous comments
- Implemented unrolling (on the previous structure)
- Implemented lowering from Mir to Air (on the previous structure)
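The generic visitor mentioned above can be sketched as a trait whose default method fixes the traversal order, while each pass only implements the per-node hook. This is a hypothetical, self-contained illustration on a toy tree. `Node`, `Visitor`, `walk_post_order` and `Evaluator` are assumed names, and the sketch does not reproduce the PR's actual visitor API.

```rust
/// Toy node type for illustration only.
#[derive(Debug)]
enum Node {
    Value(u64),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

impl Node {
    fn children(&self) -> Vec<&Node> {
        match self {
            Node::Value(_) => vec![],
            Node::Add(a, b) | Node::Mul(a, b) => vec![&**a, &**b],
        }
    }
}

/// A pass implements `visit`; the traversal order comes for free.
trait Visitor {
    fn visit(&mut self, node: &Node);

    /// Default post-order traversal: children are visited before their parent.
    fn walk_post_order(&mut self, node: &Node) {
        for child in node.children() {
            self.walk_post_order(child);
        }
        self.visit(node);
    }
}

/// Example pass: evaluates the tree with a value stack.
struct Evaluator {
    stack: Vec<u64>,
}

impl Visitor for Evaluator {
    fn visit(&mut self, node: &Node) {
        match node {
            Node::Value(v) => self.stack.push(*v),
            Node::Add(..) | Node::Mul(..) => {
                // Both operands were pushed when the children were visited.
                let rhs = self.stack.pop().expect("missing rhs");
                let lhs = self.stack.pop().expect("missing lhs");
                let out = if matches!(node, Node::Add(..)) { lhs + rhs } else { lhs * rhs };
                self.stack.push(out);
            }
        }
    }
}
```

Post-order is what makes the value stack in `Evaluator` work: both operands are on the stack by the time their parent is visited.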

What we have to do:

- Complete the rework of the structure
- Now that we have the full pipeline AST -> Mir -> AIR:
  - Adapt a few pass behaviors to the new structure. For instance, unrolling list comprehensions still relies on incomplete logic, as it needs the new structure to be implemented cleanly.
  - Make the current AIR tests pass (with the whole pipeline, we can better debug edge cases):
    - `unreachable!()` is triggered in a lot of tests; debug and resolve these issues.
    - Some tests loop infinitely; investigate.
- Fully test everything

As a result, it would be great if you could look at the structure and answer our questions below so we can finalize the design!

Thank you again.

Soulthym · 2024-11-14T13:45:37Z

Hey!

Below is a summary of the new structure for the IR.
We are unsure whether 1.a) or 1.b) would be the better approach. What are your thoughts on this?

Our rework of the MirGraph structure:

Our current structure:

Implemented traits are marked [Trait].
Rc/RefCell/Weak are ignored for clarity.
Fields not referenced in the Problems & Solution sections are also omitted.

dyn Op: gives access to the Operation
dyn Value: * (behaviour shared by all nodes)

Operand: [Value]
  owner: Operation
  value: dyn Value
OpResult: [Value]
  owner: dyn Op
Operation: [Value]
  owner: dyn Op
  Vec<Operand>
  result: OpResult

Add: [Op, Value]
  Operation
...
Matrix: [Op, Value]
  Operation

Parameter: [Value]
  owner: FunctionOrEvaluator

Block:
  owner: FunctionOrEvaluator
  Vec<dyn Op>

Function/Evaluator:
  params: Parameter
  blocks: Vec<Block>
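For readers less familiar with the layout above, here is a rough Rust approximation. It assumes plain `Rc` handles and omits the owner/result back-references, just as the listing omits `RefCell`/`Weak`. It also shows the friction behind the first problem below: recovering a concrete type from an `Rc<dyn Value>` requires an explicit downcast. Here that downcast goes through a hypothetical `as_any` method returning `std::any::Any`.

```rust
use std::any::Any;
use std::rc::Rc;

/// Shared behaviour of all nodes (`dyn Value` in the listing).
trait Value {
    /// Needed to recover the concrete type from a trait object.
    fn as_any(&self) -> &dyn Any;
}

/// Operations expose their generic `Operation` payload (`dyn Op` in the listing).
trait Op: Value {
    fn operation(&self) -> &Operation;
}

/// Operands of an operation; owner/result back-references omitted.
struct Operation {
    operands: Vec<Rc<dyn Value>>,
}

struct Parameter {
    position: usize,
}

struct Add {
    op: Operation,
}

impl Value for Parameter {
    fn as_any(&self) -> &dyn Any { self }
}

impl Value for Add {
    fn as_any(&self) -> &dyn Any { self }
}

impl Op for Add {
    fn operation(&self) -> &Operation { &self.op }
}

/// A block is a list of operations (`Vec<dyn Op>` in the listing).
struct Block {
    ops: Vec<Rc<dyn Op>>,
}

/// Count the operands of `op` that are parameters: each operand has to be
/// downcast through `Any` before its concrete type is known.
fn count_parameter_operands(op: &dyn Op) -> usize {
    op.operation()
        .operands
        .iter()
        .filter(|v| v.as_any().downcast_ref::<Parameter>().is_some())
        .count()
}
```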

Problems & Potential Solutions:

P: Dynamic dispatch makes things tricky due to casting to non-primitive types.

S: Replace with enums.

P: `Rc<dyn Op>` only works on operations.

S: Make Function, Block and Evaluator [Op] that wrap Operation.

P: `Rc<T>` is immutable.

S1: Replaced by RefCell in most cases, but we lose ref-counting.
S2: We can use Rc<RefCell<T>>.

P: Ref-counting cycles leak memory.

S: Use RefCell<Weak<T>>.

P: You can't always know the owner ahead of time.

S: Some owner fields need to be Option<T>, mainly when inserting existing nodes into a block.
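Here is a minimal sketch of how the solutions above fit together, using hypothetical `Parent`/`Child` types rather than the MIR nodes. Children are shared through `Rc`, and mutation goes through `RefCell`. The back-reference to the owner is a `Weak` that starts out as `None` until the node is inserted. This sketch places the `RefCell` outside the `Option` (`RefCell<Option<Weak<_>>>`), which lets an owner be attached through a shared `Rc` handle. With `Option<RefCell<_>>`, switching from `None` to `Some` would need a `&mut` to the child.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical parent holding strong references to its children.
struct Parent {
    children: Vec<Rc<Child>>,
}

/// Hypothetical child whose owner is unknown at creation time.
struct Child {
    value: u64,
    // Weak back-reference: does not keep the parent alive, so no Rc cycle.
    owner: RefCell<Option<Weak<RefCell<Parent>>>>,
}

/// Insert `child` into `parent`, mutating both through shared `Rc` handles.
fn insert(parent: &Rc<RefCell<Parent>>, child: &Rc<Child>) {
    parent.borrow_mut().children.push(Rc::clone(child));
    *child.owner.borrow_mut() = Some(Rc::downgrade(parent));
}

/// Sum of all values held by the child's owner (including its own),
/// or `None` if it has no owner or the owner was dropped.
fn owner_sum(child: &Child) -> Option<u64> {
    let owner = child.owner.borrow().as_ref()?.upgrade()?;
    let total = owner.borrow().children.iter().map(|c| c.value).sum();
    Some(total)
}
```

Because owners are only reachable through `Weak`, dropping the last strong handle to a parent frees it even while its children are still alive elsewhere. After that, `upgrade()` returns `None`.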

Proposed solutions:

1) To help match the semantics to the actual structure and avoid dynamic dispatch.

Either:

a) Rename traits and make them static:

Replace dyn Op with impl Parent:
exposes:
- insert_child -> impl Child
- get_children -> Vec<impl Child>
- as_operation -> Operation
Implement Parent for Function, Evaluator and Block
Replace dyn Value with impl Child:
exposes:
- get_parent -> impl Parent
Implement Child for all except {Function, Evaluator}
Add empty Leaf trait (not sure if necessary):
- allows expressing not(Parent) in generic trait bounds.
Add enums for all node kinds, to restore the original api semantics:
- Node: * (all kinds) (old: Value)
- Ops: Add, Mul, Sub, Felt, Vector, Matrix, Enf, For, If, Call, Fold
- HasBlock: Function, Evaluator, If, For
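To make option 1.a concrete, here is a hypothetical sketch that uses static dispatch only. It assumes an associated type `Item` stands in for the `impl Child` return types in the list above. The `Leaf` marker trait shows how "not a Parent" can be expressed. Rust has no negative trait bounds, so a generic function has to require a positive marker instead.

```rust
/// Hypothetical traits mirroring option 1.a (no `dyn` anywhere).
trait Child {}

trait Parent {
    /// Associated type standing in for `impl Child` in the list above.
    type Item: Child;
    fn insert_child(&mut self, child: Self::Item);
    fn children(&self) -> &[Self::Item];
}

/// Empty marker trait: Rust cannot write `T: !Parent`,
/// so "is a leaf" has to be stated positively.
trait Leaf: Child {}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Felt(u64);
impl Child for Felt {}
impl Leaf for Felt {}

struct Block {
    items: Vec<Felt>,
}

impl Parent for Block {
    type Item = Felt;
    fn insert_child(&mut self, child: Felt) {
        self.items.push(child);
    }
    fn children(&self) -> &[Felt] {
        &self.items
    }
}

/// Only callable on parents whose children are leaves; checked at compile time
/// and monomorphized, so there is no runtime dispatch.
fn leaf_count<P>(parent: &P) -> usize
where
    P: Parent,
    P::Item: Leaf,
{
    parent.children().len()
}
```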

b) Use traits on top of enums:

Add enums for all node kinds, to restore the original api semantics:
- Node (old: dyn Value): * (all kinds)
- Ops (old: dyn Op): Add, Mul, Sub, Felt, Vector, Matrix, Enf, For, If, Call, Fold
- HasBlock: Function, Evaluator, If, For
- Parents: Ops + HasBlock - (Felt, Vector, Matrix)
- Children: Ops + Block
Implement Op, Value, Child, Parent on top of the enums
Reference the enums in nodes instead of the nodes directly
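Here is a minimal sketch of option 1.b. It assumes a single hypothetical `Op` enum in place of the full Node/Ops/HasBlock/Parents/Children family. The trait is implemented once on the enum, and each method dispatches with a `match`, so no vtable or downcast is involved.

```rust
/// Hypothetical operation kinds grouped in one enum.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Felt(u64),
    Add(Vec<Op>),
    Mul(Vec<Op>),
}

/// Trait implemented on top of the enum, as in option 1.b.
trait Parent {
    fn children(&self) -> &[Op];
}

impl Parent for Op {
    fn children(&self) -> &[Op] {
        match self {
            // Leaves have no children.
            Op::Felt(_) => &[],
            Op::Add(args) | Op::Mul(args) => args.as_slice(),
        }
    }
}

impl Op {
    /// Static dispatch through `match` on the enum variant.
    fn eval(&self) -> u64 {
        match self {
            Op::Felt(v) => *v,
            Op::Add(_) => self.children().iter().map(Op::eval).sum(),
            Op::Mul(_) => self.children().iter().map(Op::eval).product(),
        }
    }
}
```

This design has a side benefit. Adding a variant makes every `match` without a wildcard arm a compile error, which points directly at the code that needs updating.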

2) To avoid memory leaks and make owners mutable

Instead of Rc<dyn Op>, give structs implementing [Child] an owner field of type:

if 1.a)
- Option<RefCell<Weak<impl Parent>>>
elif 1.b)
- Option<RefCell<Weak<Ops>>>

Notes after further testing [added on Tue. 19/11/24]:

Replacing <dyn Trait> with <impl Trait> is only possible in function signatures (argument and return position), so it won't work for struct fields.
I tested both Rc<RefCell<dyn NodeTrait>> and Rc<RefCell<EnumOfNodeTypes>>. The second approach seems to make the code clearer and new cases easier to add, so I'll be moving ahead with an enum-based dispatch design.

Soulthym · 2024-12-02T11:17:45Z

Hello @bitwalker,
Since our last discussion, we added:

- a new graph structure for the IR, based on enum dispatching
- a split of NodeType into sub-categories (LeafNode, MiddleNode, RootNode)
- single-depth-level uniquing within scopes
- a derive macro for IsNode, used for operations that wrap a Node (operations containing other nodes, like Function, Eval, Add, Mul...)
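Assume that "uniquing within scopes" means structural deduplication (hash-consing) of nodes inserted into the same scope. Under that assumption, a minimal sketch could look like this. `Scope`, `NodeKey` and `insert_unique` are hypothetical names. Only the current scope's table is consulted, which is what "single depth-level" is taken to mean here.

```rust
use std::collections::HashMap;

/// Hypothetical node key: structurally identical nodes compare equal.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum NodeKey {
    Felt(u64),
    /// Operand ids within the same scope.
    Add(usize, usize),
}

/// A scope that stores each distinct node once and hands back its id.
/// Enclosing or nested scopes are not searched.
#[derive(Default)]
struct Scope {
    nodes: Vec<NodeKey>,
    index: HashMap<NodeKey, usize>,
}

impl Scope {
    /// Return the id of an existing identical node, or insert a new one.
    fn insert_unique(&mut self, key: NodeKey) -> usize {
        if let Some(&id) = self.index.get(&key) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(key.clone());
        self.index.insert(key, id);
        id
    }
}
```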

The next step is to replace the old structure with this new one, which will be done through this PR: massalabs#4
(The various sub-tasks are listed in the PR comment)

Leo-Besancon and others added 30 commits September 23, 2024 09:05

feat - move AlgebraicGraph from ir to codegen/air

24e687a

Base IR graph, to improve

92952bd

Translation from AST has been vastly commented out, to rework

Make base IR structure (passes / translation from AST..)

f4880ab

Additional structure (MIR > AIR lowering), added TODO MIR comments

33da4a3

Add comments to TODOs

c036336

Add typed nodes in the Mir

d018fd6

Update MirValues to account for all case of binding in the AST

5047968

Add type for new elements

f34d16c

Add function variable node type

a615a8b

Update value.rs

1af3508

Implement some elements for lowering from AST to MIR

e4ce487

Inlining unit test structure

7b554ba

MirType:

18c1870

Variable: add argument position and referenced function index
Definition: new node to differentiate from function call

Add human readable serialization to MirGraph

21dc897

Updated inlining test template

Make IR test pass

7c180c9

Merge remote-tracking branch 'origin/thy-ir-inlining' into leo_lowering_from_ast

6e58c2a

Fix conflict Option<>

e8fa708

fix MirGraph pretty printer counters not being shared between recursions

b0df63d

Merge branch 'thy-ir-inlining' into feat-implement-ir

6e1ddc4

Reworked use_list, reworked placeholder mechanic + fmt

6ca0349

Add placeholder type to avoid issues with Constant(0)

ee7f04d

MirGraph pretty printer: add call support + track func indexes

9a2f4f8

Inlining test: complete Input

1d7ca53

cargo fmt + clippy

33e022d

Fix double Enf in cas of Bindary::Eq

7cadc6f

improve MirGraph pretty printer

65db04c

- remove leading and ending new lines
- display calls as SSA
- remove var name in function return signatures
- handle Operation {Sub, Mul}
- display Constants as Type(value)

Add root management to differentiate between dead nodes and root nodes

b7834cb

Add functions tests in MIR

65e0bc6

Inlining 1/2: replace call(func, args) site by func.body

04ff191

Inlining 2/2: swap variables for arg values in inlined body

2876cd8

+ remap operation's parameters to new body's NodeIndex

Leo-Besancon and others added 15 commits November 5, 2024 14:25

Add helpers to insert nodes for each operation

4804834

Generic visitor

35dd907

expose roots and graph to visitor

03744ad

inlining with generic visitor

64cf047

implement post order visitor

a999882

fix missing visit of stored nodes

9176471

Rename VisitOrder::DepthFirst to Manual

ac51b88

- implement real depth first
- remove external stack in post-order visit
- abstract Visit trait with auto implem for VisitContext implems

Add unrolling and lowering

f60e664

rework Operation: base structure

e9f1bd7

fix ir2 module structure

4910417

implement Sub and Mul

9e43103

make Operands dyn Value

78ab06d

implement Enf, Call, Fold, For, If, Vector, Matrix, Boundary, Parameter

blanket migration for degree and trace

f066d6a

- added Eq support to operation
- few bugfixes

added Eq, Clone and Debug support to Value and Operations

103f453

convert MirValue to the new structure

d130f97

Soulthym added 8 commits November 19, 2024 17:48

ir2: new graph structure based on enums

987aa35

Split graph module, and several improvements to the graph datastructure

a76e357

fix builder pattern duplication and split+rename NodeType

cbd6620

- rename NodeTypes to NodeType
- split NodeType into LeafNode, RootNode, MiddleNode
- removed debugging

replace asserts by expects

858d012

- new_i32 replaced by generic new_value
- LeafNode::I32 renamed to Value
- implemented Felt Leaf node
- improved conversions
- removed debugging

Scope with unique insertions

750ecf6

- removed MiddleNode::Node
- renamed new_body -> new_scope
- implement Default

IsNode derive macro

345cbe7

fix module name

002dfa5

Merge pull request #3 from massalabs/thy-rework-ir

2f1bc62

Merge new IR structure

bitwalker mentioned this pull request Dec 4, 2024

swap-ir-impl starting point massalabs/air-script#4 (Open, 36 tasks)
