Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MassaLabs: Implement a Intermediate Representation to improve the compilation process #359

Draft
wants to merge 54 commits into
base: next
Choose a base branch
from

Conversation

Leo-Besancon
Copy link

The goal of this PR is to introduce a Middle Intermediate Representation, to avoid making optimization on the AST directly, while keeping enough information to handle type checking, optimization for each pass.

See the initial issue and design discussion for additional context:

Putting it as draft now that we have:

  • Basic MIR graph structure. We basically started from the AIR and added the operations we needed to represent the same expressions as the AST.
  • Lowering from the AST. The tests that check whether various AirScript programs compile with the pipeline Parse to AST > Constant propagation > Lowering to MIR all pass, but the resulting MIR is not checked. We may need to improve testing on this side.
  • Start of the inlining pass from MIR to MIR. For now, we focused on only inlining the function bodies at their callsite, but we do not unroll vectors for instance. We feel it may be better to make this in a separate unrolling pass if possible.

Additionally, we have a partial pretty printer for the MIR to help us debug in the future (to ensure the graph constructed is what we expect after each pass)

@bitwalker, don't hesitate to comment on things that should be handled differently.
For now, we haven't fully settled on the various nodes of the MIR graph, as we add / change things depending on the needs of our implementation. We will also add checks made to ensure the proper diagnostics are raised (potentially after each pass as discussed previously), but we will probably do this at a later stage.

Leo-Besancon and others added 30 commits September 23, 2024 09:05
Translation from AST has been vastly commented out,
to rework
Variable: add argument position and referenced function index
Definition: new node to differenciate with function call
Updated inlining test template
- remove leading and ending new lines
- display calls as SSA
- remove var name in function return signatures
- handle Operation {Sub, Mul}
- display Constants as Type(value)
+ remap operation's parametters to new body's NodeIndex
@Soulthym
Copy link

@bitwalker, first of all, thank you for this detailed answer, it will sure help a lot!
We have left several comments trying to address all the points you have raised. We hope that helps clarifying the design we proposed and how we plan to incorporate your feedback.

@bitwalker
Copy link
Contributor

@bitwalker, first of all, thank you for this detailed answer, it will sure help a lot!

You bet! Sorry if presenting a sketch of the RVSDG design gave you a scare - didn't want to stress you out worrying about needing to refactor things, hopefully it just provided a useful reference for past and future conversations.

We have left several comments trying to address all the points you have raised. We hope that helps clarifying the design we proposed and how we plan to incorporate your feedback.

Yes, I believe all of my questions are addressed, except maybe a few small fresh ones I've left as comments. Ping me once a few of the more complex tests are implemented, and I'll re-review ASAP.

Thanks for all the hard work!

@Soulthym
Copy link

Sorry if presenting a sketch of the RVSDG design gave you a scare - didn't want to stress you out worrying about needing to refactor things, hopefully it just provided a useful reference for past and future conversations.

No problem! It will serve as a reference for further discussions, and as a target to tend towards as we refactor our implementation based on what we need.

Yes, I believe all of my questions are addressed, except maybe a few small fresh ones I've left as comments. Ping me once a few of the more complex tests are implemented, and I'll re-review ASAP.

I have left comments addressing these. For now I don't think the remaining points require immediate attention, we already have plenty to do based on your feedback. I'll be sure to ping you when a review is needed.

Thanks again for all your valuable feedback!

@Leo-Besancon
Copy link
Author

Hello @bitwalker !

We wanted to give you an update on what we've been working on, and to get your input on a few things.

Our main rework working branch is the following: https://github.com/massalabs/air-script/tree/thy-rework-ir

We've mainly worked on:

  • A rework of the base structure for the Mir based on your proposal (without NodeIndex, Operation directly references their children / parents). @Soulthym will give details on this aspect below.
  • Implemented a generic visitor pattern for the Mir -> Mir passes
  • Improved Inlining based on previous comments
  • Implemented Unrolling (on the previous structure)
  • Implemented Lowering from Mir to Air (on the previous structure)

What we have to do:

  • Complete rework of the structure
  • Now that we have the full pipeline AST -> Mir -> AIR:
    • Adapt a few pass behaviors to the new structure. For instance, unrolling List Comprehensions still relies on incomplete logic as it needs the new structure to be implemented cleanly
    • Make current tests on the Air pass (with the whole pipeline, we can better debug edge cases).
      • We have unreachable()! triggered for a lot of tests, debug and resolve these issues
      • We have infinite loops for some tests to investigate
  • Fully test everything

As a result, it would be great if you could look at the structure and answer our questions below so we can finalize the design!

Thank you again.

@Soulthym
Copy link

Soulthym commented Nov 14, 2024

Hey!

Below is a summary of the new structure for the IR.
We wonder which of 1.a) or 1.b) would be the best approach to take, what's your thought on this?

Our rework of the MirGraph structure:

Our current structure:

Implemented traits are marked [Trait].
Rc/RefCell/Weak are ignored for clarity.
Fields not referenced in the Problems & Solution sections are also omitted.

dyn Op: gives access to the Operation 
dyn Value: * shared behaviour by all nodes

Operand: [Value]
  owner: Operation
  value: dyn Value
OpResult: [Value]
  owner: dyn Op
Operation: [Value]
  owner: dyn Op
  Vec<Operand>
  result: OpResult

Add: [Op, Value]
  Operation
...
Matrix: [Op, Value]
  Operation

Parameter: [Value]
  owner: FunctionOrEvaluator

Block:
  owner: FunctionOrEvaluator
  Vec<dyn Op>

Fonction/Evaluator:
  params: Parameter
  blocks: Vec<Block>

Problems & Potential Solutions:

P: Dynamic dispatch makes things tricky due to casting to non-primitive types.

S: Replace with enums.

P: Rc<dyn Op> only works on operations.

S: Make Function, Block and Evaluator [Op] that wrap Operation.

P: Rc<T> is immutable.

S1: Replaced by RefCell in most cases, but we loose ref-counting.
S2: We can use Rc<RefCell<T>.

P: Ref-counting cycles leak memory.

S: use RefCel<Weak<T>>.

P: You can't always know the owner ahead of time.

S: Some need to be Option<T>, mainly when inserting existing nodes into a block.

Proposed solutions:

1) To help match the semantics to the actual structure and avoid dynamic dispatch.

Either:

a) Rename traits and make them static:

  • Replace dyn Op with impl Parent:
    exposes:
    • insert_child -> impl Child
    • get_children -> Vec<impl Child>
    • as_operation -> Operation
  • Implement Parent for Function, Evaluator and Block
  • Replace dyn Value with impl Child:
    exposes:
    • get_parent -> impl Parent
  • Implement Child for all except {Function, Evaluator}
  • Add empty Leaf trait (not sure if necessary):
    • allows expressing not(Parent) in generic trait bounds.
  • Add enums for all node kinds, to restore the original api semantics:
    • Node: * (all kinds) (old: Value)
    • Ops: Add, Mul, Sub, Felt, Vector, Matrix, Enf, For, If, Call, Fold
    • HasBlock: Function, Evaluator, If, For

b) Use traits on top of enums:

  • Add enums for all node kinds, to restore the original api semantics:
    • Node (old: dyn Value): * (all kinds)
    • Ops (old: dyn Op): Add, Mul, Sub, Felt, Vector, Matrix, Enf, For, If, Call, Fold
    • HasBlock: Function, Evaluator, If, For
    • Parents: Ops + HasBlock - (Felt, Vector, Matrix)
    • Children: Ops + Block
  • Implement Op, Value, Child, Parent on top of the enums
  • Reference the enums in nodes instead of the nodes directly

2) To avoid memory leaks and make owners mutable

Instead of Rc<dyn Op>, make owner fields for structs implementing [Child]:

  • if 1.a)
    • Optional<RefCell<Weak<impl Parent>>>
  • elif 1.b)
    • Optional<RefCell<Weak<Ops>>>

Notes after further testing [added on Tue. 19/11/24]:

  • Replacing <dyn Trait> with <impl Trait> is only doable in function arguments so that won't be enough for structure fields.
  • I tested both Rc<RefCell<dyn NodeTrait>>> and Rc<RefCell<EnumOfNodeTypes>>, the second approach seems to make the code clearer and easier to add new cases, so I'll be moving ahead with an enum based dispatch design.

rename NodeTypes to NodeType
split NodeType into LeafNode, RootNode, MiddleNode
removed debugging
new_i32 replaced by generic new_value
LeafNode::I32 renamed to Value
implemented Felt Leaf node
improved conversions
removed debugging
removed MiddleNode::Node
renamed new_body -> new_scope
implement Default
@Soulthym
Copy link

Soulthym commented Dec 2, 2024

Hello @bitwalker,
Since our last discussion, we added:

  • new graph structure for the ir based on enum dispatching
  • split NodeType into sub-categories (LeafNode, MiddleNode, RootNode)
  • single depth-level uniquing within scopes
  • Derive macro for IsNode, used for operations that wrap a Node (operations containing other nodes like Function, Eval, Add, Mul...)

Next step is to replace the old structure by this new one, which will be done through this PR: massalabs#4
(The various sub-tasks are listed in the PR comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants