swap-ir-impl starting point #4

Open: wants to merge 53 commits into base: feat-implement-ir

Conversation

Soulthym
Collaborator

@Soulthym Soulthym commented Dec 2, 2024

  • Define missing operation types: now under ir2/nodes/...

    • Leafs
      • Value
      • Variable Parameter
    • Binary ops
      • Add
      • Sub
      • Mul
    • Unary ops
      • Enf
      • Boundary
    • Aggregation
      • Vector
      • Matrix
    • Blocks
      • Function
      • Evaluator
      • For
      • If
    • Structured ops
      • Call
      • Fold
  • Implement Spanned trait for nodes:

#[derive(Spanned)]
struct ... {
  #[span]
  span: ...
  ...
}
  • Implement converters for:
Link<NodeType> -> Link<Node>
Link<NodeType> -> Link<Leaf<T>>
Link<NodeType> -> Link<Graph>

Helper traits are available (IsNode, NotNode, IsGraph, NotGraph, IsLeaf, NotLeaf)

  • And in reverse.

  • Match graph/mod.rs api, now under ir2/graph.rs

    • Pretty printer (for better SSA visualization): uncommitted - waiting for all nodes to match
    • insert_op_* -> new_*().add_child() or Add::new(parent, lhs, rhs)
  • Redefine visitor pattern with the new structure

  • Redefine remaining APIs referencing NodeIndex, replace them with Link<NodeType>

  • Convert passes to use ir2 and updated visitor pattern

    • Lowering AST > MIR
    • Inlining
    • Unrolling
    • Lowering MIR > AIR
  • Add validation to the builder pattern

  • Maybe isolate builder pattern into its own trait

The updated check-list was moved to this comment: #4 (comment)

@bitwalker
Collaborator

bitwalker commented Dec 4, 2024

@Soulthym I just realized that since this PR is in your fork, and not the 0xPolygonMiden repo, I can't leave review comments. Not sure if it's easier to open a PR stacked on 0xPolygonMiden/air-script#359 with these changes, or add me as a collaborator to this repo for reviews, but I'll have to wait for one or the other to provide my review feedback. Just let me know which!

@Soulthym
Collaborator Author

Soulthym commented Dec 5, 2024

Hey @bitwalker, we just added you as a contributor to our fork. We hope that solves the issue.

Leo-Besancon and others added 9 commits December 10, 2024 09:06
* derive Hash for SpannedMirValue

* ir3: better node splitting, no macro, simpler api
missimg builder pattern

* replaced Link<Op> by Link<Owner> for Operations and Leaves

* added IndexAccess Operand

* renamed IndexAccess to Accessor
re use parser's AccessType
impl Hash for AccessType

* fix call being copied instead of pointed to by get_children()

* typesafe builder pattern for most ops. Misssing structured ops

* type-safe Builder pattern for structured ops

* MirGraph Builder editing

* Add Leaves to Op enum

* ir3: comment top level items

* isolate Leaf, Op, Owner, Root

* add converters to enums

* added converters to structs
renamed IndexAccess to Acccessor

* Initial merge ir3 > update_passes

* remove comments

* Update mod.rs

* Graph: implement public api

* Swapped in helpers of passes/mod.rs

* Node: all node types + converters

* Full inlining with Node + cargo fmt

* Remove Link for FoldOperator

* Swap unrolling

* Swap lowering Mir -> Air

* Update visitor (with suboptimal get_children() herlper, to refactor)

* Add MirType to Parameter

* Update inlining to handle evaluators

* most of translated adapted

* add type support for Parameter in translate

* in build_statement, handle Owner::None (integrity or boundary constraint)

---------

Co-authored-by: Thybault Alabarbe <thybault.alabarbe@gmail.com>
@Soulthym
Collaborator Author

Soulthym commented Dec 12, 2024

Hey @bitwalker,

Some tests now pass on the end-to-end pipeline; the tests for the lowering pass are still missing.

Missing features:

  • tests for the lowering Mir -> Air
  • span fields
  • a rework of the visitor pattern specialized with the new structure
  • improve documentation
  • builder derive macro
  • evaluator argument mapping
  • rework the passes with the new visitor pattern.
  • conditional evaluators

Based on the mir crate tests, here is what is currently missing/broken:

  • Too restrictive handling of iterables during translation. Affects:
    • functions
    • integrity_constraints
    • list_folding
  • miden-diagnostics unwraps on File missing. I suspect it's due to the missing spans. Affects:
    • selectors
  • inlining does not mutate the Mir in all places.

Edit as of Dec 13: all current tests are now fixed by relaxing the constraints on iterables.

Once the API is stabilized, we will refactor the boilerplate.

The updated check-list was moved to this comment: #4 (comment)

@Soulthym
Collaborator Author

Soulthym commented Dec 20, 2024

Hey @bitwalker, here is a recap of the work left to do.

The latest version is available on a new branch of this fork: https://github.com/massalabs/air-script/tree/fix-ir-translate
It will get merged in the current branch once all issues are resolved.

Currently, the tests in the mir package (cargo test --package mir) translate from AST to MIR, and we've identified the problems detailed below in the obtained graphs. Otherwise the graphs seem consistent with what we expect.

There are currently 3 identified issues left to work on, plus the features not marked as resolved above.

Parameter/argument unpacking:

Parameter unpacking for *AccessBinding is currently done during translation from ast to mir, but should happen during inlining.

  • Don't unpack call arguments during build, and don't check the length of arguments.
  • Don't unpack parameters during translation from ast to Mir:
    • Instead add a binding to a Vector<Op::Parameter>
    • and replace Op<Value<... TraceAccessBinding>> in the function/evaluator's body by Accessor<Vector<Op::Parameter>, TraceAccess> to the corresponding bound value.
  • Unpack during inlining

Mutability is not really handled through enums and conversions.

Mutating one occurrence does not mutate the others, because conversion is done by cloning the underlying structure and wrapping the clone in a new Link.

  • Add Link<T> to enum variants' content, i.e.:
enum Op {
    Add(Add),
    ...
}

becomes

enum Op {
    Add(Link<Add>),
    ...
}
  • Rework the builder pattern accordingly:
    • Builder works on Links directly, and mutates references instead of consuming self.
    • remove the last .build() step to allow storing partial but valid objects as their corresponding enum variant without adding new variants (their types are different)
    • remove unneeded .edit() abstraction.
  • Change parent from BackLink<Op> to Vec<BackLink<Op>> to allow sharing of an operation node:
    • Enable automatic parent setting in the builder pattern when adding children. They are currently not set at all.
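
To illustrate the cloning problem described above, here is a minimal sketch. It assumes `Link<T>` behaves like `Rc<RefCell<T>>`; the `Add`, `OpByValue` and `OpByLink` types are hypothetical stand-ins, not the PR's actual definitions.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Assumption: `Link<T>` is modelled here as a plain `Rc<RefCell<T>>`.
type Link<T> = Rc<RefCell<T>>;

#[derive(Clone, Debug, PartialEq)]
struct Add {
    lhs: i64,
    rhs: i64,
}

// The variant owns the struct: wrapping an existing `Add` means cloning it.
#[derive(Clone, Debug, PartialEq)]
enum OpByValue {
    Add(Add),
}

// The variant holds a link: wrapping only clones the pointer.
#[derive(Clone, Debug)]
enum OpByLink {
    Add(Link<Add>),
}

/// Mutates the original `Add` and returns the `lhs` seen through each wrapper.
fn demo() -> (i64, i64) {
    let add: Link<Add> = Rc::new(RefCell::new(Add { lhs: 1, rhs: 2 }));

    let by_value = OpByValue::Add(add.borrow().clone());
    let by_link = OpByLink::Add(Rc::clone(&add));

    // Mutate through the original link.
    add.borrow_mut().lhs = 10;

    let OpByValue::Add(copy) = &by_value;
    let OpByLink::Add(shared) = &by_link;
    let shared_lhs = shared.borrow().lhs;
    (copy.lhs, shared_lhs)
}
```

Calling `demo()` shows that the by-value wrapper keeps the stale `lhs`, while the link-based wrapper observes the update.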

Rework passes

After the previous is done, we will be able to work on:

  • inlining pass, which currently mutates almost nothing, and not always properly when it does. It has been excluded temporarily from the pipeline while we work on fixing it.
  • unrolling pass which is not currently in the pipeline as it relies on the inlining pass.

cleanup the codebase

  • add a few tests for identified edge cases tested via codegen diffs
  • convert relevant tests to compare parsed Mir between the input program and the expected optimized version. tested via codegen diffs
  • document all public interfaces
  • refactor the boilerplate:
    • extract the builder pattern into a derive macro Builder
    • extract converters into As* and TryAs* traits (AsNode, AsOp, AsRoot, AsAdd, ..., TryAsNode) and add generic implementations where possible. Not needed

The updated check-list was moved to this comment: #4 (comment)

@Soulthym
Collaborator Author

Soulthym commented Jan 15, 2025

Hey,
After some experimentation, which you can find in the history of this branch, we have figured out a way to restore shared mutability when performing mutations on the IR.

For some context about why this change was needed, the current implementation has the following limitations:

  1. mutating a struct into another type of struct is simply not supported, even when both belong to the same enum (e.g. changing a Link<Add> into a Link<Sub>, which are both Op members).
  2. mutating a struct to a different struct with the same type does not propagate the update to existing enum wrappers (Link<Node>, Link<Op>, Link<Root>, Link<Leaf>, Link<Owner>)

Proposed solutions:

1) wrap struct in enums by default

  • we can wrap structs in an enum by default. This will allow us to mutate a struct to another variant of the same enum.
    After some tests, we found that it doesn't need to be Node, and a more specialized enum can be used, such as Op and Root, which cover all node types without overlap.

main caveat:

  • accessing the underlying struct from the enum is not ergonomic, and it requires a match/if let statement to unwrap the enum when needed.
    Converters to the struct type (as_${struct_name}) become fairly useless in this case, since we need to match on the enum to access the struct fields anyway.
    Note: there may be a way to restore this functionality via Ref::map or Ref::map_split (the mapping helpers for RefCell borrows), but it isn't a priority and we have not tested it yet.
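
As a rough sketch of solution 1 and its caveat: the example below wraps structs in an `Op` enum held behind a shared link, swaps the variant in place, and uses the standard library's `Ref::filter_map` to recover an `as_add`-style accessor. The types and the `as_add` helper are assumptions for illustration, with `Link<T>` modelled as `Rc<RefCell<T>>`.

```rust
use std::cell::{Ref, RefCell};
use std::rc::Rc;

// Assumption: `Link<T>` is modelled here as a plain `Rc<RefCell<T>>`.
type Link<T> = Rc<RefCell<T>>;

#[derive(Debug, PartialEq)]
struct Add {
    lhs: i64,
    rhs: i64,
}

#[derive(Debug, PartialEq)]
struct Sub {
    lhs: i64,
    rhs: i64,
}

#[derive(Debug, PartialEq)]
enum Op {
    Add(Add),
    Sub(Sub),
}

/// Hypothetical converter: borrows the `Add` payload if the link currently
/// holds that variant, without a `match` at the call site.
fn as_add(link: &Link<Op>) -> Option<Ref<'_, Add>> {
    Ref::filter_map(link.borrow(), |op| match op {
        Op::Add(add) => Some(add),
        _ => None,
    })
    .ok()
}

fn demo() -> (bool, bool, i64) {
    let op: Link<Op> = Rc::new(RefCell::new(Op::Add(Add { lhs: 3, rhs: 4 })));
    let alias = Rc::clone(&op);

    let was_add = as_add(&alias).is_some();

    // Replace the variant in place; every clone of the link sees the change.
    *op.borrow_mut() = Op::Sub(Sub { lhs: 3, rhs: 4 });

    let still_add = as_add(&alias).is_some();
    let lhs = match &*alias.borrow() {
        Op::Add(a) => a.lhs,
        Op::Sub(s) => s.lhs,
    };
    (was_add, still_add, lhs)
}
```

Each borrow is released at the end of its statement, so the in-place replacement does not hit a runtime borrow conflict.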

2) Store Singletons

make enum wrappers singletons, so that they are shared across all instances of the struct.

  • 2.a) we modify other enum wrappers (Node, Owner, Leaf) to take a BackLink<Op> or BackLink<Root> to avoid reference cycles.
  • 2.b) we can then store a singleton of the struct's enum wrappers in the struct itself. The singleton will thus be shared across all instances of the struct.
  • 2.c) we expose a set(&self, value: &Self) helper method on Link<Op> and Link<Root> that will update the singleton when the struct is mutated or replaced with another variant of the same enum.

main caveat:

  • manually mutating the struct will not update the singletons. Every mutation needs to happen through the obj.set(&value) helper method.
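
The motivation for back-links in 2.a can be sketched with the standard `Rc`/`Weak` pair. This is only an illustration under the assumption that `BackLink` plays the role of `Weak`; the `Node` struct and `node` helper are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: &'static str,
    // Plays the role of a back-link: does not keep the parent alive.
    parent: RefCell<Weak<Node>>,
    // Strong links: the parent keeps its children alive.
    children: RefCell<Vec<Rc<Node>>>,
}

fn node(name: &'static str) -> Rc<Node> {
    Rc::new(Node {
        name,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn demo() -> (usize, usize, Option<&'static str>, bool) {
    let parent = node("add");
    let child = node("lhs");

    *child.parent.borrow_mut() = Rc::downgrade(&parent);
    parent.children.borrow_mut().push(Rc::clone(&child));

    // The back-link adds a weak count, not a strong one, so there is no cycle.
    let strong = Rc::strong_count(&parent);
    let weak = Rc::weak_count(&parent);
    let parent_name = child.parent.borrow().upgrade().map(|p| p.name);

    // Dropping the only strong reference frees the parent.
    drop(parent);
    let stale = child.parent.borrow().upgrade().is_none();

    (strong, weak, parent_name, stale)
}
```

Had the child held a strong `Rc` to its parent, neither node would ever be freed.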

A minimum working example of the proposed solution is available here, with a patched Link implementation

Identified codebase changes to implement the solution:

  • Merge patched Link implementation
  • Modify Node, Owner, Leaf to take a BackLink<Op> or BackLink<Root>, keep all variants, just make the field take a BackLink<Op> or BackLink<Root> instead of a Link<Add>, Link<Mul>, etc.
  • Add a field to all structs that will store the singletons of each enum wrapper.
    Easiest way to do this is to add the following fields to:
    • all structs that implement Parent:
    _node: Option<Link<Node>>,
    _owner: Option<Link<Owner>>,
    • all members of Leaf:
    _node: Option<Link<Node>>,
    _leaf: Option<Link<Leaf>>,
  • modify all methods to take Link<Op> or Link<Root> where applicable
  • add a set(&self, value: &Self) method to Link<Op> and Link<Root>, that updates the singleton when the struct is mutated or replaced with another variant of the same enum.
  • replace all mutations of the struct with the set method calls
  • replace all as_${struct_name} methods with match/if let statements to unwrap the enum when accessing the fields is needed.
  • update the Builder derive macro to ignore fields that start with _ to exclude the singletons from the builder pattern.
  • re-implement converters using Ref::map or Ref::map_split after testing.

The updated check-list was moved to this comment: #4 (comment)

@bitwalker
Collaborator

I'm a bit concerned about the overall complexity of things here, largely resulting from the way Link/BackLink, and the various special entity types like Owner, Node, etc., interact. It seems particularly awkward, and the issues with mutating the IR I think largely fall out of the concrete representation we ended up with here, rather than the general idea behind it.

What do you think about simplifying this a bit, along the lines of the following approach:

  • Each operation type stands alone as a separate type, e.g. Call
  • Each operation type implements an object-safe Op trait, which exposes methods to query both children and parent of the op (see below), as well as request metadata that is generic over operations (e.g. arguments, results, attributes), and provide basic operations for mutating them.
  • A similar representation would be used for "values", so that you can operate on values generically, but also access the concrete type via downcasting when needed.
  • Define a type alias pub type Entity<T> = Rc<RefCell<T>>;. This will keep the visual clutter down, but also enable a lot of ergonomic improvements available via Rc that cannot be applied to custom smart pointer types without nightly features (for now at least). An equivalent alias for Weak would also be needed.

In short, you'd generally be operating on either an Entity<T> or an Entity<dyn Op>/Entity<dyn Value>. You can upcast from Entity<T> to Entity<dyn Trait>, and use the Rc::downcast_ref/Rc::downcast_mut methods to downcast to the concrete type, or Rc::is<T> for simple dynamic type checks. Weak references would be used for parent links, when the parent holds a strong reference to the child.
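
One way to get this kind of dynamic type check and downcast on stable Rust is to route through `std::any::Any`. The sketch below assumes a hypothetical `as_any` method on the trait and free functions `is` and `add_operands`; none of these names come from the proposal itself.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::rc::Rc;

// Assumption: a cut-down `Op` trait with an `as_any` hook for downcasting.
trait Op: Any {
    fn name(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

struct Add {
    lhs: i64,
    rhs: i64,
}

struct Ret;

impl Op for Add {
    fn name(&self) -> &'static str {
        "add"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Op for Ret {
    fn name(&self) -> &'static str {
        "ret"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Dynamic type check on a type-erased operation.
fn is<T: Any>(op: &Rc<RefCell<dyn Op>>) -> bool {
    op.borrow().as_any().is::<T>()
}

/// Downcast to the concrete `Add` and read its operands, if it is one.
fn add_operands(op: &Rc<RefCell<dyn Op>>) -> Option<(i64, i64)> {
    let guard = op.borrow();
    let add = guard.as_any().downcast_ref::<Add>()?;
    Some((add.lhs, add.rhs))
}

fn demo() -> (bool, bool, Option<(i64, i64)>, &'static str) {
    // Upcasting from the concrete type to `dyn Op` is an implicit coercion.
    let add: Rc<RefCell<dyn Op>> = Rc::new(RefCell::new(Add { lhs: 1, rhs: 2 }));
    let ret: Rc<RefCell<dyn Op>> = Rc::new(RefCell::new(Ret));

    let add_is_add = is::<Add>(&add);
    let ret_is_add = is::<Add>(&ret);
    let operands = add_operands(&add);
    let ret_name = ret.borrow().name();
    (add_is_add, ret_is_add, operands, ret_name)
}
```

The same hook can be mirrored with an `as_any_mut` method when mutable access to the concrete type is needed.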

To sketch this out in more concrete terms, we'd end up with something that looks a bit like this:

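use std::cell::RefCell;
use std::rc::{Rc, Weak};

// `?Sized` lets the aliases be used with trait objects, e.g. `Entity<dyn Op>`
pub type Entity<T: ?Sized> = Rc<RefCell<T>>;
pub type WeakEntity<T: ?Sized> = Weak<RefCell<T>>;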

// The [Op] trait represents common behavior/actions over operations in the IR.
//
// NOTE: This trait definition omits some obviously useful methods to avoid
// too much clutter in this example. The key thing is that the trait must
// remain object-safe.
pub trait Op {
    // Operations with a parent must belong to a Block
    fn parent(&self) -> Option<Entity<Block>>;
    // This is needed to add/remove operations from a Block generically
    fn set_parent(&mut self, parent: Option<WeakEntity<Block>>);
    // A convenience for accessing the parent Op of the containing Block
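    fn parent_op(&self) -> Option<Entity<dyn Op>> {
        let block = self.parent()?;
        // `block` is an `Entity<Block>`, so borrow it before calling `owner()`
        let owner = block.borrow().owner();
        Some(owner)
    }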
    // Get access to the argument vector for this operation
    fn arguments(&self) -> &[Entity<dyn Value>];
    // Get mutable access to the argument vector for this operation
    fn arguments_mut(&mut self) -> &mut [Entity<dyn Value>];
    // Get the result defined by this operation, if it produces one
    fn result(&self) -> Option<Entity<dyn Value>>;
    fn body(&self) -> Option<Entity<Block>>;
    fn has_body(&self) -> bool;
    fn is_primitive(&self) -> bool {
        !self.has_body()
    }
    fn has_uses(&self) -> bool {
        if let Some(value) = self.result() {
            value.is_used()
        } else {
            // If an op is non-primitive, we treat it as presumptively live.
            !self.is_primitive()
        }
    }
}

// You can add convenience methods here making them available for all
// dyn Op references, whether &dyn Op, or Entity<dyn Op>, etc.
impl dyn Op {
    /// Insert this operation at the end of `block`
    pub fn insert_at_end(self: Entity<Self>, block: Entity<Block>) {
        assert!(self.parent().is_none(), "op is already attached to a block");
        {
            let mut op = self.borrow_mut();
            op.set_parent(Some(Entity::downgrade(&block)));
        }
        let mut block = block.borrow_mut();
        block.push(self);
    }
    
    /// Replaces any uses of `value` with `replacement` in the argument vector
    /// of this operation. It could be further extended to also visit any nested
    /// operations (if this op has a body).
    pub fn replace_uses_of(self: Entity<Self>, value: Entity<dyn Value>, replacement: Entity<dyn Value>) {
        let mut op = self.borrow_mut();
        for (i, arg) in op.arguments_mut().iter_mut().enumerate() {
            if Entity::ptr_eq(arg, &value) {
                let user = User {
                    user: Entity::downgrade(&self),
                    index: i,
                };
                arg.remove_user(&user);
                *arg = replacement.clone();
                arg.add_user(user);
            }
        }
    }
}

// NOTE: An example of a structured/non-primitive operation with a block
pub struct Function {
    /// The function name
    pub symbol: Ident,
    /// The function type signature
    pub signature: FunctionType,
    /// The function body
    ///
    /// NOTE: The block argument types must match the signature
    pub body: Entity<Block>,
}

impl Function {
    pub fn new(symbol: Ident, signature: FunctionType) -> Entity<Self> {
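        Entity::new_cyclic(move |this| {
            // `this` is a `&Weak`, so clone it to give the block an owned back-reference
            let body = Block::new(this.clone(), &signature.arguments);
            // `Entity<T>` is `Rc<RefCell<T>>`, so the value must be wrapped in a `RefCell`
            RefCell::new(Self {
                symbol,
                signature,
                body,
            })
        })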
    }
}

impl Op for Function {
    fn parent(&self) -> Option<Entity<Block>> {
        // Functions are always top-level
        None
    }
    
    fn arguments(&self) -> &[Entity<dyn Value>] {
        // A function op never has arguments
        &[]
    }
    
    fn result(&self) -> Option<Entity<dyn Value>> {
        // A function never produces a result, only `call` does
        None
    }
    
    fn body(&self) -> Option<Entity<Block>> {
        Some(self.body.clone())
    }
    
    fn has_body(&self) -> bool {
        true
    }
}

/// The operation used to invoke [Function] operations (also [Evaluator]).
///
/// NOTE: This is an example of a primitive operation
pub struct Call {
    parent: Option<WeakEntity<Block>>,
    /// The function/evaluator to call
    ///
    /// This is dyn Op here, because both evaluators and functions are callable
    callee: Entity<dyn Op>,
    /// The values to use as arguments for the callee. These must match the
    /// callee type signature.
    ///
    /// NOTE: Ops do not own the individual arguments, but do hold strong refs
    /// to them.
    args: Vec<Entity<dyn Value>>,
    /// The result of the function/evaluator call. Evaluators produce none, 
    /// while functions always produce one.
    ///
    /// NOTE: Ops own their results
    result: Option<Entity<dyn Value>>,
}

impl Call {
    pub fn new<I>(callee: Entity<dyn Op>, args: I) -> Entity<Self>
    where
        I: IntoIterator<Item = Entity<dyn Value>>,
    {
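        Entity::new_cyclic(|this| {
            let result_ty = Self::infer_result_type(&callee);
            // Clone the `&Weak` back-reference and upcast the result to `dyn Value`
            let result = result_ty.map(|ty| OpResult::new(this.clone(), ty) as Entity<dyn Value>);
            // `Entity<T>` is `Rc<RefCell<T>>`, so the value must be wrapped in a `RefCell`
            RefCell::new(Self {
                parent: None,
                callee,
                args: args.into_iter().collect(),
                result,
            })
        })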
    }
}

impl Op for Call {
    fn parent(&self) -> Option<Entity<Block>> {
        // `Weak::upgrade` already returns an `Option`
        self.parent.as_ref()?.upgrade()
    }
    
    fn arguments(&self) -> &[Entity<dyn Value>] {
        &self.args
    }
    
    fn result(&self) -> Option<Entity<dyn Value>> {
        self.result.clone()
    }
    
    fn body(&self) -> Option<Entity<Block>> {
        None
    }
    
    fn has_body(&self) -> bool {
        false
    }
}

/// A special type used for type checking function-like operations
///
/// It is not a member of MirType.
pub struct FunctionType {
    pub arguments: Vec<MirType>,
    pub result: MirType,
}

/// Represents a basic block with arguments
pub struct Block {
    owner: WeakEntity<dyn Op>,
    /// The parameter list of the block. Each of these values represents an
    /// SSA value definition. Operations within a Block can reference values
    /// outside the block, and thus block arguments can be elided, however that
    /// is not permitted from the body of a [Function] or [Evaluator], without
    /// first cloning the Block so that it can be inlined at a callsite.
    pub arguments: Vec<Entity<BlockArgument>>,
    pub body: Vec<Entity<dyn Op>>,
}

impl Block {
    pub fn new(owner: WeakEntity<dyn Op>, arguments: &[MirType]) -> Entity<Self> {
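        Entity::new_cyclic(|this| {
            let arguments = arguments.iter().map(|arg| {
                // `arg` is a `&MirType`; `BlockArgument::new` takes the type by value
                BlockArgument::new(this.clone(), arg.clone())
            }).collect();
            // `Entity<T>` is `Rc<RefCell<T>>`, so the value must be wrapped in a `RefCell`
            RefCell::new(Self {
                owner,
                arguments,
                body: vec![],
            })
        })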
    }
    
    /// Callers must ensure that `op` has parent set to this block, and that it
    /// was not already attached to some other block.
    pub fn push(&mut self, op: Entity<dyn Op>) {
        self.body.push(op);
    }
    
    pub fn owner(&self) -> Entity<dyn Op> {
        WeakEntity::upgrade(&self.owner).expect("stale owner reference")
    }
}

pub trait Value {
    // NOTE: A value always has an owning op, but is internally stored as Weak.
    // Here, ownership refers to the entity that defines the value, never an
    // entity that uses a value.
    fn owner(&self) -> Entity<dyn Op>;
    fn ty(&self) -> MirType;
    fn users(&self) -> &UseList;
    fn users_mut(&mut self) -> &mut UseList;
    fn is_used(&self) -> bool {
        !self.users().is_empty()
    }
    fn remove_user(&mut self, user: &User) {
        // `user` is already a reference
        self.users_mut().remove(user);
    }
    fn add_user(&mut self, user: User) {
        self.users_mut().insert(user);
    }
}

/// Represents the set of users of some value definition
/// 
/// This information is used to trace from uses to defs, and to determine if
/// a given value is used. An operation without side effects that has no uses
/// of its result(s), can be considered dead and stripped from the program.
#[derive(Default, Clone)]
pub struct UseList {
    users: Vec<User>,
}

impl UseList {
    /// Indicates whether the containing value has any uses
    pub fn is_empty(&self) -> bool {
        self.users.is_empty()
    }
    
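    /// Remove a user from the list
    pub fn remove(&mut self, user: &User) {
        // `Vec::remove` takes an index, so filter by equality instead.
        // Like `insert` below, this assumes `User: PartialEq`.
        self.users.retain(|u| u != user);
    }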
    
    /// Add a new user to the list, if that use is not already present
    pub fn insert(&mut self, user: User) {
        if !self.users.contains(&user) {
            self.users.push(user);
        }
    }
    
    pub fn iter(&self) -> impl Iterator<Item = &User> {
        self.users.iter()
    }
}

#[derive(Clone)]
pub struct User {
    /// The using operation
    pub user: WeakEntity<dyn Op>,
    /// The index of the use in the operation's argument vector
    pub index: usize,
}

/// The value representation for block parameters. These represent distinct
/// SSA values, so that transformations within a block can be done without
/// having to know how the block is reached. These are equivalent
/// to Phi nodes in SSA literature, but in a more intuitive form.
pub struct BlockArgument {
    owner: WeakEntity<Block>,
    ty: MirType,
    users: UseList,
}

impl BlockArgument {
    pub fn new(owner: WeakEntity<Block>, ty: MirType) -> Entity<Self> {
        // `Entity<T>` is `Rc<RefCell<T>>`, so the value must be wrapped in a `RefCell`
        Entity::new(RefCell::new(Self {
            owner,
            ty,
            users: UseList::default(),
        }))
    }
}

impl Value for BlockArgument {
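    fn owner(&self) -> Entity<dyn Op> {
        let block = WeakEntity::upgrade(&self.owner).expect("stale owner reference");
        // Borrow the block to reach `Block::owner`, and bind the result so the
        // borrow guard is released before `block` goes out of scope
        let owner = block.borrow().owner();
        owner
    }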
    
    fn ty(&self) -> MirType {
        self.ty.clone()
    }
    
    fn users(&self) -> &UseList {
        &self.users
    }
    
    fn users_mut(&mut self) -> &mut UseList {
        &mut self.users
    }
}

/// The value representation for operation results. Much like BlockArgument,
/// these define new SSA values.
pub struct OpResult {
    owner: WeakEntity<dyn Op>,
    ty: MirType,
    users: UseList,
}

impl OpResult {
    pub fn new(owner: WeakEntity<dyn Op>, ty: MirType) -> Entity<Self> {
        // Wrap the value so that the return type is actually `Entity<Self>`
        Entity::new(RefCell::new(Self {
            owner,
            ty,
            users: UseList::default(),
        }))
    }
}

impl Value for OpResult {
    fn owner(&self) -> Entity<dyn Op> {
        WeakEntity::upgrade(&self.owner).expect("stale owner reference")
    }
    
    fn ty(&self) -> MirType {
        self.ty.clone()
    }
    
    fn users(&self) -> &UseList {
        &self.users
    }
    
    fn users_mut(&mut self) -> &mut UseList {
        &mut self.users
    }
}

/// An example of a custom value type
pub struct TraceAccessBinding {
    owner: WeakEntity<dyn Op>,
    users: UseList,
    pub segment: Rc<TraceSegment>,
    /// The offset to the first column of the segment which is bound by this binding
    pub offset: usize,
    /// The number of columns which are bound
    pub size: usize,
}

impl Value for TraceAccessBinding {
    fn owner(&self) -> Entity<dyn Op> {
        WeakEntity::upgrade(&self.owner).expect("stale owner reference")
    }
    fn ty(&self) -> MirType {
        // A binding over several columns is a vector; a single column is a felt
        if self.size > 1 {
            MirType::Vector(self.size)
        } else {
            MirType::Felt
        }
    }
    
    fn users(&self) -> &UseList {
        &self.users
    }
    
    fn users_mut(&mut self) -> &mut UseList {
        &mut self.users
    }
}
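
The constructors above lean on `Rc::new_cyclic` to hand a child a weak reference to an owner that is still being built. As a minimal, self-contained sketch of that technique (with simplified, hypothetical `Function` and `Block` types rather than the ones from the proposal):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Function {
    name: String,
    body: Rc<RefCell<Block>>,
}

struct Block {
    // Weak back-reference to the function that owns this block.
    owner: Weak<RefCell<Function>>,
}

impl Function {
    fn new(name: &str) -> Rc<RefCell<Function>> {
        // The closure receives a `&Weak` pointing at the allocation that will
        // hold the function; it cannot be upgraded until construction is done.
        Rc::new_cyclic(|this: &Weak<RefCell<Function>>| {
            let body = Rc::new(RefCell::new(Block { owner: this.clone() }));
            RefCell::new(Function {
                name: name.to_string(),
                body,
            })
        })
    }
}

impl Block {
    fn owner(&self) -> Option<Rc<RefCell<Function>>> {
        self.owner.upgrade()
    }
}

fn demo() -> (String, bool) {
    let function = Function::new("adder");
    let body = Rc::clone(&function.borrow().body);

    let owner = body.borrow().owner().expect("owner is alive");
    let owner_name = owner.borrow().name.clone();
    let same_allocation = Rc::ptr_eq(&owner, &function);
    (owner_name, same_allocation)
}
```

Note that the closure returns a `RefCell`, since the pointee of the resulting `Rc` is `RefCell<Function>`.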

You can then define some convenience builder APIs:

pub struct FunctionBuilder {
    function: Entity<Function>,
    block: Entity<Block>,
}

impl From<Entity<Function>> for FunctionBuilder {
    fn from(function: Entity<Function>) -> Self {
        let block = function.borrow().body.clone();
        Self {
            function,
            block,
        }
    }
}

impl FunctionBuilder {
    pub fn new(symbol: Ident, signature: FunctionType) -> Self {
        let function = Function::new(symbol, signature);
        Self::from(function)
    }
    
    pub fn get_argument(&self, index: usize) -> Entity<dyn Value> {
        let block = self.block.borrow();
        // `arguments` is a field of `Block`, not a method
        block.arguments[index].clone()
    }
    
    pub fn ins(&mut self) -> InstBuilder<'_> {
        let owner = Entity::downgrade(&self.block);
        InstBuilder {
            owner,
            block: self.block.borrow_mut(),
        }
    }
    
    pub fn build(self) -> Entity<Function> {
        self.function
    }
}

pub struct InstBuilder<'f> {
    owner: WeakEntity<Block>,
    block: std::cell::RefMut<'f, Block>,
}

impl InstBuilder<'_> {
    /// This function returns a reference to the op because it may or may not
    /// produce a result, depending on what is being called.
    pub fn call<I>(&mut self, callee: Entity<dyn Op>, args: I) -> Entity<Call> 
    where
        I: IntoIterator<Item = Entity<dyn Value>>,
    {
        let call = Call::new(callee, args);
        self.insert_at_end(call.clone());
        call
    }
    
    /// This function returns a value reference, because it is a primitive op
    /// that always produces a single result.
    pub fn add(&mut self, lhs: Entity<dyn Value>, rhs: Entity<dyn Value>) -> Entity<dyn Value> {
        // Values are `Entity<dyn Value>`, so borrow them to query their type
        assert_eq!(lhs.borrow().ty(), rhs.borrow().ty(), "mismatched value types for binary operator");
        
        let add: Entity<dyn Op> = Add::new(lhs, rhs);
        let result = add.borrow().result().unwrap();
        self.insert_at_end(add);
        result
    }
    
    /// Block terminators need not return anything, but are expected to always
    /// be the last instruction in a block.
    pub fn ret(&mut self, result: Option<Entity<dyn Value>>) {
        self.insert_at_end(Ret::new(result));
    }
    
    fn insert_at_end(&mut self, op: Entity<dyn Op>) {
        {
            let mut op = op.borrow_mut();
            op.set_parent(Some(self.owner.clone()));
        }
        // Push the operation itself, not the builder
        self.block.push(op);
    }
}

Those end up getting used something like this when constructing IR:

// Construct IR for a function defined as:
//
// fn adder(a: felt, b: felt) -> felt {
//     a + b
// }
let mut fb = FunctionBuilder::new("adder".into(), FunctionType::new([MirType::Felt, MirType::Felt], MirType::Felt));

let a = fb.get_argument(0);
let b = fb.get_argument(1);
let c = fb.ins().add(a, b);
fb.ins().ret(Some(c));

let adder = fb.build();

A key aspect of working with this IR structure is that the connections between operations are represented completely in terms of SSA values, and they are never mutated, only created (defined) or replaced/destroyed. So let's say that you want to replace something like x = y * 2 with x = y + y. The actual process of doing that would play out like this:

  • Insert x' = y + y in the block after x = y * 2
  • Replace all uses of x with x'
  • Remove x = y * 2, as x is now unused, and thus the expression y * 2 is dead
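
The three steps above can be sketched on a deliberately simplified IR. This example uses integer value ids and a flat `Vec` of instructions instead of `Entity` links, and all names in it (`ValueId`, `Inst`, `replace_all_uses`, `strength_reduce`) are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueId(usize);

#[derive(Clone, Debug, PartialEq)]
enum Inst {
    Const { result: ValueId, value: u64 },
    Add { result: ValueId, lhs: ValueId, rhs: ValueId },
    Mul { result: ValueId, lhs: ValueId, rhs: ValueId },
    Ret { operand: ValueId },
}

impl Inst {
    /// The SSA value defined by this instruction, if any.
    fn result(&self) -> Option<ValueId> {
        match self {
            Inst::Const { result, .. }
            | Inst::Add { result, .. }
            | Inst::Mul { result, .. } => Some(*result),
            Inst::Ret { .. } => None,
        }
    }

    fn operands(&self) -> Vec<ValueId> {
        match self {
            Inst::Const { .. } => Vec::new(),
            Inst::Add { lhs, rhs, .. } | Inst::Mul { lhs, rhs, .. } => vec![*lhs, *rhs],
            Inst::Ret { operand } => vec![*operand],
        }
    }

    fn operands_mut(&mut self) -> Vec<&mut ValueId> {
        match self {
            Inst::Const { .. } => Vec::new(),
            Inst::Add { lhs, rhs, .. } | Inst::Mul { lhs, rhs, .. } => vec![lhs, rhs],
            Inst::Ret { operand } => vec![operand],
        }
    }
}

fn replace_all_uses(block: &mut [Inst], old: ValueId, new: ValueId) {
    for inst in block.iter_mut() {
        for operand in inst.operands_mut() {
            if *operand == old {
                *operand = new;
            }
        }
    }
}

fn is_used(block: &[Inst], value: ValueId) -> bool {
    block.iter().any(|inst| inst.operands().contains(&value))
}

fn strength_reduce() -> Vec<Inst> {
    let (y, two, x, x2) = (ValueId(0), ValueId(1), ValueId(2), ValueId(3));
    // y is given a constant definition here only to keep the example closed.
    let mut block = vec![
        Inst::Const { result: y, value: 5 },
        Inst::Const { result: two, value: 2 },
        Inst::Mul { result: x, lhs: y, rhs: two },
        Inst::Ret { operand: x },
    ];

    // 1. Insert x' = y + y right after x = y * 2.
    block.insert(3, Inst::Add { result: x2, lhs: y, rhs: y });
    // 2. Replace all uses of x with x'.
    replace_all_uses(&mut block, x, x2);
    // 3. x is now unused, so its defining instruction is dead and can go.
    if !is_used(&block, x) {
        block.retain(|inst| inst.result() != Some(x));
    }
    block
}
```

No existing instruction is rewritten in place except for its operand list; the old definition is simply dropped once nothing refers to it.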

I believe that will solve the issues you have encountered with handling mutation of the IR. More generally, the above structure is significantly simpler, as you need far fewer concepts (and corresponding structs/enums).

The primary awkwardness here is around the fact that all of the borrow checking is pushed to runtime, so you have to take care to limit when/how long you borrow the underlying data for an entity, to avoid mutable aliasing (and thus a panic). In practice though, this isn't too difficult with the structure I've outlined above, as adding ops to a Block doesn't require holding a mutable reference to the containing Op. It could also be made more ergonomic if we could use nightly features, but this is still fairly easy to work with.
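
A small sketch of the runtime borrow behaviour mentioned above, using only `RefCell` from the standard library and a hypothetical, stripped-down `Block`. It shows that a mutable borrow is refused while another borrow is live, and succeeds once that borrow has been scoped away.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Block {
    ops: Vec<&'static str>,
}

fn demo() -> (bool, bool, usize) {
    let block = Rc::new(RefCell::new(Block { ops: Vec::new() }));
    let alias = Rc::clone(&block);

    // While a shared borrow is alive, a mutable borrow through any alias is
    // rejected at runtime (`borrow_mut` would panic; `try_borrow_mut` errs).
    let conflict_while_borrowed = {
        let _reader = block.borrow();
        let conflict = alias.try_borrow_mut().is_err();
        conflict
    };

    // Once the reader has gone out of scope, mutation is allowed again.
    let ok_after_release = alias.try_borrow_mut().is_ok();

    // Keeping the mutable borrow to a single statement limits its lifetime.
    alias.borrow_mut().ops.push("add");
    let len = block.borrow().ops.len();

    (conflict_while_borrowed, ok_after_release, len)
}
```

Using `try_borrow`/`try_borrow_mut` in debug assertions can help locate overlapping borrows before they turn into panics.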

Leo-Besancon and others added 15 commits January 29, 2025 09:26
* fix translate.rs not translating bodies

* expand fn and ev args, missing scoping

* expand all arguments in function and evaluators

* unpack vec in all cases

* lookup arguments in the access_map if not found in bindings

* stop unpacking call arguments, add Vector<Params> to bindings

* insert accessors to bound Vector<Param>

* fix Value Builder types

* insert links in enums

* added missing Child/Parent traits

* Remove fn edit(self) on builders

* Make translate_from_mir.rs compile

* fix double borrow

* Handle multiple parents

* remove some warns

* Adapt Visitor and Inlining

* Update inlining and visitor

* Fix translate of functions

* visitor with pre-scan

* Add unrolling2 + fmt

* Add comments to passes

* Add comment to translate.rs pass

* Update Inlining and Unrolling passes

Stilll some debug todo

* Cargo fmt

* Add TODOs

* Builder derive macro

* fix Builder derive macro compilation

* Builder derive macro: fix Default recursion

* Builder derive macro: fix mutability on non-link, fix transistion
derive for Function

* Builder derive macro: handle Vec<BackLink> + impl Sub

* Builder derive macro: docs & fix + test >2 required fields case

* Builder derive macro: derive on all structs

* remived unused buggy api to remoce_child

* mutability examples and potention solutions:
one bug left

* double borrow bug remaining

* fix test_mutability_wrap_enum: missing clone

* Fix doouble borrow in Link::update

* fix singleton updates in final design

* cleanup intermediary versions

* fix module name

* Translate - Split function / ev parameter translation

* Fix for_node children, don't put None selector

* Fix inlining - Needs mutability

* Cleanup unrolling

* cargo fmt

* fix warns

* Improve comments

* Improve context setting in Unrolling

* refactor checkpoint: wrap structs in Op/Root enums

* Fix let translation

* Add prints to unrolling

* Add diagnostics to translate

* swap with ir + fix Builder enumwrapper
checkpoint: does not compile

* Setup diagnostics for inlining and unrolling

* converer bug: return from local reference

* avoid double option in BackLink::to_link()

* wrap all structs in enum:
checkpoint: compiles, 81 tests pass

* revert examples

* fix missing ;

* fix compilation: 43 tests pass

* Remove warns, fix translate_from_mir.rs

* Fix Inlining

* Fix inlining after git merge inconsistency

* Have parameter reference ref_node, and better handle evaluator params / args

* Update inlining2.rs

* Only inline Params that target the ref_node we aim

* Improve diags on Evaluator args and params mismatch

* Improve diags again

* Reworked visit_if to work with selectors, kept visit_if_old

* cargo clippy

* Clippy fix

* Clippy fix and remove prints

* cargo fmt

* cargo clippy + fmt

* Remove prints

* Builder: ignore fields that start with underscore

* singleton converters

* patch link

* api for shared mutability

* transform BackLink to Link when comparing and hashing singleton enum
wrappers

* swap *obj.borrow_mut() = value to obj.set(value)

* use ptr as hashmap key to preserve 1 key per instance
restore PartialEq Invariant for HashMap key

* use pointer as HashMap keys

* Some fixes for translate + inlining

* fix BackLink fields' mutability in Builder derive macro

* cargo fmt

* fix Parameter.ref_node cyclic reference + restrict to Owner + compare via get_ptr

fix Parameter.ref_node cyclic reference:
Parameter.ref_node used to store a Link to its parent which caused
PartialEq and Hash to loop.
Replaced with a BackLink
 + restricted the field from Node -> Owner
 + expose inner get_ptr on Owner
 + comparison and hash via get_ptr

* Update translate, mod and inlining

* Fix nested evaluators

* fix Parameter PartialEq and Hash to work on disconnected identical graphs

* add debugging method to Op/Root + Link/BackLink

* filter stale node wrappers in visitor

* fix edgecase in `Link<Op>::set`

* manual pointer review checks to verify Op::set logic

* fix parameters

* Fix let translation

* Debug air-ir tests, begin codegen tests

* Set program name in Mir

* Fix mutability test

---------

Co-authored-by: Thybault Alabarbe <thybault.alabarbe@gmail.com>
@Soulthym
Collaborator Author

Soulthym commented Feb 4, 2025

Hey @bitwalker,
I think I understand the design from a high-level perspective, and overall think it would be a very welcome upgrade to the current implementation.
I am, however, still somewhat confused about the following points:

Phi Nodes in Block

/// The value representation for block parameters. These represent distinct
/// SSA values, so that transformations within a block can be done without
/// having to know how the block is reached. These are equivalent
/// to Phi nodes in SSA literature, but in a more intuitive form.
pub struct BlockArgument {
    owner: WeakEntity<Block>,
    ty: MirType,
    users: UseList,
}

I'm not sure I understand the relationship with Phi nodes. I thought Phi nodes reconciled Block results in SSA-style IRs, i.e. that they represent a ternary selection between the results of two Blocks, as in an if/else statement?

Regardless, I understand their usage as replacement targets with bound values during inlining/unrolling, so it shouldn't be much of an issue in practice, I am mostly asking in case I missed other cases that would behave differently.
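
To make that reading concrete, here is a minimal sketch of block arguments used as replacement targets: inlining binds each argument to the value supplied by the caller. The `Value` enum and the `bind` function are hypothetical stand-ins, not types from either design.

```rust
/// Hypothetical, simplified value tree; not the actual Mir types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Const(u64),
    /// Placeholder for the i-th argument of the enclosing block.
    BlockArg(usize),
    Add(Box<Value>, Box<Value>),
}

/// Returns a copy of `body` in which every block argument is replaced by
/// the value bound to it. Panics if `args` has fewer entries than the
/// highest argument index used in `body`.
fn bind(body: &Value, args: &[Value]) -> Value {
    match body {
        Value::Const(c) => Value::Const(*c),
        Value::BlockArg(i) => args[*i].clone(),
        Value::Add(lhs, rhs) => Value::Add(
            Box::new(bind(lhs, args)),
            Box::new(bind(rhs, args)),
        ),
    }
}
```

With a body of `BlockArg(0) + 1` and the single bound value `41`, `bind` yields `41 + 1`. The body itself never needs to know which call site it was reached from, which is the property the doc comment above attributes to block arguments.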

parent, owner, and use_list

In the current design, the owner field, accessed through the Parent trait, stores the operations that use an Op. The Block that contains the Op is not stored anywhere.

Given that, am I understanding the following correctly?
Starting from the current design and ending at yours, semantically:

  • an owner becomes a user in the operation's UseList
  • a parent becomes an owner, i.e. the parent in the Mir graph, minus the owning block (see the next point)
  • parent now also stores the operation's owning Block (Function, Evaluator, For, If, ...) in all cases.
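
A minimal sketch of how I picture the target shape, assuming a use list of weak references and a weak reference to the owning block. All names here (`Op`, `Block`, `new_op`, `user_names`, `parent_name`) are hypothetical and only illustrate the mapping above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical owning block (Function, Evaluator, For, If, ...).
struct Block {
    name: String,
}

type OpRef = Rc<RefCell<Op>>;

/// Hypothetical operation: `users` plays the role of the UseList,
/// `parent` points at the owning block.
struct Op {
    name: String,
    operands: Vec<OpRef>,
    users: Vec<Weak<RefCell<Op>>>,
    parent: Weak<Block>,
}

/// Creates an operation and registers it as a user of each operand.
fn new_op(name: &str, operands: Vec<OpRef>, parent: &Rc<Block>) -> OpRef {
    let op = Rc::new(RefCell::new(Op {
        name: name.to_string(),
        operands: operands.clone(),
        users: Vec::new(),
        parent: Rc::downgrade(parent),
    }));
    for operand in &operands {
        operand.borrow_mut().users.push(Rc::downgrade(&op));
    }
    op
}

/// Names of the operations that still use `op`; dropped users are skipped.
fn user_names(op: &OpRef) -> Vec<String> {
    let mut names = Vec::new();
    for user in &op.borrow().users {
        if let Some(user) = user.upgrade() {
            let name = user.borrow().name.clone();
            names.push(name);
        }
    }
    names
}

/// Name of the owning block, if it is still alive.
fn parent_name(op: &OpRef) -> Option<String> {
    op.borrow().parent.upgrade().map(|block| block.name.clone())
}
```

Weak references are used in both directions that point "up" (users and parent), so only the operand edges keep nodes alive and no reference cycle is created.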

Future progressive migration

Checklist before merging

Our design is passing all existing tests correctly, but still needs some improvements.

Our plan is to do the following, roughly in this order:

  • add span fields
  • propagate spans through transformations (translation AST -> Mir and passes)
  • document public APIs

Checklist after merging (to be done according to the priority of the other milestone)

Once merged upstream, we plan to:

  • simplify the boilerplate
  • refactor/simplify complex parts:
    • actually use the visitor in passes. Currently, passes re-implement dispatching instead of using the internal dispatcher.
    • replace the lazy part of singletons by creating them during struct instantiation, simplifying their update mechanism (fewer cases to cover)
  • add a constant propagation/folding pass in the IR (its absence causes the only differences in codegen for Winterfell):
    • remove useless nodes, such as multiplying by 1, adding 0, etc.
    • simplify exponentiation where possible. For example, 2.pow(3) currently turns into 2*2*2, but what we really want is 8.
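
A minimal sketch of what such a folding pass could look like on a toy expression tree. The `Expr` enum and `fold` function are hypothetical, and plain `u64` arithmetic stands in for field arithmetic, which a real pass would have to use instead.

```rust
/// Hypothetical, simplified expression tree; the real Mir nodes are richer.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(u64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Folds constants bottom-up and drops `x + 0` and `x * 1`.
/// If a `u64` operation would overflow, the node is left unfolded.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            // Adding 0 is a no-op: keep the other operand.
            (Expr::Const(0), e) | (e, Expr::Const(0)) => e,
            (Expr::Const(a), Expr::Const(b)) if a.checked_add(b).is_some() => {
                Expr::Const(a + b)
            }
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            // Multiplying by 1 is a no-op: keep the other operand.
            (Expr::Const(1), e) | (e, Expr::Const(1)) => e,
            (Expr::Const(a), Expr::Const(b)) if a.checked_mul(b).is_some() => {
                Expr::Const(a * b)
            }
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Constants and variables are already in their simplest form.
        leaf => leaf,
    }
}
```

Because children are folded before their parent, the unrolled form `(2*2)*2` first becomes `4*2` and then the single constant `8`, while `(x * 1) + 0` collapses to `x`.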

Migrate in parallel

At this point we plan on working on the next steps, while migrating to your design progressively in the background.

Migration checklist:

  • add parents automatically within the builder pattern, currently not present in most places.
  • implement an alternative update mechanism that mutates the contents of parent structs instead of primitive enums (Op and Root), to match your design's behaviour.
  • migrate shared mutability usage to the new update mechanism
  • rename functions/methods/fields in our API to match yours where update behaviour is the same.
  • implement the Block abstraction, like it is in your design.
  • add owner references, like in your design.
  • Then, simultaneously:
    • replace our uses of primitive enums (Op and Root, those that own the inner struct) with their respective object safe dyn Traits
    • remove our uses of wrapper enums (mainly Owner and Node, those that contain a BackLink to the primitive enum) as they can now be used by dyn Traits
    • swap current remaining structs' implementation with your design.
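
A minimal sketch of the enum-to-trait-object step, assuming an object-safe trait that every node struct implements. The `Operation` trait, its methods, and the `Value`/`Add` structs here are hypothetical and do not reproduce the API of either design.

```rust
use std::rc::Rc;

/// Hypothetical object-safe trait that would replace a primitive enum
/// such as `Op`; each node struct implements it directly.
trait Operation {
    fn name(&self) -> &'static str;
    fn operands(&self) -> Vec<Rc<dyn Operation>>;
    /// Constant payload, if the node is a leaf value.
    fn constant(&self) -> Option<u64> {
        None
    }
}

struct Value(u64);

struct Add {
    lhs: Rc<dyn Operation>,
    rhs: Rc<dyn Operation>,
}

impl Operation for Value {
    fn name(&self) -> &'static str {
        "value"
    }
    fn operands(&self) -> Vec<Rc<dyn Operation>> {
        Vec::new()
    }
    fn constant(&self) -> Option<u64> {
        Some(self.0)
    }
}

impl Operation for Add {
    fn name(&self) -> &'static str {
        "add"
    }
    fn operands(&self) -> Vec<Rc<dyn Operation>> {
        vec![self.lhs.clone(), self.rhs.clone()]
    }
}

/// Generic traversal that needs no `match` on an enum wrapper.
fn count_nodes(op: &Rc<dyn Operation>) -> usize {
    1 + op.operands().iter().map(count_nodes).sum::<usize>()
}
```

Code that previously matched on the enum variants would call trait methods instead, so wrapper enums that exist only to hold a reference to the primitive enum would no longer be needed.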

I have updated the previous checklists, striking through the items that became irrelevant with our latest modifications.

3 participants