Add a HIR to the compiler #1191

Merged
merged 1 commit into from
Jul 24, 2015

Conversation

nrc
Member

@nrc nrc commented Jul 6, 2015

Add a high-level intermediate representation (HIR) to the compiler. This is
basically a new (and additional) AST more suited for use by the compiler.

This is purely an implementation detail of the compiler. It has no effect on the
language.

Note that adding a HIR does not preclude adding a MIR or LIR in the future.

@nrc nrc added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label Jul 6, 2015
@nrc nrc self-assigned this Jul 6, 2015
@eddyb
Member

eddyb commented Jul 6, 2015

I am not so keen on duplicating the AST; it seems like a maintenance burden. It might be less of one if the current code were made more composable somehow (there's a lot of it in syntax::fold and syntax::visit alone).

But I do agree that moving towards something other than the AST in the compiler is a good thing.
If nothing else, to get rid of all those cases handling ExprParen (and sometimes being wrong).
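
As a minimal sketch of the ExprParen point, assume a toy AST in which parentheses are their own node and a toy HIR that has no such variant. The `AstExpr`, `HirExpr`, and `lower` names are hypothetical and are not the compiler's actual types; the point is only that once lowering has dropped the parenthesis node, later passes never need a case for it.

```rust
/// Hypothetical toy AST: parentheses are an explicit node, as in surface syntax.
#[derive(Debug, Clone, PartialEq)]
pub enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Mul(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>),
}

/// Hypothetical toy HIR: same shape, but with no `Paren` variant at all.
#[derive(Debug, Clone, PartialEq)]
pub enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
    Mul(Box<HirExpr>, Box<HirExpr>),
}

/// Lower an AST expression to HIR. Grouping is already encoded by the tree
/// structure, so a `Paren` node is replaced by its lowered contents.
pub fn lower(e: &AstExpr) -> HirExpr {
    match e {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(a, b) => HirExpr::Add(Box::new(lower(a)), Box::new(lower(b))),
        AstExpr::Mul(a, b) => HirExpr::Mul(Box::new(lower(a)), Box::new(lower(b))),
        AstExpr::Paren(inner) => lower(inner),
    }
}
```

Because `HirExpr` cannot represent parentheses, any pass written against it is free of the "forgot to look through ExprParen" class of bug by construction.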

There are also the node IDs, which don't really exist for libsyntax: only the compiler assigns and inspects them.
That would suggest a simpler AST, maybe more uses of Spanned.

Thinking more about it, it's possible the HIR could throw away most structure and keep expression, pattern and type trees, everything else already having a home in the compiler (in various tables), more or less.
Or at least flatten everything to a set of items, with less recursion and visit/fold logic.

language constructs to the lowering step. Further in the future, the HIR should
get more abstract and compact, and the AST should get closer to the surface
syntax.

Contributor

This RFC needs a kind of road-map of steps that need to be done (and each step should get an issue). It's too general right now.

  1. copy the AST to HIR after all AST transforms have been applied and have all other passes operate on the HIR
  2. remove some ast transforms
    • remove if let ast transform and apply it during the AST -> HIR step
    • remove while let ast transform and apply it during the AST -> HIR step
    • remove for ast transform and apply it during the AST -> HIR step
    • cfg-attribute application step?
    • erase all unresolved names from HIR and resolve names in the AST -> HIR step
    • elide lifetimes in the AST -> HIR step and erase unresolved lifetimes from the HIR
  3. things that can be moved from tables to HIR
    • I have no clue ;)
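
To illustrate step 2 of the list above, here is a minimal sketch of moving the `while let` desugaring into the AST -> HIR step. The toy `AstExpr` and `HirExpr` types and the `lower` function are hypothetical, and patterns are kept as plain strings for brevity. The sketch rewrites `while let PAT = EXPR { BODY }` as `loop { match EXPR { PAT => BODY, _ => break } }`, so the HIR needs no `WhileLet` node.

```rust
/// Hypothetical toy AST that still contains the sugared `while let` form.
#[derive(Debug, Clone, PartialEq)]
pub enum AstExpr {
    Var(String),
    Block(Vec<AstExpr>),
    WhileLet { pat: String, scrutinee: Box<AstExpr>, body: Box<AstExpr> },
}

/// Hypothetical toy HIR: only the core constructs remain.
#[derive(Debug, Clone, PartialEq)]
pub enum HirExpr {
    Var(String),
    Block(Vec<HirExpr>),
    Loop(Box<HirExpr>),
    Match { scrutinee: Box<HirExpr>, arms: Vec<(String, HirExpr)> },
    Break,
}

/// Lower AST to HIR, desugaring `while let` along the way.
pub fn lower(e: &AstExpr) -> HirExpr {
    match e {
        AstExpr::Var(name) => HirExpr::Var(name.clone()),
        AstExpr::Block(stmts) => HirExpr::Block(stmts.iter().map(lower).collect()),
        AstExpr::WhileLet { pat, scrutinee, body } => {
            // loop { match <scrutinee> { <pat> => <body>, _ => break } }
            HirExpr::Loop(Box::new(HirExpr::Match {
                scrutinee: Box::new(lower(scrutinee)),
                arms: vec![
                    (pat.clone(), lower(body)),
                    ("_".to_string(), HirExpr::Break),
                ],
            }))
        }
    }
}
```

The same shape would apply to `if let` and `for`: each sugared form becomes one extra arm in `lower` rather than a separate AST-to-AST transform.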

Member Author

The generality is kind of deliberate. There's a lot we could do; what we actually do is down to prioritisation. It's also not clear how some of those steps tie into proposed work on reforming name resolution and syntax extensions. I hope this RFC spells out the first step and motivates it. Major stuff can get more RFCs (or some other kind of discussion) later.

Contributor

Ah, that makes sense. Maybe that statement belongs in the RFC itself? :)

@jroesch
Member

jroesch commented Jul 6, 2015

I think this is a very good idea. Most other successful compilers have at least one, if not multiple, successive lowerings, ideally making the job of the following phases less complex. I was just talking to @nikomatsakis about this today in relation to implementing default type parameter fallback. It seems like we have a lot of relatively similar code scattered around the middle of the compiler because we are operating on the full language instead of a simplified, restricted core. One advantage that comes to mind is the ability to type check functions and methods in a more uniform fashion.

@arielb1
Contributor

arielb1 commented Jul 6, 2015

+1. This would be done at the "indexing ast" stage of course (after for & if let are expanded). We could handle Span more sanely then. I think this is more important for the value-level language and possibly the type-level language - the item-level (e.g. struct vs. enum) is mostly handled from tables anyway.

@eddyb

The AST is rather static these days, and breaking refactorings would be caught pretty quickly by the type-system.

@torpak

torpak commented Jul 7, 2015

How about using this one as inspiration, it has some quite useful properties:
http://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf

@eddyb
Member

eddyb commented Jul 7, 2015

@arielb1 I'm not worried about accidental breakage as much as having to modify several thousand lines. But seeing how HIR can be flattened and reduced to essentials, it's less of an issue.

Speaking of which, the current ast_map arena-allocates "roots", like the crate and inlined items.
We could just arena-allocate everything and have the HIR more like middle::Ty, in that all links to other nodes are &'tcx T.

I feel pretty good about flattening: resolve takes care of scoping, and we do preserve parent IDs already.
The only nested items that are not trivial to hoist out are closures - but we can have "closure items", more like a struct+impl pair, if not exactly that.
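
A minimal sketch of the flattening idea, assuming hypothetical `AstItem` and `HirItem` types (closures and arena allocation are left out): nested items are hoisted into one flat list, and each entry keeps the ID of its parent so that the original nesting can still be recovered.

```rust
/// Hypothetical nested item as it might appear in the AST.
#[derive(Debug, Clone, PartialEq)]
pub struct AstItem {
    pub name: String,
    pub nested: Vec<AstItem>,
}

/// Hypothetical flat HIR item: no children, only a link to the parent.
#[derive(Debug, Clone, PartialEq)]
pub struct HirItem {
    pub id: usize,
    pub parent: Option<usize>,
    pub name: String,
}

/// Hoist every nested item into a single flat vector (pre-order),
/// recording the parent ID instead of keeping the tree structure.
pub fn flatten(roots: &[AstItem]) -> Vec<HirItem> {
    fn go(item: &AstItem, parent: Option<usize>, out: &mut Vec<HirItem>) {
        let id = out.len(); // IDs are simply positions in the flat vector
        out.push(HirItem { id, parent, name: item.name.clone() });
        for child in &item.nested {
            go(child, Some(id), out);
        }
    }

    let mut out = Vec::new();
    for root in roots {
        go(root, None, &mut out);
    }
    out
}
```

After flattening, a pass over all items is a plain loop over the vector, with no recursive visit/fold logic needed to find nested ones.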

ast::Path desugaring is messy and implicit right now, but with HIR, it could be done by resolve:

  • paths to top-level items and locals would resolve to an ID and a single set of type/lifetime parameters (where applicable)
  • paths to trait-associated items would resolve to an ID, two sets of type/lifetime parameters (one for the trait and one for the associated items) and a Self type
    • <T as Trait<A, B>>::method::<X, Y> => Trait::method/[A, B], Self = T, [X, Y]
    • Trait::method::<X, Y> => Trait::method/[_, _], Self = _, [X, Y]
  • non-qualified associated paths would use a different node, holding their base type and the associated item name
    • T::method => <T>::method
  • partially resolved paths would desugar to multiple nodes:
    • Vec::<T>::Iter::Item::method => <<<Vec<T>>::Iter>::Item>::method
  • the equivalent of TyPath might need a way to specify "no explicit type parameters":
    • HashMap::insert => <HashMap<..>>::insert
  • alternatively, since all type paths that allow type parameters are concrete, resolve could use the number of type parameters without defaults:
    • HashMap::insert => <HashMap<_, _>>::insert

Another table-based resolve contortion is the "trait map" - each method call and non-qualified associated path is associated with a vector of traits in scope that contains an associated item with that name.
With HIR, every block could have a reference to an interned "scope chain" which holds all the traits (and perhaps all types, for picking shorter type paths in error reporting).
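
The interned "scope chain" could look roughly like the sketch below. `DefId`, `ScopeId`, and `ScopeInterner` are hypothetical stand-ins defined here for illustration, not the compiler's real types. Identical lists of in-scope traits are stored once, and each block would hold only the small `ScopeId` handle.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a definition ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Cheap, copyable handle to an interned list of traits in scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ScopeId(pub usize);

/// Stores each distinct trait list once and hands out handles to it.
#[derive(Default)]
pub struct ScopeInterner {
    chains: Vec<Vec<DefId>>,
    lookup: HashMap<Vec<DefId>, ScopeId>,
}

impl ScopeInterner {
    /// Return the existing handle for `traits`, or store it and create one.
    pub fn intern(&mut self, traits: Vec<DefId>) -> ScopeId {
        if let Some(&id) = self.lookup.get(&traits) {
            return id;
        }
        let id = ScopeId(self.chains.len());
        self.chains.push(traits.clone());
        self.lookup.insert(traits, id);
        id
    }

    /// Look up the traits behind a handle.
    pub fn traits(&self, id: ScopeId) -> &[DefId] {
        &self.chains[id.0]
    }
}
```

Since most blocks in a module see the same set of traits, they would share a single `ScopeId` instead of each method call carrying its own vector.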

@arielb1
Contributor

arielb1 commented Jul 7, 2015

@eddyb

Isn't that

struct TraitItemRef(Name, ParamList); // ::Foo::<x_1,x_2>:: *item::<y_1,y_2>*
struct TraitRef(Ty, DefId, ParamList); // <T as Trait<..>> (or equiv. T: Trait<..>)

enum Path_ {
    Static(Def, ParamList),                    // ::x::y::z::Foo::<x_1,x_2>
    TypeItem(Ty, TraitItemRef),                // <T>::Foo::<x_1,x_2> (or sugared T::Foo::<x_1,x_2>)
    TraitItem(TraitRef, TraitItemRef)          // <T as Trait>::Foo::<x_1,x_2>
                                               // Trait::<x_1,x_2>::item is desugared as
                                               // <_ as Trait<x_1,x_2>>::item
}

Path resolution should take place within astconv, giving a (ty::Ty, ty::TraitOrImplDef) or ty::TraitRef for the HIR's hir::Ty or hir::TraitRef.

@nikomatsakis
Contributor

I'm in favor of this plan. I think a lot of the interesting questions are in the details, of course, and in particular I think we have to be very careful about what we stabilize.

Here are some thoughts:

  1. If we stabilize anything, it should be done through a crates.io crate, so that it can be versioned independently. It's an interesting design question to see how we can design the parse-tree API to be maximally forwards compatible as the language is extended -- but in the short term, you should be able to at least freeze yourself to a particular version of the crate.
  2. I'm not sure if doing full name resolution before translating to HIR makes sense, though it may. Basically (as you and I have talked about from time to time, @nrc, but I don't know that we've said publicly), I think we ought to refactor resolve to NOT do path resolution, but instead produce name resolution tables, and move the resolution of paths into type checking, when it can bring the full trait infrastructure to bear as needed. At least in the short term, this means that we'd be resolving paths after the translation to HIR (however, these name tables that I am talking about may well be part of the HIR itself!).
  3. Now, particularly if we move in the direction of the HIR being primarily concerned with function and constant bodies (which makes sense to me), and move the "metastructure" out of the "parse tree" and into global tables, it may well be that we can build up sufficient metastructure first to run name resolution in full before we actually construct the HIR itself for type-checking.
  4. In any case, the killer point for me is basically that we already copy the AST numerous times as part of macro expansion and -- if memory serves -- we even do a final copy where we assign ids. So if we replace that final copy with one that produces a HIR, and simultaneously do some of those "builtin" translations like removing for loops or what have you, I think we'll not pay much of an execution or memory price. We may even save memory since we can dump the parse tree which will probably be bigger.

@arielb1
Contributor

arielb1 commented Jul 8, 2015

@nikomatsakis

Which part of resolve are you planning to route through the trait-system? I would prefer to at least handle locals in the HIR.

@nikomatsakis
Contributor

Hear ye, hear ye. This RFC is now entering final comment period.

@nikomatsakis
Contributor

@arielb1 what I was planning to do was to have resolve primarily concerned with expanding out the set of names mapped by use statements and so forth. Then when we encounter a path later, we can simply walk down and resolve it during type-checking, drawing on these tables. At that time we'll also have the information we need to complete UFCS resolution as well.
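
One way to picture that split is sketched below, under the assumption that resolve's output is a per-module name table and that a later phase walks a path through it segment by segment. `ModId`, `Def`, and `NameTables` are hypothetical names invented for this illustration. Anything that goes through a non-module definition (for example an associated item) is deliberately left unresolved here, since in this plan it would be handled during type checking.

```rust
use std::collections::HashMap;

/// Hypothetical module ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ModId(pub u32);

/// Hypothetical definition: either a module or some other item.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Def {
    Mod(ModId),
    Item(u32),
}

/// Name tables produced by resolve: which names each module maps,
/// with `use` imports already expanded into ordinary entries.
#[derive(Default)]
pub struct NameTables {
    names: HashMap<(ModId, String), Def>,
}

impl NameTables {
    /// Record that `name` is visible in `module` (declared or imported).
    pub fn define(&mut self, module: ModId, name: &str, def: Def) {
        self.names.insert((module, name.to_string()), def);
    }

    /// Walk `path` down from `start`, one table lookup per segment.
    /// Returns `None` for unknown names or for segments that follow a
    /// non-module definition (left to a later, type-aware phase).
    pub fn resolve_path(&self, start: ModId, path: &[&str]) -> Option<Def> {
        let mut current = Def::Mod(start);
        for seg in path {
            let module = match current {
                Def::Mod(m) => m,
                Def::Item(_) => return None,
            };
            current = *self.names.get(&(module, (*seg).to_string()))?;
        }
        Some(current)
    }
}
```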

@arielb1
Contributor

arielb1 commented Jul 10, 2015

@nikomatsakis

I would prefer to remove all anonymous modules etc. in the HIR, keeping only the trait-import tables (as an ItemMap<Interned<Vec<DefId>>> - I would like to get ItemMap/ItemId into the compiler anyway.)

@michaelwoerister
Member

+1
I'm for giving this a try and seeing how it works out. I think it's a step in the right direction and I'm not too concerned about performance implications. I'm also sympathetic to @eddyb's idea of making this a rather flat set of items.

@bkoropoff

I'm generally in favor of this. I'd also like to see a mid-level IR that would permit more sophisticated desugaring/lowering where type information is needed, e.g. converting closures into structs.
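
As a rough, hand-written picture of "converting closures into structs": the closure's captured variables become fields and its body becomes a method. The names `AddOffsetClosure`, `call`, and `make_add_offset` are hypothetical. A plain method is used here instead of the `Fn` traits because implementing those traits by hand is not possible on stable Rust, so this is only an approximation of what a compiler-internal lowering would produce.

```rust
// Source-level closure that this sketch mirrors:
//
//     let offset = 10;
//     let add_offset = |x: i32| x + offset;

/// Hypothetical lowered form: one field per captured variable.
/// The field type is where type information is needed, which is why
/// this lowering cannot happen before type checking.
pub struct AddOffsetClosure {
    offset: i32, // captured by value
}

impl AddOffsetClosure {
    /// The closure body, with the capture read through `self`.
    pub fn call(&self, x: i32) -> i32 {
        x + self.offset
    }
}

/// Stands in for the point where the closure expression is evaluated:
/// the captured value is moved into the struct.
pub fn make_add_offset(offset: i32) -> AddOffsetClosure {
    AddOffsetClosure { offset }
}
```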

@nikomatsakis
Contributor

Huzzah! The compiler subteam has decided to accept this RFC.

@nikomatsakis nikomatsakis merged commit f03c50b into rust-lang:master Jul 24, 2015
nikomatsakis added a commit that referenced this pull request Jul 24, 2015
@Centril Centril added the A-IR Proposals relating to intermediate representations. label Nov 23, 2018
10 participants