Add a HIR to the compiler #1191

nrc · 2015-07-06T04:27:15Z

Add a high-level intermediate representation (HIR) to the compiler. This is
basically a new (and additional) AST more suited for use by the compiler.

This is purely an implementation detail of the compiler. It has no effect on the
language.

Note that adding a HIR does not preclude adding a MIR or LIR in the future.

eddyb · 2015-07-06T05:02:45Z

I am not so keen on duplicating the AST; it seems like a maintenance burden. It might be more palatable if the current code were made more composable somehow (there's a lot of it in syntax::fold and syntax::visit alone).

But I do agree that moving towards something other than the AST in the compiler is a good thing.
If nothing else, to get rid of all those cases handling ExprParen (and sometimes being wrong).

There are also the node IDs, which don't really exist for libsyntax: only the compiler assigns and inspects them.
That would suggest a simpler AST, maybe more uses of Spanned.

Thinking more about it, it's possible the HIR could throw away most structure and keep expression, pattern and type trees, everything else already having a home in the compiler (in various tables), more or less.
Or at least flatten everything to a set of items, with less recursion and visit/fold logic.
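To make the ExprParen point concrete, here is a minimal sketch of a lowering step that drops parenthesis nodes. The `ast::Expr` and `hir::Expr` enums and the `lower_expr` function are hypothetical, heavily simplified stand-ins, not the compiler's actual types.

```rust
// Hypothetical, heavily simplified stand-ins for the real AST and HIR types.
mod ast {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
        // Parentheses are kept so that the AST can mirror the source text.
        Paren(Box<Expr>),
    }
}

mod hir {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
        // No `Paren` variant: grouping is already encoded in the tree shape.
    }
}

/// Lower an AST expression to HIR, dropping parenthesis nodes on the way.
fn lower_expr(e: &ast::Expr) -> hir::Expr {
    match e {
        ast::Expr::Lit(n) => hir::Expr::Lit(*n),
        ast::Expr::Add(l, r) => {
            hir::Expr::Add(Box::new(lower_expr(l)), Box::new(lower_expr(r)))
        }
        // The only place that ever has to know about parentheses.
        ast::Expr::Paren(inner) => lower_expr(inner),
    }
}
```

With a shape like this, later passes match on `hir::Expr` only and have no parenthesis case to forget.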

oli-obk · 2015-07-06T06:55:50Z

text/0000-hir.md

+language constructs to the lowering step. Further in the future, the HIR should
+get more abstract and compact, and the AST should get closer to the surface
+syntax.
+


This RFC needs a kind of road-map of steps that need to be done (and each step should get an issue). It's too general right now.

- copy the AST to HIR after all AST transforms have been applied and have all other passes operate on the HIR
- remove some ast transforms
  - remove if let ast transform and apply it during the AST -> HIR step
  - remove while let ast transform and apply it during the AST -> HIR step
  - remove for ast transform and apply it during the AST -> HIR step
  - cfg-attribute application step?
- erase all unresolved names from HIR and resolve names in the AST -> HIR step
- elide lifetimes in the AST -> HIR step and erase unresolved lifetimes from the HIR
- things that can be moved from tables to HIR
  - i have no clue ;)

nrc · 2015-07-06

The generality is kind of deliberate. There's a lot we could do; what we actually do is down to prioritisation. It's also not clear how some of those steps tie into proposed work on reforming name resolution and syntax extensions. I hope this RFC spells out the first step and motivates it. Major stuff can get more RFCs (or some other kind of discussion) later.

oli-obk · 2015-07-06

Ah, that makes sense. Maybe that statement belongs in the RFC itself? :)
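As an illustration of the "apply it during the AST -> HIR step" items in the road-map above, here is a rough sketch of desugaring `while let` while lowering. All types and the `lower` function are hypothetical; patterns are plain strings to keep the example short. It assumes the usual expansion of `while let P = e { body }` into `loop { match e { P => body, _ => break } }`.

```rust
// Hypothetical miniature AST: still contains the `while let` sugar.
mod ast {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Var(String),
        Block(Vec<Expr>),
        WhileLet { pat: String, scrutinee: Box<Expr>, body: Box<Expr> },
    }
}

// Hypothetical miniature HIR: only `loop`, `match` and `break` remain.
mod hir {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Expr {
        Var(String),
        Block(Vec<Expr>),
        Loop(Box<Expr>),
        // Scrutinee plus (pattern, arm body) pairs.
        Match(Box<Expr>, Vec<(String, Expr)>),
        Break,
    }
}

/// Lower the AST to HIR, desugaring `while let` on the way.
fn lower(e: &ast::Expr) -> hir::Expr {
    match e {
        ast::Expr::Var(name) => hir::Expr::Var(name.clone()),
        ast::Expr::Block(stmts) => hir::Expr::Block(stmts.iter().map(lower).collect()),
        ast::Expr::WhileLet { pat, scrutinee, body } => {
            // while let P = e { body }  =>  loop { match e { P => body, _ => break } }
            let arms = vec![
                (pat.clone(), lower(body)),
                ("_".to_string(), hir::Expr::Break),
            ];
            hir::Expr::Loop(Box::new(hir::Expr::Match(Box::new(lower(scrutinee)), arms)))
        }
    }
}
```

Passes that run after `lower` never see a `WhileLet` node, which is the kind of simplification the road-map is after.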

jroesch · 2015-07-06T19:57:14Z

I think this is a very good idea. Most other successful compilers have at least one, if not multiple, successive lowerings, ideally making the job of the following phases less complex. I was just talking to @nikomatsakis about this today in relation to implementing default type parameter fallback. It seems like we have a lot of relatively similar code scattered around the middle of the compiler because we are operating on the full language instead of a simplified, restricted core. One advantage that comes to mind is the ability to type check functions and methods in a more uniform fashion.

arielb1 · 2015-07-06T21:39:06Z

+1. This would be done at the "indexing ast" stage of course (after for & if let are expanded). We could handle Span more sanely then. I think this is more important for the value-level language and possibly the type-level language - the item-level (e.g. struct vs. enum) is mostly handled from tables anyway.

@eddyb

The AST is rather static these days, and breaking refactorings would be caught pretty quickly by the type-system.

torpak · 2015-07-07T05:13:04Z

How about using this one as inspiration, it has some quite useful properties:
http://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf

eddyb · 2015-07-07T08:15:31Z

@arielb1 I'm not worried about accidental breakage as much as having to modify several thousand lines. But seeing how HIR can be flattened and reduced to essentials, it's less of an issue.

Speaking of which, the current ast_map arena-allocates "roots", like the crate and inlined items.
We could just arena-allocate everything and have the HIR more like middle::Ty, in that all links to other nodes are &'tcx T.

I feel pretty good about flattening: resolve takes care of scoping, and we do preserve parent IDs already.
The only nested items that are not trivial to hoist out are closures - but we can have "closure items", more like a struct+impl pair, if not exactly that.

ast::Path desugaring is messy and implicit right now, but with HIR, it could be done by resolve:

- paths to top-level items and locals would resolve to an ID and a single set of type/lifetime parameters (where applicable)
- paths to trait-associated items would resolve to an ID, two sets of type/lifetime parameters (one for the trait and one for the associated item) and a Self type
  - <T as Trait<A, B>>::method::<X, Y> => Trait::method/[A, B], Self = T, [X, Y]
  - Trait::method::<X, Y> => Trait::method/[_, _], Self = _, [X, Y]
- non-qualified associated paths would use a different node, holding their base type and the associated item name
  - T::method => <T>::method
- partially resolved paths would desugar to multiple nodes:
  - Vec::<T>::Iter::Item::method => <<<Vec<T>>::Iter>::Item>::method
- the equivalent of TyPath might need a way to specify "no explicit type parameters":
  - HashMap::insert => <HashMap<..>>::insert
- alternatively, since all type paths that allow type parameters are concrete, resolve could use the number of type parameters without defaults:
  - HashMap::insert => <HashMap<_, _>>::insert

Another table-based resolve contortion is the "trait map" - each method call and non-qualified associated path is associated with a vector of traits in scope that contains an associated item with that name.
With HIR, every block could have a reference to an interned "scope chain" which holds all the traits (and perhaps all types, for picking shorter type paths in error reporting).
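A rough sketch of what such a shared scope chain could look like. Everything here is an assumption for illustration: the `Scope` type and its methods are made up, `DefId` is a bare integer, and `Rc` sharing stands in for real interning.

```rust
use std::rc::Rc;

// Placeholder for the compiler's definition IDs.
type DefId = u32;

/// One link in a scope chain: the traits brought into scope at this level,
/// plus a shared pointer to the enclosing scope.
struct Scope {
    traits: Vec<DefId>,
    parent: Option<Rc<Scope>>,
}

impl Scope {
    fn root(traits: Vec<DefId>) -> Rc<Scope> {
        Rc::new(Scope { traits, parent: None })
    }

    /// Create a nested scope; the parent chain is shared, not copied.
    fn child(parent: &Rc<Scope>, traits: Vec<DefId>) -> Rc<Scope> {
        Rc::new(Scope { traits, parent: Some(Rc::clone(parent)) })
    }

    /// All traits in scope, innermost scope first.
    fn traits_in_scope(&self) -> Vec<DefId> {
        let mut out = Vec::new();
        let mut cur = Some(self);
        while let Some(s) = cur {
            out.extend(s.traits.iter().copied());
            cur = s.parent.as_deref();
        }
        out
    }
}
```

Each block would hold one `Rc<Scope>`, so sibling blocks share their enclosing scopes instead of each method call carrying its own vector of candidate traits.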

arielb1 · 2015-07-07T09:17:46Z

@eddyb

Isn't that

struct TraitItemRef(Name, ParamList); // ::Foo::<x_1,x_2>:: *item::<y_1,y_2>*
struct TraitRef(Ty, DefId, ParamList); // <T as Trait<..>> (or equiv. T: Trait<..>)

enum Path_ {
    Static(Def, ParamList),                    // ::x::y::z::Foo::<x_1,x_2>
    TypeItem(Ty, TraitItemRef),                // <T>::Foo::<x_1,x_2> (or sugared T::Foo::<x_1,x_2>)
    TraitItem(TraitRef, TraitItemRef)          // <T as Trait>::Foo::<x_1,x_2>
                                               // Trait::<x_1,x_2>::item is desugared as
                                               // <_ as Trait<x_1,x_2>>::item
}

Path resolution should take place within astconv, giving a (ty::Ty, ty::TraitOrImplDef) or ty::TraitRef for the HIR's hir::Ty or hir::TraitRef.
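For readers who want to experiment with the shape above, here is a self-contained variant that compiles on its own. The placeholder definitions of `Name`, `DefId`, `Ty` and `ParamList` are assumptions (the real compiler types are far richer, and `Static` holds a `Def` rather than a bare ID), and `desugar_trait_path` is a hypothetical helper showing the `Trait::<x_1,x_2>::item` desugaring from the comment.

```rust
// Placeholder types, assumed for illustration only.
type Name = String;
type DefId = u32;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Infer,       // `_`
    Named(Name), // any concrete type, identified by name here
}

type ParamList = Vec<Ty>;

#[derive(Debug, Clone, PartialEq)]
struct TraitItemRef(Name, ParamList); // item::<y_1, y_2>

#[derive(Debug, Clone, PartialEq)]
struct TraitRef(Ty, DefId, ParamList); // <T as Trait<..>>

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Path {
    Static(DefId, ParamList),          // ::x::y::z::Foo::<x_1, x_2>
    TypeItem(Ty, TraitItemRef),        // <T>::Foo::<x_1, x_2>
    TraitItem(TraitRef, TraitItemRef), // <T as Trait>::Foo::<x_1, x_2>
}

/// Desugar `Trait::<params>::item::<item_params>` into
/// `<_ as Trait<params>>::item::<item_params>`.
fn desugar_trait_path(
    trait_id: DefId,
    trait_params: ParamList,
    item: Name,
    item_params: ParamList,
) -> Path {
    Path::TraitItem(
        TraitRef(Ty::Infer, trait_id, trait_params),
        TraitItemRef(item, item_params),
    )
}
```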

nikomatsakis · 2015-07-08T15:05:39Z

I'm in favor of this plan. I think a lot of the interesting questions are in the details, of course, and in particular I think we have to be very careful about what we stabilize.

Here are some thoughts:

- If we stabilize anything, it should be done through a crates.io crate, so that it can be versioned independently. It's an interesting design question to see how we can design the parse-tree API to be maximally forwards compatible as the language is extended -- but in the short term, you should be able to at least freeze yourself to a particular version of the crate.
- I'm not sure if doing full name resolution before translating to HIR makes sense, though it may. Basically (as you and I have talked about from time to time, @nrc, but I don't know that we've said publicly), I think we ought to refactor resolve to NOT do path resolution, but instead produce name resolution tables, and move the resolution of paths into type checking, when it can bring the full trait infrastructure to bear as needed. At least in the short term, this means that we'd be resolving paths after the translation to HIR (however, these name tables that I am talking about may well be part of the HIR itself!).
- Now, particularly if we move in the direction of the HIR being primarily concerned with function and constant bodies (which makes sense to me), and move the "metastructure" out of the "parse tree" and into global tables, it may well be that we can build up sufficient metastructure first to run name resolution in full before we actually construct the HIR itself for type-checking.
- In any case, the killer point for me is basically that we already copy the AST numerous times as part of macro expansion and -- if memory serves -- we even do a final copy where we assign ids. So if we replace that final copy with one that produces a HIR, and simultaneously do some of those "builtin" translations like removing for loops or what have you, I think we'll not pay much of an execution or memory price. We may even save memory since we can dump the parse tree which will probably be bigger.

arielb1 · 2015-07-08T17:31:40Z

@nikomatsakis

Which part of resolve are you planning to route through the trait-system? I would prefer to at least handle locals in the HIR.

nikomatsakis · 2015-07-09T21:05:16Z

Hear ye, hear ye. This RFC is now entering final comment period.

nikomatsakis · 2015-07-09T21:07:49Z

@arielb1 what I was planning to do was to have resolve primarily concerned with expanding out the set of names mapped by use statements and so forth. Then when we encounter a path later, we can simply walk down and resolve it during type-checking, drawing on these tables. At that time we'll also have the information we need to complete UFCS resolution as well.
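A minimal sketch of that split, under stated assumptions: `NameTables`, `Def`, `define` and `resolve_path` are invented names, and the tables are plain hash maps. Resolve would only fill the tables (including names imported by use statements); a later phase walks a path through them.

```rust
use std::collections::HashMap;

// Placeholder for the compiler's definition IDs.
type DefId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Def {
    Module(DefId),
    Item(DefId),
}

/// For each module, the names visible in it once `use` statements are expanded.
#[derive(Default)]
struct NameTables {
    modules: HashMap<DefId, HashMap<String, Def>>,
}

impl NameTables {
    /// What resolve would do: record that `name` is visible in `module`.
    fn define(&mut self, module: DefId, name: &str, def: Def) {
        self.modules
            .entry(module)
            .or_default()
            .insert(name.to_string(), def);
    }

    /// What a later phase would do: walk `segments` down from `module`.
    fn resolve_path(&self, module: DefId, segments: &[&str]) -> Option<Def> {
        let mut current = Def::Module(module);
        for seg in segments {
            let scope = match current {
                Def::Module(id) => self.modules.get(&id)?,
                // Segments after an item (e.g. associated items) would need
                // type and trait information; this sketch just gives up.
                Def::Item(_) => return None,
            };
            current = *scope.get(*seg)?;
        }
        Some(current)
    }
}
```

The `Def::Item` arm marks the point where the trait infrastructure would be needed, which is why the walk is deferred to type-checking in this plan.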

arielb1 · 2015-07-10T11:25:39Z

@nikomatsakis

I would prefer to remove all anonymous modules etc. in the HIR, keeping only the trait-import tables (as an ItemMap<Interned<Vec<DefId>>> - I would like to get ItemMap/ItemId into the compiler anyway.)

michaelwoerister · 2015-07-12T18:27:48Z

+1
I'm for giving this a try and seeing how it works out. I think it's a step in the right direction and I'm not too concerned about performance implications. I'm also sympathetic to @eddyb's idea of making this a rather flat set of items.

bkoropoff · 2015-07-14T02:44:02Z

I'm generally in favor of this. I'd also like to see a mid-level IR that would permit more sophisticated desugaring/lowering where type information is needed, e.g. converting closures into structs.
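As a hand-written illustration of what "converting closures into structs" means, the sketch below pairs a closure with a struct that stores the same capture. The names are made up, and `call` is an ordinary method: the compiler would implement the `Fn*` traits instead, which cannot be implemented by hand on stable Rust.

```rust
// What the user writes: a closure capturing `offset` by value.
fn make_adder_closure(offset: i32) -> impl Fn(i32) -> i32 {
    move |x| x + offset
}

// Roughly what a lowering could produce: a struct holding the captures...
struct AdderClosure {
    offset: i32,
}

// ...plus an impl providing the call operation.
impl AdderClosure {
    fn call(&self, x: i32) -> i32 {
        x + self.offset
    }
}

// Constructing the struct corresponds to evaluating the closure expression.
fn make_adder_struct(offset: i32) -> AdderClosure {
    AdderClosure { offset }
}
```

Knowing which variables are captured, and how, requires type and borrow information, which is why this kind of lowering is suggested for a later IR than the HIR.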

nikomatsakis · 2015-07-24T19:01:14Z

Huzzah! The compiler subteam has decided to accept this RFC.

nrc added the T-compiler label Jul 6, 2015

nrc self-assigned this Jul 6, 2015

oli-obk reviewed Jul 6, 2015

nikomatsakis merged commit f03c50b into rust-lang:master Jul 24, 2015

nikomatsakis added a commit that referenced this pull request Jul 24, 2015

RFC #1191 is HIR

b6e22b9

RalfJung mentioned this pull request Aug 5, 2015

[RFC] Introduce a mid-level IR (MIR) in the compiler that will drive borrowck, trans #1211

Merged

Centril added the A-IR label Nov 23, 2018
