Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Support Incremental Compilation #594

Closed
wants to merge 1 commit into from

Conversation

michaelwoerister
Copy link
Member

This RFC proposes an incremental compilation strategy for rustc that allows for translation, codegen, and parts of static analysis to be done in an incremental fashion, without precluding the option of later expanding incrementality to parsing, macro expansion, and resolution.

This RFC is purely about the architecture and implementation of the Rust compiler. It does not propose any changes to the language. I also don't expect it to be acted on any time before 1.0 is out of the door, but I wanted to get this out into the open, so that it can discussed as part of the RustC Architecture Improvement Initiative (that's right, RAII) that I invented just now and that will begin to discuss how the Rust compiler can get as good as possible once the language has become a more stable target.

Rendered

It would be possible to make dependency tracking aware of the kind of reference one item makes to another. If an item `A` mentions another item `B` only via some reference type (e.g. `&T`), then item `A` only needs to be updated if `B` is removed or `B` changes its 'sized-ness'. This is comparable to how forward declarations in C are handled. In the dependency graph this would mean that there are different kinds of edges that trigger for different kinds of changes to items.

### Global Switches Influencing Codegen
There are many compiler flags that change the way the generated code looks like, e.g. optimization and debuginfo levels. A simple strategy to deal with this would be to store the set of compiler flags used for building the cache and clearing the cache completely if another set of flags is used. Another option is to keep multiple caches, each for a different set of compiler flags (e.g. keeping both on disk, a 'debug build cache' and a 'release build cache').
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash the relevant flags for the subdir name? I'd expect a lot of -C options affect the cache, and only storing one set wouldn't help at all for some usage patterns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that. I'd like to see how big such a cache gets.

@nrc
Copy link
Member

nrc commented Jan 18, 2015

cc @epdtry who implemented most of a very similar scheme last summer.

We talked about this as incremental codegen (as opposed to proper incremental compilation). He only kept around object files, not llvm ir too.

It would be great if @epdtry could link to his WIP branch and explain the concepts, etc. here.

It should not be too hard to let the compiler keep track of which parts of the program change infrequently and then let it speculatively build object files with more than one function in them. For these aggregate object files inter-function LLVM optimizations could then be enabled, yielding faster object code at little additional cost. Other strategies for controlling cache granularity can be implemented in a similar fashion.

### Parallelization
If some care is taken in implementing the above concepts it should be rather easy to do translation and codegen in parallel for all items, since by design we already have (or can deterministically compute) all the information we need.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already can do codegen in parallel, although there is a bug preventing most use atm.

@dhardy
Copy link
Contributor

dhardy commented Jan 19, 2015

Please don't make the acronym RAII more confusing by adding a competing definition. Arguably Resource Allocation Is Initialisation is the wrong name anyway, but that's no reason to make the term more difficult to explain to new-comers.

@bstrie
Copy link
Contributor

bstrie commented Jan 19, 2015

I agree with @dhardy. As an alternative, may I propose "So Far, Incrementalism Necessitates An Exegesis"?

@michaelwoerister
Copy link
Member Author

@dhardy Sorry for the confusion :) That suggestion was not meant entirely serious.

@bstrie I am in awe. Now do TANSTAAFL !

@Diggsey
Copy link
Contributor

Diggsey commented Jan 20, 2015

@dhardy
You're right that is the wrong name :P, it's "Resource Acquisition Is Initialisation"
(also we have the same first name...)

@nrc nrc self-assigned this Jan 22, 2015
}
```

The dependency tracking system as described above contains `node templates` for `program item` definitions on a syntactic level, that is, for each `struct`, `enum`, `type`, `trait`, there is one `node template`, for each `fn`, `static`, and `const` there are two (one for the interface, one for the body). However, as seen in the section on generics, the codebase can refer to monomorphized instances of program items that cannot be identified by a single identifier as described above. A reference like `Option<String>` is a composite of multiple `program item` IDs, a tree of program item IDs in the general case:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the subject of monomorphized identifiers: you'll probably need to do something about symbol naming for monomorphizations of functions. Right now the name includes the hash of the pointers to the Tys representing the type arguments (which is random, thanks to ASLR). This does fine at preventing collisions, but it means you'll need to either record the mapping of (polymorphic function, type arguments) -> (symbol name) for use in later incremental builds, or fix symbol naming to produce something consistent. I tried to do the latter, but it wound up being a little more complicated than I expected (ADT Tys reference the struct/enum definition by its DefId, which is not stable) and I don't remember if I ever got it working.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a problem. I'd probably try to find a more stable symbol naming scheme.

@nrc nrc added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label May 15, 2015
@nikomatsakis
Copy link
Contributor

I am expanding and adapting this RFC. After some discussion with @michaelwoerister we decided to close this existing PR for the time being.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-compiler Relevant to the compiler team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.