RFC: Support Incremental Compilation #594
It would be possible to make dependency tracking aware of the kind of reference one item makes to another. If an item `A` mentions another item `B` only via some reference type (e.g. `&T`), then item `A` only needs to be updated if `B` is removed or `B` changes its 'sized-ness'. This is comparable to how forward declarations in C are handled. In the dependency graph this would mean that there are different kinds of edges that trigger for different kinds of changes to items.
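A minimal sketch of how such typed edges might look. All names here (`EdgeKind`, `Change`, `DepGraph`) are hypothetical and are not taken from the RFC or from rustc; the only rule encoded is the one stated above, namely that a reference-only edge triggers on removal or a change in sized-ness.

```rust
use std::collections::HashMap;

/// How item `A` refers to item `B` (hypothetical classification).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeKind {
    /// `A` needs the full definition of `B`, e.g. it embeds `B` by value.
    ByValue,
    /// `A` mentions `B` only behind a reference type such as `&B`.
    ByReference,
}

/// What happened to an item between two compilations (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Change {
    Removed,
    SizednessChanged,
    /// Any other change to the definition, e.g. a field was added.
    DefinitionChanged,
}

/// Decides whether a change to the edge's target invalidates its source.
fn edge_triggers(kind: EdgeKind, change: Change) -> bool {
    match (kind, change) {
        (EdgeKind::ByValue, _) => true,
        (EdgeKind::ByReference, Change::Removed) => true,
        (EdgeKind::ByReference, Change::SizednessChanged) => true,
        (EdgeKind::ByReference, Change::DefinitionChanged) => false,
    }
}

/// Edges are stored as `target -> [(source, kind)]`.
struct DepGraph {
    dependents: HashMap<&'static str, Vec<(&'static str, EdgeKind)>>,
}

impl DepGraph {
    fn new() -> Self {
        DepGraph { dependents: HashMap::new() }
    }

    fn add_edge(&mut self, source: &'static str, target: &'static str, kind: EdgeKind) {
        self.dependents.entry(target).or_default().push((source, kind));
    }

    /// Direct dependents of `target` that must be updated after `change`,
    /// sorted by name so the result is deterministic.
    fn invalidated_by(&self, target: &str, change: Change) -> Vec<&'static str> {
        let mut out: Vec<&'static str> = self
            .dependents
            .get(target)
            .into_iter()
            .flatten()
            .filter(|(_, kind)| edge_triggers(*kind, change))
            .map(|(source, _)| *source)
            .collect();
        out.sort();
        out
    }
}
```

With an edge `A -> B` of kind `ByReference` and an edge `C -> B` of kind `ByValue`, adding a field to `B` would invalidate only `C`, while removing `B` would invalidate both.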
### Global Switches Influencing Codegen
Many compiler flags change what the generated code looks like, e.g. optimization and debuginfo levels. A simple strategy to deal with this would be to store the set of compiler flags used for building the cache and to clear the cache completely if another set of flags is used. Another option is to keep multiple caches, one for each set of compiler flags (e.g. keeping both a 'debug build cache' and a 'release build cache' on disk).
Hash the relevant flags for the subdir name? I'd expect a lot of `-C` options affect the cache, and only storing one set wouldn't help at all for some usage patterns.
Yeah, something like that. I'd like to see how big such a cache gets.
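The flag-hashing idea from this thread could be sketched roughly as follows. This is an illustration under assumptions, not rustc behaviour: `cache_dir_for` is a hypothetical helper, the caller is assumed to pass only the flags that actually affect codegen in an already normalized form, and FNV-1a is used only because it is small and yields the same value on every run (the standard library's default hasher is not specified to be stable across releases).

```rust
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a hash of a byte string.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100_0000_01b3); // FNV prime
    }
    hash
}

/// Returns the cache subdirectory for one set of codegen-relevant flags.
/// Assumption: flag order is irrelevant once flags are normalized, so the
/// flags are sorted and de-duplicated before hashing.
fn cache_dir_for(root: &Path, flags: &[&str]) -> PathBuf {
    let mut sorted: Vec<&str> = flags.to_vec();
    sorted.sort();
    sorted.dedup();
    // A NUL separator keeps ["ab", "c"] and ["a", "bc"] distinct.
    let joined = sorted.join("\0");
    root.join(format!("{:016x}", fnv1a(joined.as_bytes())))
}
```

Each distinct flag set then gets its own directory under the cache root, so a debug build and a release build would not evict each other's entries. How large the combined cache grows is exactly the open question raised above.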
cc @epdtry who implemented most of a very similar scheme last summer. We talked about this as incremental codegen (as opposed to proper incremental compilation). He only kept around object files, not LLVM IR too. It would be great if @epdtry could link to his WIP branch and explain the concepts, etc. here.
It should not be too hard to let the compiler keep track of which parts of the program change infrequently and then let it speculatively build object files with more than one function in them. For these aggregate object files inter-function LLVM optimizations could then be enabled, yielding faster object code at little additional cost. Other strategies for controlling cache granularity can be implemented in a similar fashion.
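One way to picture this is the sketch below, which groups functions that rarely change into a single aggregate object file and leaves frequently changing ones in their own files. The bookkeeping (`History`), the threshold parameter, and the single-aggregate policy are all assumptions made for illustration; the RFC text does not specify them.

```rust
use std::collections::BTreeMap;

/// Per-function change history (hypothetical bookkeeping).
struct History {
    builds_seen: u32,
    builds_changed: u32,
}

/// A planned unit of codegen output.
#[derive(Debug, PartialEq)]
enum ObjectFile {
    /// One function per object file: cheap to rebuild when it changes.
    Single(String),
    /// Several stable functions compiled together, so that inter-function
    /// optimizations can be enabled for them.
    Aggregate(Vec<String>),
}

/// Puts every function whose observed change rate is at most
/// `max_change_rate` into one aggregate; all others stay separate.
fn plan_object_files(
    history: &BTreeMap<String, History>,
    max_change_rate: f64,
) -> Vec<ObjectFile> {
    let mut stable = Vec::new();
    let mut plan = Vec::new();
    for (name, h) in history {
        // A function never seen before is treated as volatile.
        let rate = if h.builds_seen == 0 {
            1.0
        } else {
            f64::from(h.builds_changed) / f64::from(h.builds_seen)
        };
        if rate <= max_change_rate {
            stable.push(name.clone());
        } else {
            plan.push(ObjectFile::Single(name.clone()));
        }
    }
    if stable.len() > 1 {
        plan.push(ObjectFile::Aggregate(stable));
    } else if let Some(name) = stable.pop() {
        plan.push(ObjectFile::Single(name));
    }
    plan
}
```

The trade-off is that a change to any member of an aggregate forces the whole aggregate to be rebuilt, which is why the grouping is described as speculative.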
### Parallelization
If some care is taken in implementing the above concepts, it should be rather easy to do translation and codegen in parallel for all items, since by design we already have (or can deterministically compute) all the information we need.
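As a rough illustration of why independence between items makes this straightforward, the sketch below fans per-item work out over scoped threads. `codegen_item` is a hypothetical stand-in that depends only on its input; nothing here reflects how rustc actually schedules codegen.

```rust
use std::thread;

/// Stand-in for translating one item. It is a pure function of its input,
/// which is the property that makes parallel execution safe.
fn codegen_item(item: &str) -> String {
    format!("obj({})", item)
}

/// Splits `items` into contiguous chunks, processes each chunk on its own
/// thread, and returns the results in the original item order.
fn codegen_parallel(items: &[&str], workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    // Ceiling division; `chunks` panics on a chunk length of zero.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk.iter().map(|i| codegen_item(i)).collect::<Vec<String>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("codegen worker panicked"))
            .collect()
    })
}
```

Because results are joined in spawn order, the output is deterministic regardless of which thread finishes first.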
We already can do codegen in parallel, although there is a bug preventing most use atm.
Please don't make the acronym
I agree with @dhardy. As an alternative, may I propose "So Far, Incrementalism Necessitates An Exegesis"?
@dhardy
}
```
The dependency tracking system as described above contains `node templates` for `program item` definitions on a syntactic level, that is, for each `struct`, `enum`, `type`, `trait`, there is one `node template`, for each `fn`, `static`, and `const` there are two (one for the interface, one for the body). However, as seen in the section on generics, the codebase can refer to monomorphized instances of program items that cannot be identified by a single identifier as described above. A reference like `Option<String>` is a composite of multiple `program item` IDs, a tree of program item IDs in the general case:
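A hypothetical sketch of such a tree, not the RFC's own notation: `ItemId` stands for whatever stable identifier a syntactic program item has, and `InstanceId` pairs one item with the instances used as its type arguments. Both names and the string-based identifiers are assumptions made for illustration.

```rust
/// Hypothetical stable identifier of a syntactic program item.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ItemId(String);

/// A reference to a possibly monomorphized item: the item itself plus one
/// subtree per type argument.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct InstanceId {
    item: ItemId,
    type_args: Vec<InstanceId>,
}

impl InstanceId {
    /// An item without type arguments, e.g. `String`.
    fn leaf(path: &str) -> Self {
        InstanceId { item: ItemId(path.to_string()), type_args: Vec::new() }
    }

    /// A generic item applied to type arguments, e.g. `Option<String>`.
    fn generic(path: &str, type_args: Vec<InstanceId>) -> Self {
        InstanceId { item: ItemId(path.to_string()), type_args }
    }

    /// Every program item mentioned anywhere in the tree, in pre-order.
    /// These are the items whose node templates the instance depends on.
    fn item_ids(&self) -> Vec<&ItemId> {
        let mut out = vec![&self.item];
        for arg in &self.type_args {
            out.extend(arg.item_ids());
        }
        out
    }
}
```

Under this representation, `Option<String>` is a root node for `Option` with a single child for `String`, and the instance depends on the node templates of both.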
On the subject of monomorphized identifiers: you'll probably need to do something about symbol naming for monomorphizations of functions. Right now the name includes the hash of the pointers to the `Ty`s representing the type arguments (which is random, thanks to ASLR). This does fine at preventing collisions, but it means you'll need to either record the mapping of (polymorphic function, type arguments) -> (symbol name) for use in later incremental builds, or fix symbol naming to produce something consistent. I tried to do the latter, but it wound up being a little more complicated than I expected (ADT `Ty`s reference the struct/enum definition by its `DefId`, which is not stable) and I don't remember if I ever got it working.
Yes, that's a problem. I'd probably try to find a more stable symbol naming scheme.
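For illustration only, a naming scheme is stable across runs if it is computed from textual paths rather than from in-memory addresses. The sketch below assumes that every function and type argument can be given such a path, which is precisely the part the comment above reports as difficult; `stable_symbol` and its `_S` prefix are made up for this example and are not rustc's mangling.

```rust
/// Appends `component` to `out`, keeping ASCII alphanumerics and `_` as-is
/// and writing every other byte as `$` followed by two hex digits.
fn escape(component: &str, out: &mut String) {
    for b in component.bytes() {
        if b.is_ascii_alphanumeric() || b == b'_' {
            out.push(b as char);
        } else {
            out.push_str(&format!("${:02x}", b));
        }
    }
}

/// Builds a symbol name from a function path and the paths of its type
/// arguments. Components are separated by `.`, which cannot appear inside
/// an escaped component, so distinct inputs give distinct names.
fn stable_symbol(fn_path: &str, type_args: &[&str]) -> String {
    let mut sym = String::from("_S");
    escape(fn_path, &mut sym);
    for arg in type_args {
        sym.push('.');
        escape(arg, &mut sym);
    }
    sym
}
```

Since the name depends only on its inputs, two builds of the same monomorphization would agree on the symbol, and no mapping from (polymorphic function, type arguments) to symbol name would need to be recorded.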
I am expanding and adapting this RFC. After some discussion with @michaelwoerister we decided to close this existing PR for the time being.
This RFC proposes an incremental compilation strategy for `rustc` that allows translation, codegen, and parts of static analysis to be done incrementally, without precluding the option of later extending incrementality to parsing, macro expansion, and resolution.

This RFC is purely about the architecture and implementation of the Rust compiler. It does not propose any changes to the language. I also don't expect it to be acted on any time before 1.0 is out the door, but I wanted to get this out into the open so that it can be discussed as part of the RustC Architecture Improvement Initiative (that's right, RAII) that I invented just now and that will begin to discuss how the Rust compiler can get as good as possible once the language has become a more stable target.