Incremental compilation RFC #1298
Awesome! This is about incremental builds for a single crate, right? If so, it's worth calling that out. Also, if I'm correct, these caches are not meant to be shared across build machines, right?
@nikomatsakis The summary talks about debug builds specifically, but IIRC we discussed how this would apply to release builds as well? (I.e., a story a bit like parallel codegen units, where you'd be trading incrementality against optimization potential due to passing LLVM smaller units of code.)
@larsbergstrom Actually, I believe incremental builds across crates can be done relatively easily, though I didn't discuss it. I will add a TODO item to summarize how that would work. @aturon Yes, I updated the summary; my mistake.
@larsbergstrom Added a brief note about cross-crate dependencies.
That is correct.
How does member function name (
That "something" is the set of IR tables that indicate which traits are in scope at a given point, as well as those that collect all the impls for a trait (I did not add an exhaustive listing to the RFC). Those will presumably be linked up something like the following:
That is roughly the idea. Make sense?
Yes, that makes sense; thanks.
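A hypothetical sketch of that kind of linking: while type-checking a function body, method resolution records a read of the traits-in-scope table for the enclosing module and of the impl table for each candidate trait, so adding a new impl only affects the bodies that consulted that trait's impls. The node names (`TraitsInScope`, `TraitImpls`, `TypeckFn`) and the `DepGraph` type are illustrative assumptions, not the RFC's or rustc's actual definitions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical dependency-graph nodes; names are illustrative only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum DepNode {
    /// Table of traits in scope in a given module (by path).
    TraitsInScope(String),
    /// Table collecting all impls of a given trait.
    TraitImpls(String),
    /// Type-checking (including method resolution) of one function body.
    TypeckFn(String),
}

#[derive(Default)]
struct DepGraph {
    /// For each node, the set of nodes that read from it.
    readers: HashMap<DepNode, HashSet<DepNode>>,
}

impl DepGraph {
    /// Record that `reader` consulted `source` while being computed.
    fn read(&mut self, reader: DepNode, source: DepNode) {
        self.readers.entry(source).or_default().insert(reader);
    }

    /// Direct readers of `source`: what must be rechecked if it changes.
    fn readers_of(&self, source: &DepNode) -> HashSet<DepNode> {
        self.readers.get(source).cloned().unwrap_or_default()
    }
}

fn main() {
    let mut g = DepGraph::default();
    // Resolving a method call on a `Len` receiver in `foo::bar` consults
    // the traits in scope in module `foo` and the impls of trait `Len`.
    let typeck_bar = DepNode::TypeckFn("foo::bar".into());
    g.read(typeck_bar.clone(), DepNode::TraitsInScope("foo".into()));
    g.read(typeck_bar, DepNode::TraitImpls("Len".into()));
    // `foo::baz` only consults the traits in scope.
    g.read(DepNode::TypeckFn("foo::baz".into()), DepNode::TraitsInScope("foo".into()));

    // Adding a new `impl Len for ...` changes only the `TraitImpls("Len")` table.
    let affected = g.readers_of(&DepNode::TraitImpls("Len".into()));
    println!("{:?}", affected); // prints {TypeckFn("foo::bar")}
}
```

The design choice here is granularity: because impls are grouped per trait, an unrelated new impl does not invalidate `foo::baz`.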
strategies can be used to enable lazy or parallel compilation at later times. (Eventually, though, it might be nice to restructure the compiler so that it operates in more of a demand-driven style, rather than a series of sweeping passes.)
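To make the contrast with sweeping passes concrete, here is a minimal sketch of a demand-driven style, assuming three hypothetical stages (`Source`, `Typeck`, `Trans`): nothing is computed until a result is requested, each result is memoized, and every read a computation performs is recorded as a dependency edge. The `Ctx` type and the query names are invented for illustration; this is not rustc's actual design.

```rust
use std::collections::HashMap;

/// Hypothetical query keys; a real compiler would have many more.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Query {
    Source(&'static str),
    Typeck(&'static str),
    Trans(&'static str),
}

#[derive(Default)]
struct Ctx {
    sources: HashMap<&'static str, String>,
    cache: HashMap<Query, String>,
    /// Edges `(reader, read)`, recorded as queries run.
    edges: Vec<(Query, Query)>,
    /// Queries currently executing, used to attribute reads.
    stack: Vec<Query>,
    /// How many query bodies actually ran (cache misses).
    executions: usize,
}

impl Ctx {
    fn get(&mut self, q: Query) -> String {
        // Record that the currently running query (if any) read `q`.
        if let Some(reader) = self.stack.last() {
            self.edges.push((reader.clone(), q.clone()));
        }
        // Memoized: a result is computed at most once.
        if let Some(v) = self.cache.get(&q) {
            return v.clone();
        }
        self.stack.push(q.clone());
        self.executions += 1;
        // Each stage demands only the inputs it needs.
        let v = match q {
            Query::Source(f) => self.sources[f].clone(),
            Query::Typeck(f) => format!("typeck({})", self.get(Query::Source(f))),
            Query::Trans(f) => format!("trans({})", self.get(Query::Typeck(f))),
        };
        self.stack.pop();
        self.cache.insert(q, v.clone());
        v
    }
}

fn main() {
    let mut cx = Ctx::default();
    cx.sources.insert("foo", "fn foo() {}".to_string());
    cx.sources.insert("bar", "fn bar() {}".to_string());
    // Asking for `Trans("foo")` pulls in only what `foo` needs.
    println!("{}", cx.get(Query::Trans("foo"))); // trans(typeck(fn foo() {}))
    println!("{} executions, edges: {:?}", cx.executions, cx.edges);
}
```

Requesting `Trans("foo")` never touches `bar`, and the recorded edges are the kind of information an incremental system could persist between sessions.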
Mind elaborating on what "demand-driven style" entails and how it differs from our current approach?
As an example of what I think is incremental compilation done right, see MSVC. Not only does it have an incremental compilation + linking mode that works fairly well, but it also has an incremental LTCG mode where it does full link time optimization, just incrementally.
```rust
impl Type { // Path: <root>::foo::<impl1>
    fn bar() {..} // Path: <root>::foo::<impl1>::bar
}
impl Type { } // Path: <root>::foo::<impl2>
```
Since you don't indicate that every path has a unique integer, this seems to imply that you'd have to know if there are any duplicate children before you start naming, or have some amount of mutability to go back and "fix" the first child when you see the second child.
Is it possible to simply leave the first one as `<impl>` and then mark the second one as `<impl2>`?
As I understand it, a large benefit of incremental compilation is speed, but there's no mention of tests that attempt to quantify or ensure that the new world order will be faster. Is there anything more beyond
- Object files
  - This represents the final result of running LLVM. It may be that the best strategy is to "cache" compiled code in the form of an rlib that is progressively patched, or it may be easier to store
"progessively" → "progressively"
Not a very helpful comment, but: 👍👍👍👍👍
By demand-driven style, what I meant was that we would build a dependency graph that we use to drive compilation. So, for example, we would begin by saying "we need to trans the
In the actual implementation, every path element also has a disambiguating integer. This begins at zero, but when we create a new def-id, we check whether the parent already has a child with that name and, if so, increment the disambiguating integer as many times as necessary until we get a unique name. I can tweak the RFC to reflect the implementation more precisely.
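The disambiguation scheme just described can be sketched as follows, assuming each path element is a (name, disambiguator) pair: the disambiguator starts at zero and is bumped until the parent has no child with the same pair, so the first `<impl>` keeps disambiguator 0 and the second gets 1, with no need to go back and rename the first. `DefPathTable`, `create_def`, and the `#n` rendering are hypothetical, not rustc's API.

```rust
use std::collections::HashSet;

/// One element of a def-path: a name plus a disambiguating integer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PathElem {
    name: String,
    disambiguator: u32,
}

/// Hypothetical table of def-paths; each path is its parent's path plus one element.
#[derive(Default)]
struct DefPathTable {
    paths: Vec<Vec<PathElem>>,
    /// (parent index, element) pairs already taken; `None` is the crate root.
    taken: HashSet<(Option<usize>, PathElem)>,
}

impl DefPathTable {
    /// Create a new def-id under `parent` and return its index.
    fn create_def(&mut self, parent: Option<usize>, name: &str) -> usize {
        let mut elem = PathElem { name: name.to_string(), disambiguator: 0 };
        // Bump the disambiguator until (parent, name, disambiguator) is unique.
        while self.taken.contains(&(parent, elem.clone())) {
            elem.disambiguator += 1;
        }
        self.taken.insert((parent, elem.clone()));
        let mut path = parent.map(|p| self.paths[p].clone()).unwrap_or_default();
        path.push(elem);
        self.paths.push(path);
        self.paths.len() - 1
    }

    /// Render a path; a nonzero disambiguator is shown as `#n` (ad hoc notation).
    fn render(&self, id: usize) -> String {
        let mut s = String::from("<root>");
        for e in &self.paths[id] {
            s.push_str("::");
            s.push_str(&e.name);
            if e.disambiguator > 0 {
                s.push_str(&format!("#{}", e.disambiguator));
            }
        }
        s
    }
}

fn main() {
    let mut t = DefPathTable::default();
    let foo = t.create_def(None, "foo");
    let impl_a = t.create_def(Some(foo), "<impl>");
    let bar = t.create_def(Some(impl_a), "bar");
    let impl_b = t.create_def(Some(foo), "<impl>");
    println!("{}", t.render(bar)); // <root>::foo::<impl>::bar
    println!("{}", t.render(impl_b)); // <root>::foo::<impl>#1
}
```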
I don't know what you mean here, actually. Do you mean that if we inlined, then the graph would be wrong? Because that is not the case: this graph refers to the front end's view of things, which is before inlining etc. takes place. When we actually do codegen, if
Hear ye, hear ye. This RFC is now entering final comment period.
I think this RFC is good to go. Conceptually it seems sound to me, and it contains enough of a concrete outline to start implementing.
I get that, as a mere Rust user who has never contributed to rustc, I'm basically pontificating on design decisions that don't affect any public interface. But might somebody comment on whether the alternative I wrote about earlier, building the dependency graph explicitly and then processing it (lazily or otherwise), was considered?
Sorry, I meant to reply to your comment earlier. I did consider that design
(Replying to John Ericson, Wed, Oct 21, 2015 at 1:33 PM.)
@nikomatsakis Thank you, that is very reassuring. I absolutely agree on the soundness issue; in fact, I'd say that without refactoring to make sure the graph traversals are correct by construction, there's hardly any point in taking my route. It sounds like your view is that the implicit-dependency route is a good way to accurately catch all dependencies without forcing the big refactor, but that explicit dependencies are a decent end goal?
I think there will always be some of both. Some dependencies at least
(Replying to John Ericson, Wed, Oct 21, 2015 at 7:33 PM.)
Ah. I envisioned stuff like that working by the traversal of one graph creating another. |
To clarify: suppose we have something like token tree -(macros)-> collection of items -(type-checking and method resolution...)-> collection of MIR -(llvm)-> collection of bitcode. To really do laziness right with this, not only would the graphs be traversed lazily, but they would also be created lazily. The MIR for each function would be bundled with a thunk to generate the MIR for all referenced functions. [For any Nix users out there (cough @eddyb cough) this is related to doing things like
Finally, I mentioned earlier that I'd love to write some generic library to persist/cache all that. To make that a bit more concrete, I was thinking of something like https://github.com/dmbarbour/haskell-vcache or https://github.com/mirage/irmin, along with some infrastructure to serialize thunks.
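A rough sketch of a graph that is created lazily as well as traversed lazily: each function node carries a thunk (here a `std::cell::OnceCell`, stable since Rust 1.70) that builds the nodes for the functions it references only when first forced. `Program`, `LazyFn`, and the call table are hypothetical stand-ins where bare names replace real MIR; the sketch also creates a fresh node per reference rather than sharing nodes, which a real design would need to address.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for a crate: function name -> functions it references.
struct Program {
    calls: HashMap<&'static str, Vec<&'static str>>,
    /// Counts how many lazy nodes have actually been created.
    nodes_built: Cell<usize>,
}

/// A function node whose callee nodes are produced by a thunk on first demand.
struct LazyFn {
    name: &'static str,
    program: Rc<Program>,
    callees: OnceCell<Vec<LazyFn>>,
}

impl LazyFn {
    fn new(name: &'static str, program: Rc<Program>) -> LazyFn {
        program.nodes_built.set(program.nodes_built.get() + 1);
        LazyFn { name, program, callees: OnceCell::new() }
    }

    /// Force the thunk: create the callee nodes the first time only.
    fn callees(&self) -> &[LazyFn] {
        self.callees.get_or_init(|| {
            let names = self.program.calls.get(self.name).cloned().unwrap_or_default();
            names
                .into_iter()
                .map(|n| LazyFn::new(n, Rc::clone(&self.program)))
                .collect()
        })
    }
}

fn main() {
    let mut calls = HashMap::new();
    calls.insert("main", vec!["parse", "run"]);
    calls.insert("run", vec!["step"]);
    let program = Rc::new(Program { calls, nodes_built: Cell::new(0) });

    let root = LazyFn::new("main", Rc::clone(&program));
    println!("{}", program.nodes_built.get()); // 1: only `main` exists so far
    let names: Vec<_> = root.callees().iter().map(|f| f.name).collect();
    println!("{:?} {}", names, program.nodes_built.get()); // ["parse", "run"] 3
}
```

The node for `step` is never built unless something walks into `run`'s callees.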
This looks great to me. The greatest challenge is going to be building a dependency graph that is as precise as possible (to get maximum benefit) without introducing unsoundness. I don't see any silver bullets here; just "be really careful" and "test a lot". There may be an interesting class of source code changes affecting lifetime or variance inference where typechecking artifacts are invalidated, but it is theoretically possible to avoid invalidating trans artifacts since lifetimes are erased by then. I haven't thought of any concrete examples that would be worth exploiting, however.
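A minimal sketch of the invalidation step that has to be sound: starting from the changed inputs, everything reachable over reverse ("read by") edges is marked dirty, and anything not reached may be reused. Node names are plain strings for illustration; whether the result is also precise depends entirely on which edges were recorded, which this sketch takes as given.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reverse dependency edges: node -> nodes that read it.
struct DepGraph {
    readers: HashMap<&'static str, Vec<&'static str>>,
}

impl DepGraph {
    /// Everything transitively affected by `changed`, including the changed nodes.
    fn dirty_set(&self, changed: &[&'static str]) -> HashSet<&'static str> {
        let mut dirty = HashSet::new();
        let mut queue: VecDeque<&'static str> = changed.iter().copied().collect();
        while let Some(n) = queue.pop_front() {
            // `insert` returns false for already-visited nodes, so cycles terminate.
            if dirty.insert(n) {
                if let Some(rs) = self.readers.get(n) {
                    queue.extend(rs.iter().copied());
                }
            }
        }
        dirty
    }
}

fn main() {
    let mut readers = HashMap::new();
    readers.insert("hir(foo)", vec!["typeck(foo)"]);
    readers.insert("typeck(foo)", vec!["trans(foo)"]);
    readers.insert("hir(bar)", vec!["typeck(bar)"]);
    readers.insert("typeck(bar)", vec!["trans(bar)"]);
    let g = DepGraph { readers };

    let mut dirty: Vec<_> = g.dirty_set(&["hir(foo)"]).into_iter().collect();
    dirty.sort();
    println!("{:?}", dirty); // ["hir(foo)", "trans(foo)", "typeck(foo)"]
}
```

A missing edge here would silently leave a stale artifact marked clean, which is why the thread stresses care and testing over cleverness.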
One thing that is not mentioned in the RFC at all yet is monomorphization and the consequences it has.
One complication I can see here is that we can only know after type-checking which monomorphizations are still used, but the proposed algorithm already wants to garbage-collect the on-disk cache right after building the HIR. This has to be accounted for somehow.
True. We don't actually know which monomorphizations we want until trans. I've also been thinking about what it would take to do an early target that
(Replying to Michael Woerister, Fri, Oct 23, 2015 at 10:20 AM.)
We already save the type-checked body of monomorphizable fns.
Yes, but what we are mostly talking about is preserving the monomorphized
Well, I guess I was saying that for a first draft, it might not be worth
(Replying to arielb1, Fri, Oct 23, 2015 at 1:18 PM.)
Maybe convert all translation to use inlining and save the serialized data (we would also need to have some way of stably comparing it for this to work). Using serialized MIR instead of serialized AST may make this easier, but I feel like the issues are orthogonal.
@arielb1 I'm not clear on what problem you are proposing to solve here? (I
(Replying to arielb1, Fri, Oct 23, 2015 at 4:16 PM.)
Isn't it proposed anyway that the complete set of items be hashed on every compilation?
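The hashing idea can be sketched as: hash every item in each session, compare against the hashes saved from the previous session, and treat mismatched, added, or removed items as changed inputs. This sketch uses `std::collections::hash_map::DefaultHasher`, whose output is not guaranteed to be stable across Rust releases, so a real on-disk cache would need a stable hash (the "stably comparing" concern raised earlier in the thread). Item paths and bodies are plain strings here, purely for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hash one item's contents (here just its source text).
fn item_hash(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    h.finish()
}

/// Per-item hashes for the current session, keyed by item path.
fn hash_items(items: &BTreeMap<&str, &str>) -> BTreeMap<String, u64> {
    items.iter().map(|(path, body)| (path.to_string(), item_hash(body))).collect()
}

/// Paths whose hash differs from the previous session, plus added and removed paths.
fn changed_items(old: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>) -> BTreeSet<String> {
    let mut changed = BTreeSet::new();
    for (path, h) in new {
        if old.get(path) != Some(h) {
            changed.insert(path.clone());
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changed.insert(path.clone());
        }
    }
    changed
}

fn main() {
    let before = BTreeMap::from([("foo", "fn foo() { 1 }"), ("bar", "fn bar() {}")]);
    let after = BTreeMap::from([
        ("foo", "fn foo() { 2 }"),
        ("bar", "fn bar() {}"),
        ("baz", "fn baz() {}"),
    ]);
    let changed = changed_items(&hash_items(&before), &hash_items(&after));
    println!("{:?}", changed); // {"baz", "foo"}
}
```

The changed set would then seed the dirty-propagation step over the dependency graph.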
## Basic usage

The basic usage will be that one enables incremental compilation using a compiler flag like `-C incremental-compilation=TMPDIR`. The `TMPDIR`
Do you expect that Cargo will pass this flag by default for all projects?
Huzzah! The compiler team has decided to accept this RFC. The expectation is that the actual implementation will discover numerous surprises (we've already found a few) that require adjustments, and that we will come back and update the RFC to be more in line with the final design once that has shaken out a bit.
There's a lot of interesting talk about incremental computation in this thread, which is great! In case anyone was wondering about PL research literature on this topic, these researchers have also been thinking about incremental, demand-driven compilation / computation:
The first paper is more recent, and specialized to a situation similar to the one described in the discussion above (incremental compilation, using demand-driven, dynamic dependency graphs). The second paper gives a general approach for such incremental, demand-driven computations. There is follow-on work on adapton.org.
Any update on the state of implementation?
@White-Oak Creation of a dependency graph is being done in rust-lang/rust#30532.
High-level strategy for incremental compilation.
cc @rust-lang/compiler