
[MIR] Initial implementation of inlining #36593

Closed
wants to merge 12 commits

Conversation

Aatch
Contributor

@Aatch Aatch commented Sep 20, 2016

Implements basic inlining for MIR functions.

Inlining is currently only done if mir-opt-level is set to 2.

Does not handle debuginfo completely correctly at the moment.

@rust-highfive
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@Aatch
Contributor Author

Aatch commented Sep 20, 2016

/cc @pcwalton

@Aatch
Contributor Author

Aatch commented Sep 20, 2016

I'm currently changing the spans in the inlined MIR to the span of the callsite, which isn't ideal. However, keeping the spans from the original function triggers a multibyte-related assertion.

@michaelwoerister any ideas about what might cause that?

@petrochenkov
Contributor

Where can I read about the motivation for this? Why does MIR inlining make sense in the presence of LLVM inlining?
(I try to follow MIR development, but PRs like this just appear out of nowhere without any context.)

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 20, 2016

@petrochenkov From what I've been told, the main motivation for any optimization on MIR is to improve generic code before it's copied all over the place as part of monomorphization (similar reasoning applies to #[inline] functions I suppose), so that LLVM doesn't need to perform inlining and other optimizations N times when they could be done once on MIR.

(I asked on IRC because, like you, I couldn't find a document describing this. There might not be one.)

@Aatch
Contributor Author

Aatch commented Sep 20, 2016

One major motivation is that it improves compile times. Not by a huge amount, but I saw about a 10% improvement in codegen+LLVM pass times when compiling libstd.


impl fmt::Debug for Location {
fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
write!(fmt, "{:?}[{}]", self.block, self.statement_index)
}
}

impl<'tcx> TypeFoldable<'tcx> for Mir<'tcx> {
Member

Could these go to mir/tcx?

let terminator = caller_mir[callsite.bb].terminator.take().unwrap();
match terminator.kind {
TerminatorKind::Call {
func: _, args, destination: Some(destination), cleanup } => {
Member

nit: weird style

let mut first_block = true;
let mut cost = 0;

for blk in callee_mir.basic_blocks() {
Member

Could this be a MIR visitor instead?

Contributor Author

The early returns make the direct way a bit easier. It could be a visitor, but I'd rather land this and then refactor once the early returns aren't necessary any more.

Contributor

(I don't really care, but it seems like we could use cost.saturating_add() and just set cost to INT_MAX or something instead of early return.)

}
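The `saturating_add` suggestion above can be sketched outside rustc as follows (a standalone illustration; `total_cost` and the cost values are invented for the example):

```rust
// Accumulating a cost with saturating_add: on overflow the total pins
// at usize::MAX instead of wrapping, so a single threshold comparison
// after the loop can replace the early returns.
fn total_cost(costs: &[usize]) -> usize {
    costs.iter().fold(0usize, |acc, &c| acc.saturating_add(c))
}

fn main() {
    assert_eq!(total_cost(&[1, 2, 3]), 6);
    // A prohibitively expensive statement saturates the total.
    assert_eq!(total_cost(&[usize::MAX, 10]), usize::MAX);
}
```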

struct Integrator<'a, 'tcx: 'a> {
block_idx: usize,
Member

Comment on what this is?

@michaelwoerister
Member

However, keeping the spans from the original function triggers a multibyte-related assertion.

Not off the top of my head, unfortunately.

@nagisa
Member

nagisa commented Sep 20, 2016

@Aatch Handling unwinding should be pretty easy as far as I can tell. Namely you have three patterns:

  • The callee has a block with resume terminator but call has no unwind edge;
    • Fine to inline the whole function verbatim (which I think this PR already does);
  • The callee has a block(s) with resume terminator and call has the unwind edge;
    • Replace any resume with goto to a block which was pointed to by the unwind edge;
  • The callee has no resume terminator and call has unwind edge;
    • Ignore the presence of the unwind edge (essentially removing it).

Note how you do not need to handle any of the unwind edges inside the callee or caller, other than remembering which block the unwind branch of the call terminator you're inlining was pointing to. Bonus points for making sure that any function, after inlining is complete, has only one resume terminator, but nothing inherently requires this to be true.
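The second case above can be sketched with simplified stand-in types (an illustration only; the real MIR terminator types in rustc carry much more information):

```rust
// Toy stand-ins for MIR terminators.
#[derive(Debug, Clone, PartialEq)]
enum Terminator {
    Goto(usize),
    Resume,
    Return,
}

// When the call being inlined has an unwind edge, every `Resume` in
// the inlined body is redirected to that edge's target block; with no
// unwind edge (the first case), the body is kept verbatim.
fn redirect_resumes(body: &mut [Terminator], unwind_target: Option<usize>) {
    if let Some(target) = unwind_target {
        for term in body.iter_mut() {
            if *term == Terminator::Resume {
                *term = Terminator::Goto(target);
            }
        }
    }
}

fn main() {
    let mut body = vec![Terminator::Return, Terminator::Resume];
    redirect_resumes(&mut body, Some(7));
    assert_eq!(body, vec![Terminator::Return, Terminator::Goto(7)]);
}
```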

@pcwalton
Contributor

@petrochenkov To add to what others have said, it improves other optimizations. For example (and this is what I want it for), if you inline Vec::new() and Box::new(), then this opens the door to removing calls to memcpy at runtime.

@bors
Contributor

bors commented Sep 20, 2016

☔ The latest upstream changes (presumably #36388) made this pull request unmergeable. Please resolve the merge conflicts.

@Aatch
Contributor Author

Aatch commented Sep 20, 2016

@nagisa I did have it inlining in more cases, but I ran into issues. I reduced the scope of the pass so I can get something landed first. Handling more cases can be done later.

@Aatch
Contributor Author

Aatch commented Sep 21, 2016

@nagisa I'm not sure about that third case. If there's no cleanup to be done, then there won't be a resume in the callee function, but unwinding still needs to be routed correctly if the call has an unwind edge. A lack of resume does not mean the function doesn't unwind.

@nagisa
Member

nagisa commented Sep 21, 2016

Ah right, you are absolutely right. It's also a problem in the other cases (the presence of a resume does not imply every unwind ends up executing that statement).

Ideally we'd want something like the nounwind LLVM attribute, but relying on further inlining removing the unnecessary edges also sounds fine to me.


@pcwalton
Contributor

Anything I can do to help this along?

@Aatch
Contributor Author

Aatch commented Sep 22, 2016

@pcwalton I'm just about to push a rebased version with a few other improvements (I fixed the unwinding-related problem). I still need to figure out the debuginfo thing. It's not strictly necessary, as the pass is currently opt-in, but I'd like to at least figure out why it's breaking, even if it doesn't get fixed, so I can make a note of it like I did with the reachable-symbols thing.

So investigating debuginfo would probably be a big help.

@pcwalton
Contributor

@Aatch How do I reproduce this bug? I've tried in various ways on Linux and can't seem to.

@pcwalton
Contributor

@bors: try

@bors
Contributor

bors commented Sep 23, 2016

⌛ Trying commit 312b9df with merge 41d36b4...

@Aatch Aatch changed the title from "[WIP] Initial implementation of inlining" to "Initial implementation of inlining" Sep 24, 2016
@Aatch
Contributor Author

Aatch commented Sep 24, 2016

I feel confident in saying this is good enough for a first pass, with the understanding that the pass itself is still unstable and may have unknown bugs.

@Aatch Aatch changed the title from "Initial implementation of inlining" to "[MIR] Initial implementation of inlining" Sep 24, 2016
Contributor

@pcwalton pcwalton left a comment


This looks great! Only had a few nits. Thanks so much for your work!

@nikomatsakis I think you should take a look at this too.

return_ty: self.return_ty.fold_with(folder),
var_decls: self.var_decls.fold_with(folder),
arg_decls: self.arg_decls.fold_with(folder),
temp_decls: self.temp_decls.fold_with(folder),
Contributor

nit: odd indentation here

Contributor

👍


let kind = match self.kind {
Assign(ref lval, ref rval) => Assign(lval.fold_with(folder), rval.fold_with(folder)),
SetDiscriminant { ref lvalue, variant_index } => SetDiscriminant{
Contributor

nit: put a space before the { at end of line to be consistent with the pattern

Contributor

👍

Call { ref func, ref args, ref destination, cleanup } => {
let dest = if let Some((ref loc, dest)) = *destination {
Some((loc.fold_with(folder), dest))
} else { None };
Contributor

nit: could use destination.map() here

Contributor

yes please

Contributor

oh but then you need destination.as_ref().map() -- but that's ok

Contributor

👍
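The `as_ref()` point can be seen in isolation: because the fold only borrows the terminator, mapping over the borrowed `Option` needs `as_ref()` so the closure receives a reference rather than trying to move out of it. A standalone sketch (`fold_destination` and the `String`/`usize` stand-ins are invented for illustration):

```rust
// `destination` mimics the Option<(Lvalue, BasicBlock)> being folded:
// `as_ref()` turns &Option<T> into Option<&T>, letting us map without
// moving out of the borrow.
fn fold_destination(destination: &Option<(String, usize)>) -> Option<(String, usize)> {
    destination
        .as_ref()
        .map(|&(ref loc, dest)| (format!("folded({})", loc), dest))
}

fn main() {
    let d = Some(("x".to_string(), 3));
    assert_eq!(fold_destination(&d), Some(("folded(x)".to_string(), 3)));
    assert_eq!(fold_destination(&None), None);
}
```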

}

#[inline]
pub fn get_mut<'a>(&'a mut self, index: I) -> Option<&'a mut T> {
Contributor

nit: I think you can lifetime-elide the lifetimes here

Contributor

👍
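For reference, the elision the nit asks for: with a single `&mut self` input, the output lifetime is tied to it automatically, so the explicit `'a` can be dropped (`Store` here is an invented stand-in for the PR's `IndexVec`):

```rust
// Stand-in container to demonstrate lifetime elision.
struct Store<T> {
    items: Vec<T>,
}

impl<T> Store<T> {
    // Equivalent to:
    // pub fn get_mut<'a>(&'a mut self, index: usize) -> Option<&'a mut T>
    fn get_mut(&mut self, index: usize) -> Option<&mut T> {
        self.items.get_mut(index)
    }
}

fn main() {
    let mut s = Store { items: vec![1, 2, 3] };
    if let Some(v) = s.get_mut(1) {
        *v = 20;
    }
    assert_eq!(s.items, vec![1, 20, 3]);
}
```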

callgraph
}

pub fn scc_iter<'g>(&'g self) -> SCCIterator<'g> {
Contributor

nit: I think you can lifetime elide 'g here

false
} else {
true
})
Contributor

Maybe we should just have an is_nop() function on Statement. I probably should have written one, sorry about that.
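The suggested helper could look roughly like this with stand-in types (an illustrative sketch, not the PR's actual `Statement`; the `Nop` variant and `is_nop` are assumptions here):

```rust
// With an explicit Nop variant, filtering dead statements becomes
// `retain(|s| !s.is_nop())` instead of an inline match.
#[derive(Debug, PartialEq)]
enum StatementKind {
    Assign(String, String),
    Nop,
}

struct Statement {
    kind: StatementKind,
}

impl Statement {
    fn is_nop(&self) -> bool {
        self.kind == StatementKind::Nop
    }
}

fn main() {
    let mut stmts = vec![
        Statement { kind: StatementKind::Nop },
        Statement { kind: StatementKind::Assign("a".into(), "b".into()) },
    ];
    stmts.retain(|s| !s.is_nop());
    assert_eq!(stmts.len(), 1);
}
```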

Contributor

Shouldn't this be in SimplifyCfg?


if Some(*target) == *cleanup {
*cleanup = None;
} else if !self.in_cleanup_block{
Contributor

nit: add a space before the { at end of line

@@ -532,8 +528,58 @@ impl<'a, 'tcx> Inliner<'a, 'tcx> {
let return_block = destination.1;

// Copy the arguments if needed.
let args : Vec<_> = {
let args : Vec<_> = if is_box_free {
Contributor

@pcwalton pcwalton Sep 24, 2016

Oh, yuck! Shouldn't this be done at HIR→MIR lowering time?

This is fine for now, but I'd factor it out into a separate function, with a big ol' FIXME on top of it :)

highlight_end: h_end,
}
h_end: usize) -> Option<DiagnosticSpanLine> {
fm.get_line(index).map(|text| {
Contributor

Add a comment here saying what could cause this to return None (as the commit message does).

note_const_eval_err(bcx.tcx(), &err, span, "expression", &mut diag);
diag.emit();
if let Some(span) = self.diag_span(span, terminator.source_info.scope) {
let err = ConstEvalErr{ span: span, kind: err };
Contributor

nit: put a space before {. (I know you just copy and pasted code from above, but might as well clean this up while you're here.)

@bors
Contributor

bors commented Sep 24, 2016

☔ The latest upstream changes (presumably #36639) made this pull request unmergeable. Please resolve the merge conflicts.

Inlining needs to apply the substitutions from the callsite.
Also implements TypeFoldable on IndexVec
The deaggregator will generate assignments that aren't allowed in const
expressions.

This also refactors `is_const_fn` into a method on `TyCtxt`.

match self.kind {
Assign(ref lval, ref rval) => { lval.visit_with(visitor) || rval.visit_with(visitor) }
SetDiscriminant { ref lvalue, .. } |
Contributor

Nit: similarly here, can we spell out the fields like x: _

Contributor

👍

use mir::repr::TerminatorKind::*;

match self.kind {
If { ref cond, .. } => cond.visit_with(visitor),
Contributor

Nit: similarly here I'd avoid the .. if we can; it's so easy for these visitors to go wrong, and so hard to track down
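The robustness argument can be demonstrated with invented stand-in types: naming every field (rather than `..`) makes the compiler reject the match when a variant grows a field, instead of the visitor silently skipping the new data.

```rust
// Toy terminator; `If` stands in for the MIR variant being matched.
enum Terminator {
    If { cond: i32, targets: (usize, usize) },
    Return,
}

fn visit_operands(t: &Terminator) -> Vec<i32> {
    match *t {
        // `targets: _` instead of `..`: adding a field to `If` becomes
        // a compile error here rather than a missed visit.
        Terminator::If { ref cond, targets: _ } => vec![*cond],
        Terminator::Return => vec![],
    }
}

fn main() {
    let t = Terminator::If { cond: 5, targets: (0, 1) };
    assert_eq!(visit_operands(&t), vec![5]);
}
```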

fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self {
match self {
&Lvalue::Projection(ref p) => Lvalue::Projection(p.fold_with(folder)),
_ => self.clone()
Contributor

Nit: same here

Contributor

👍

kind: &TerminatorKind<'tcx>, _loc: Location) {
if let TerminatorKind::Call {
func: Operand::Constant(ref f)
, .. } = *kind {
Contributor

Nit: this formatting is pretty hard to read... can't we get it on one line? maybe use a match? Anything but this :P

children: graph::AdjacentTargets<'g, DefId, ()>
}

pub struct SCCIterator<'g> {
Contributor

This should be factored into librustc_data_structures, and ideally the graph algorithms portion of that code. But for now just open a FIXME about it.


//! MIR-based callgraph.
//!
//! This only considers direct calls
Contributor

This is not an issue, but it's interesting that this is an "optimistic" call graph -- i.e., it is only those things that we know will be called. Often people would assume a pessimistic approach. Maybe worth highlighting just a bit in the comment here.

};
let src = MirSource::from_node(self.tcx, id);
if let MirSource::Fn(_) = src {
let mir = if let Some(m) = map.map.get(&def_id) {
Contributor

is there a reason for this to be absent? I guess cross-crate cases, eh?

continue;
};
let src = MirSource::from_node(self.tcx, id);
if let MirSource::Fn(_) = src {
Contributor

Nit: I'd rather see

match MirSource::from_node(self.tcx, id) {
    MirSource::Fn(_) => (),
    _ => continue,
}

let mut changed = false;

loop {
local_change = false;
Contributor

Nit: why not let mut local_change = false;?


csi -= 1;
if scc.len() == 1 {
callsites.swap_remove(csi);
Contributor

@nikomatsakis nikomatsakis Sep 27, 2016

Huh, I feel like the ordering tricks here are getting a bit subtle.

First we sorted callsites so that things in the SCC were at the end -- but now we are calling swap_remove, which usually only makes sense if ordering doesn't matter -- and this is part of a loop? So on the next iteration, the ordering is going to be messed up?

Perhaps the scc.len() == 1 condition means this is not true?

Maybe it'd be simpler to just keep a parallel "merged" array of booleans and set merged[csi] = true or something?

EDIT: Oh, and we are growing the array too...

Contributor Author

If scc.len() == 1 then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.

Contributor

@Aatch

If scc.len() == 1 then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.

OK, that makes sense. This should at minimum be a comment. We can iterate on the overall structure later.

foreign_mir.as_ref().map(|m| &**m)
};

let callee_mir = if let Some(m) = callee_mir {
Contributor

Why would this ever be None?

Contributor Author

Calls to functions that don't have a MIR. Foreign functions, for example.

let terminator = bb_data.terminator();
if let TerminatorKind::Call {
func: Operand::Constant(ref f), .. } = terminator.kind {
if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty {
Contributor

This sort of duplicates the logic in the callgraph code, right? Perhaps we can have a helper that enumerates the direct callees of a given MIR block?

Contributor

Also duplicates the logic just a half page up in this same function, right?

Contributor

@arielb1 arielb1 Sep 27, 2016

Why is the ordering important? Won't we iterate up to a fixed point anyway?

if let TerminatorKind::Call {
func: Operand::Constant(ref f), .. } = terminator.kind {
if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty {
// Don't inline the same function multiple times.
Contributor

Well, more specifically this prevents us from inlining recursive calls from the callee -- but these could be indirect and still be a problem, right?

For example:

fn foo() {
    bar();
}

fn bar() {
    baz();
}

fn baz() {
    bar();
}

What prevents us from inlining bar into foo, then inlining baz into foo, then bar again?

Member

I'm not sure it's even cost-effective across the board to inline non-leaves without a more complex analysis.

@nikomatsakis
Contributor

nikomatsakis commented Sep 27, 2016

I didn't review all the commits yet. This looks pretty nice, good work @Aatch. I know there's a ton of nits from a lot of people, I wanted to call attention to one or two specific comments of mine that I'd like some feedback on (from @Aatch or others):

EDIT: Finished reviewing.

// Attempts to get an appropriate span for diagnostics. We might not have the source
// available if the span is from an inlined crate, but there's usually a decent span
// in an ancestor scope somewhere
fn diag_span(&self, span: Span, scope: mir::VisibilityScope) -> Option<Span> {
Contributor

do we have tests for this?

* Iterator over strongly-connected-components using Tarjan's algorithm[1]
*
* [1]: https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
*/
Contributor

Nit: ///, not that I care
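For reference, a minimal self-contained version of the algorithm the doc comment cites: a recursive Tarjan over a plain adjacency list (the PR's `SCCIterator` computes the same components iteratively over rustc's generic graph type):

```rust
// Per-run bookkeeping for Tarjan's strongly-connected-components.
struct State<'a> {
    adj: &'a [Vec<usize>],
    index: usize,
    indices: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    sccs: Vec<Vec<usize>>,
}

fn strongconnect(s: &mut State, v: usize) {
    s.indices[v] = Some(s.index);
    s.lowlink[v] = s.index;
    s.index += 1;
    s.stack.push(v);
    s.on_stack[v] = true;

    let adj = s.adj; // copy the reference so the loop doesn't borrow `s`
    for &w in adj[v].iter() {
        if s.indices[w].is_none() {
            strongconnect(s, w);
            s.lowlink[v] = s.lowlink[v].min(s.lowlink[w]);
        } else if s.on_stack[w] {
            s.lowlink[v] = s.lowlink[v].min(s.indices[w].unwrap());
        }
    }

    // v roots an SCC: pop the whole component off the stack.
    if Some(s.lowlink[v]) == s.indices[v] {
        let mut scc = Vec::new();
        loop {
            let w = s.stack.pop().unwrap();
            s.on_stack[w] = false;
            scc.push(w);
            if w == v {
                break;
            }
        }
        s.sccs.push(scc);
    }
}

fn tarjan_sccs(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = adj.len();
    let mut s = State {
        adj,
        index: 0,
        indices: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        sccs: Vec::new(),
    };
    for v in 0..n {
        if s.indices[v].is_none() {
            strongconnect(&mut s, v);
        }
    }
    s.sccs
}

fn main() {
    // 0 -> 1 -> 2 -> 0 is a cycle; 3 only calls into it.
    let adj = vec![vec![1], vec![2], vec![0], vec![0]];
    let mut sccs = tarjan_sccs(&adj);
    for scc in sccs.iter_mut() {
        scc.sort();
    }
    assert!(sccs.contains(&vec![0, 1, 2]));
    assert!(sccs.contains(&vec![3]));
}
```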

@arielb1
Contributor

arielb1 commented Sep 28, 2016

Why is the ordering important? Won't we iterate up to a fixed point anyway?

I think the nicest way to do this is:

    repeat until fixed point:
        for every function in SCC:
            try to inline all call-sites in function
            if function changed:
                do post-inlining cleanup passes

We might want to keep a cache for can_inline (just remember to invalidate it every time we inline that function!).

I think we need something to avoid being utterly slow with huge SCCs (e.g. not inlining any functions with >30 basic blocks, and not even trying conditional inlining).
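The loop sketched above could look roughly like this in Rust, with the actual inlining and cleanup abstracted behind a closure that reports whether it changed anything (all names here are invented for illustration):

```rust
// Repeat until fixed point: try to inline every call site in every
// function of the SCC; stop after a full pass with no changes.
fn inline_to_fixed_point<F>(scc: &mut [String], mut try_inline_all: F) -> usize
where
    F: FnMut(&mut String) -> bool,
{
    let mut passes = 0;
    loop {
        let mut changed = false;
        for func in scc.iter_mut() {
            if try_inline_all(func) {
                // post-inlining cleanup passes would run here
                changed = true;
            }
        }
        passes += 1;
        if !changed {
            break;
        }
    }
    passes
}

fn main() {
    // Toy "inliner": doubles each function body until a size budget is hit.
    let mut scc = vec!["ff".to_string(), "ggg".to_string()];
    let passes = inline_to_fixed_point(&mut scc, |f| {
        if f.len() < 8 {
            let copy = f.clone();
            f.push_str(&copy);
            true
        } else {
            false
        }
    });
    assert_eq!(passes, 3);
    assert!(scc.iter().all(|f| f.len() >= 8));
}
```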

@nikomatsakis
Contributor

@arielb1

Why is the ordering important? Won't we iterate up to a fixed point anyway?

I confess I don't quite know. @Aatch wrote this: "It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first." I think I'm sort of lacking the big picture here.

I think we need something to avoid being utterly slow with huge SCCs (e.g. not inlining any functions with >30 basic blocks, and not even trying conditional inlining).

Probably so. That said, I think I'd like to be landing nice patches like this, particularly since it's gated on -O2. I guess there is some question but I feel like iterating in tree (behind a flag) is usually better than spinning on PRs for too long.

We do, however, need to build up some good infrastructure for doing crater-like testing that tracks compile time; we wound up kind of skimping there for MIR and, while I don't regret it, it's a shame. It'd be great to be able to build up a MIR optimization infrastructure and test it before we "throw the switch". I think though that cargo-apply is getting pretty robust between the improvements that @brson and I made. So most of the pieces are probably in place.

@arielb1
Contributor

arielb1 commented Sep 28, 2016

I'm not sure SCC inlining is such a good idea - for one, I can't figure out a good way of extending it to a case where we try to inline (non-specializable) trait methods, and for another, I can't find any good "killer app" for it.

I would prefer to just never inline functions in the same SCC - this can deal with trait dependencies by executing Tarjan's algorithm online.

@bors
Contributor

bors commented Sep 29, 2016

☔ The latest upstream changes (presumably #36752) made this pull request unmergeable. Please resolve the merge conflicts.

@pcwalton
Contributor

@Aatch Let me know if you're going to fix this up soon—if not then I can do it :)

@alexcrichton
Member

ping, what's the status of this?

@brson
Contributor

brson commented Nov 9, 2016

@pcwalton are you still interested in following up on this work? Seems important this keeps moving.

@alexg117

@pcwalton Are you still doing this? If not I'd like to pick it up.

@mrhota
Contributor

mrhota commented Dec 5, 2016

@Aatch @pcwalton pinging for status check. It's that time of year where people start to have odd, very busy schedules. 😉

@mrhota
Contributor

mrhota commented Jan 3, 2017

@Aatch @pcwalton status check?

@nikomatsakis
Contributor

@alexg117

Are you still doing this? If not I'd like to pick it up.

I think it's safe to assume @pcwalton is otherwise engaged. I'd love to see this PR find a new champion! Let me know if I can help in any way.

@mrhota
Contributor

mrhota commented Jan 10, 2017

@alexg117 looks like you can pick this up. What say you?

@mrhota
Contributor

mrhota commented Jan 14, 2017

@alexg117 @nikomatsakis @pcwalton @Aatch I will update this. I'll let you know when you can close this PR in favor of a freshly-rebased PR.

@nikomatsakis
Contributor

@mrhota great! I think I'll just close it now because of the long period of inactivity, but please do open another PR.
