Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NLL: Redo two-phase borrows with conservative, narrow focus #48431

Closed
pnkfelix opened this issue Feb 22, 2018 · 10 comments
Closed

NLL: Redo two-phase borrows with conservative, narrow focus #48431

pnkfelix opened this issue Feb 22, 2018 · 10 comments
Labels
A-NLL Area: Non-lexical lifetimes (NLL) E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.

Comments

@pnkfelix
Copy link
Member

pnkfelix commented Feb 22, 2018

The current implementation of two-phase borrows (PRs #46537, #47489, #48197) should be thrown out and replaced with something more tailored to specific goals of the original RFC 2025.

The current implementation was written as if we could apply the concept of two-phase borrowing to any &mut-borrow that we find in the MIR, and then we tacked on a restriction to certrain autorefs. The latter restriction was originally motivated by not wanting the borrowck analysis to be too tightly tied to the particular details of what strategy is used to construct MIR (the general form as implemented would accept or reject code based on whether one e.g. introduced extra moves of temps).

However, in our attempts to get two-phase borrows working as we expected, we have run into some issues (e.g. #48070, #48418). These seem to mostly be arising because the existing code was trying to be general purpose, but there is not an obvious straight-forward way to fix the aforementioned bugs (which tend to be either regressions taking the form of rejecting code that is accepted w/o two-phase borrows or diagnostics regressions), at least not without injecting soundness bugs.

After some discussion with @nikomatsakis, we decided the best way forward would be to redo the two-phase borrow support.

In order to have some context, in the details immediately below here is an outline of how two-phase borrow support is currently implemented.

(note that the plan may be to throw away much of this code, at least the code in the rustc_mir crate. Don't worry too much about trying to preserve the overall algorithm outlined here.)

  • The types representing &mut borrows track whether they support two-phase borrows (i.e. they track whether they arose from method-call autoref adjustments); see the field allow_two_phase_borrow in AutoBorrowMutability.
  • That type information is fed into the MIR construction; see allow_two_phase_borrow in mir::BorrowKind.
  • When we do dataflow analysis, we have one analysis to compute the reservations of borrows, and a second analysis to compute the activation points. This second analysis is really just computing all of the uses that follow some reservation; it is up to the code using the analysis to know whether it needs the first such use in the control flow, or if it wants any such use.
    • (This sort of weirdness is in part why this code base is making it hard to resolve issues like two-phase borrows creates extra error messages #48418.)
    • Also, the dataflow does not consult the allow_two_phase_borrow information carried in the mir::BorrowKind. The dataflow results are the same regardless of that setting; the only difference is in how the dataflow is interpreted. (pnkfelix's original motivation for this was that he thought this would make it easier to debug code, by ensuring that everyone is staring at the same dataflow results...)
  • In the MIR-borrowck code, we consult both of the above two analysis results (reservations and activations).
    • For each reservation, we see if it allows two-phase borrows; if so, we check it for conflicts using one special path in the compiler. If not, then we use the old path that is more stringent.
    • Every activation is also checked (for whether it conflicts with any shared borrows, namely ones that were started during the reservation of that borrow).

With those details out the way, in these additional details below I outline the basic idea of the two-phase borrow rewrite:

  • One of the motivations of the original code was to express two-phase borrows in a manner that worked with both lexical lifetimes (yet still using mir-borrowck) and non-lexical lifetimes (NLL). This mattered when NLL was under very active development, but now it seems safe to introduce a tight coupling between NLL and two-phase borrows.
    • In other words, its okay e.g. to require some notion of a non-lexical reservation region that ends at the point of the activation, even though such a thing cannot be expression under the lexical regions model.
  • Use the existing information on the types and the MIR to track which &mut-borrows support two-phase borrows. Note that this flag is only saying that such a borrow may allow two-phases; further conditions need to hold in the constructed MIR for the phases to be observable.
  • Revise the mir::dataflow::impls::borrows to track activations more precisely, both in the sense of using the allow_two_phase_borrow flag, and also in terms of encoding the following activation semantics:
    • For any borrow tmp = &mut place that says it allows two-phase borrows, determine if there is a unique use of tmp that post-dominates the borrow. Also determine if the borrow dominates that use.
      • (In other words, does the use of tmp have solely that borrow as its definition, and does the definition have that use as its only use.)
    • If this condition does not hold (i.e., if the use has more than one definition, or the borrow can flow to more than one "first use"), then the borrow just immediately activates the borrow; we don't have phasing here. To use dataflow terminology, the borrow statement tmp = &mut place, when tmp does not have a uniquely determined "first use", causes the gen-bits for both the reservation and for the activation to be set to 1.
    • If the condition does hold, then the borrow tmp = &mut place just reserves (i.e. the borrow statement sets the reservation gen-bit to 1), and the (uniquely determined) associated use activates (i.e. use statement sets the activation gen-bit to 1).
    • If tmp does have a uniquely determined first-use, but there are also control-flow splits before that use is reached (e.g. due to unwind paths from function calls), then it may suffice to again says that the borrow immediately activates. But another option may be to say that the borrow solely reserves, and any control flow branch is not post-dominated by the unique first use causes an immediate activation.
      • @nikomatsakis noted that in such scenarios, the tmp in question should be considered dead, and thus the NLL region won't cover the flow of such branches anyway. So this may be an irrelevant detail. But @pnkfelix just wanted to point it out in case it arises...
  • It may be simplest, in terms of reusing the existing borrow-check code, to continue to allow the reservation and activation bits to be set to 1 at the same time. But its possible this intuition is wrong; @nikomatsakis had outlined a desired to actually represent distinct regions, one for the reservation, and one for the activation. That might be implemented by actually representing such regions, or it might be implemented by having the dataflow bits reflect that the reservation and activation bits are never both set to 1 for a given place.
  • We believe that with the new conditions above, the resulting dataflow should ensure that you never have an activation setting a bit to 1 that was already set to 1, at least for two-phased borrows. This should resolve some problems we were wrestling with, in terms of duplicate errors. It may also open up potential for simplifying the supporting code in rustc_mir::borrow_check.
  • The current rustc_mir::borrow_check support for reservations and activations may be salvageable, depending on the details of how the dataflow changes are done. You would probably throw away the special-case reading of allow_two_phase_borrows in rustc_mir::borrowck, since this information should now be influencing the dataflow results and there should be no need for the mir-borrowck to also incorporate it into its own analysis.
    • The current mir-borrowck has checks both at the point of reservation of a borrow and at its (separate) activation.
    • You would want to continue some sort of checking at the point of reservation, since we need to continue ensuring that reservations act just as shared borrows do.
    • How to handle the activation point is more of an open question. Either
    • you can do checking at the activation point (which requires finding out whether it has a conflict with any shared borrows in scope, taking care that an activation not interfere with its own reservation), or
    • you can do the checking of any borrows that occur during the reservation and before the activation, somehow ensuring that they do not extend beyond the activation point. @nikomatsakis outlined the option to @pnkfelix attributing the mental model to @RalfJung, but @pnkfelix honestly is not clear on how to actually implement this check without doing something that ends up looking a lot like a check at the activation point anyway...
@pnkfelix pnkfelix added E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. A-NLL Area: Non-lexical lifetimes (NLL) WG-compiler-nll labels Feb 22, 2018
@nikomatsakis
Copy link
Contributor

Some notes, giving my take on what @pnkfelix wrote:

For any borrow tmp = &mut place that says it allows two-phase borrows, determine if there is a unique use of tmp that post-dominates the borrow. Also determine if the borrow dominates that use.

I would rather say that, for any borrow tmp = &mut place, there must be a unique use of tmp that post-dominates the borrow (modulo unwinding paths), or else the MIR is ill-formed. Moreover, tmp must be dead (no longer used) in those unwinding paths.

I believe that to be true by construction, since we only enable 2-phase borrows for auto-refs on method calls (foo.bar()) and indexing (foo[bar]), and in both of those cases we will basically generate code like:

TMP0 = &mut foo; // create the receiver in a temporary the user cannot access
... // prepare the other arguments; this may introduce basic blocks etc
Foo::bar(TMP0, ...)

I don't know how any use of TMP0 could be inserted in between its creation and the call itself -- but it could certainly happen that there is unwinding, since the other arguments may do fallible things. That unwinding should not access TMP0, though, particularly since dropping a &mut borrows is a no-op.

We can (I believe) find this unique activation by doing the following:

  • Start a DFS from the &mut borrow with region R. Stop when we encounter a use U of the temporary, adding U to the set of activations. If we exit the region R, stop the walk. If we encounter a RESUME or other "final" terminator, that's a bug.
    • There may be some edge case where this doesn't work? But if so I'd be interested to see it.
  • This process should terminate with an activation set of cardinality 1, or that's a bug.
    • Again, if this is not true, then I would very much like to see the counterexample so I can update my mental model. =)

If the borrow is not marked as 2-phase, then the unique activation is just the borrow itself.

@RalfJung
Copy link
Member

you can do the checking of any borrows that occur during the reservation and before the activation, somehow ensuring that they do not extend beyond the activation point. @nikomatsakis outlined the option to @pnkfelix attributing the mental model to @RalfJung, but @pnkfelix honestly is not clear on how to actually implement this check without doing something that ends up looking a lot like a check at the activation point anyway...

I laid this out in rust-lang/rfcs#2025 (comment). However, I understand that borrowck doesn't actually work the way I described there. I think that translating this into constraints the way NLL works (well, the way I think it works; I don't have a very good understanding of the actual implementation) would require a new kind of constraint: When a &'r 'w mut2 T is used, there has to be a constraint that this point of the CFG and everything that follows is NOT contained in 'r (i.e., 'r has ended). Not sure if that's even remotely compatible with NLL...

@RalfJung
Copy link
Member

RalfJung commented Feb 23, 2018

I think that translating this into constraints the way NLL works (well, the way I think it works; I don't have a very good understanding of the actual implementation) would require a new kind of constraint: When a &'r 'w mut2 T is used, there has to be a constraint that this point of the CFG and everything that follows is NOT contained in 'r (i.e., 'r has ended). Not sure if that's even remotely compatible with NLL...

Thinking about this again, something like this must already be in NLL. How is reborrowing handled? In

let x = &mut *foo; // foo: &mut T
*foo = ...;
x = ...;

somehow NLL must notice that foo is used again before the lifetime of the reborrow expires. Couldn't the same mechanism also do the check for two-phase borrows?

EDIT: Ah, of course -- this is handled by borrowck, whereas the region inference is done earlier. So I guess what I am proposing is that when a two-phase borrow is reserved (tmp = &'r 'w mut2 lv), there immediately is a shared loan of tmp for 'r. Then borrowck should do all the checks that are necessary, would it not?

@sapphire-arches
Copy link
Contributor

I'm interested in working on this. From reading the writeups above it seems like my approach should be something along the lines of:

  • Remove the Reservation and ActiveBorrows and bring back the old Borrows
  • Extend Borrows to handle 2-phase borrows via the algorithm outlined by @nikomatsakis above.
    I should be able to take a stab at this tomorrow and will hopefully have an implementation before Wednesday.

@pnkfelix
Copy link
Member Author

@bobtwinkles that sounds great!

If you want to ask any questions or just discuss how things are going, please feel free to join the NLL-dedicated gitter chat at https://gitter.im/rust-impl-period/WG-compiler-nll

@sapphire-arches
Copy link
Contributor

Unfortunately I ended up having less time to work on this than expected so I suspect this will slip until sometime before Monday of next week. Sorry all.

@KiChjang
Copy link
Member

KiChjang commented Mar 4, 2018

@bobtwinkles How far are you in this?

@sapphire-arches
Copy link
Contributor

I've taken a few stabs at it but nothing has panned out quite yet. My first two attempts failed due to the issue outlined in this comment (namely, the dataflow system doesn't supported modifying gen/kill sets during fixed point iteration). The next thing I'm going to try is modifying the GatherBorrows pass to also compute the activation point for borrows and replace the Reservations/ActiveBorrows passes to use that information.

@sapphire-arches
Copy link
Contributor

I have things mostly working on this branch https://github.com/bobtwinkles/rust/tree/two_phase_borrows_rewrite.

I'm still failing a handful of tests, but I think the issues are tractable. Hopefully I can get this wrapped up tomorrow. The main issue I'm fighting with is balancing a narrow focus with adding complexity that's just going to get removed later when 2-phase borrows are extended later. Specifically, the two-phase-activation-sharing-inference.rs, two-phase-reservation-sharing-inference.rs, two-phase-nonrecv-autoref.rs, and two-phase-allow-access-during-reservation.rs compile-fail tests are giving me trouble. There are also a few run-pass tests that fail but I have yet to look in to those. It may be due to how I've implemented this bit of code). For now I'm going to sleep on it and come back with fresh eyes tomorrow =). I will be around in the Gitter at after 2PM EST tomorrow should anyone feel like banging their head against this code with me.

sapphire-arches added a commit to sapphire-arches/rust that referenced this issue Mar 9, 2018
See rust-lang#48431 for discussion as to why this was necessary and what we hoped to
accomplish. A brief summary:
   - the first implementation of 2-phase borrows was hard to limit in the way we
   wanted. That is, it was too good at accepting all 2-phase borrows rather than
   just autorefs =)
   - Numerous diagnostic regressions were introduced by 2-phase borrow support
   which were difficult to fix
@nikomatsakis
Copy link
Contributor

I believe we can close this, now that #48770 has landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-NLL Area: Non-lexical lifetimes (NLL) E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
Projects
None yet
Development

No branches or pull requests

5 participants