-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make borrow check results available to Clippy #108328
Conversation
r? @TaKO8Ki (rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
☔ The latest upstream changes (presumably #108325) made this pull request unmergeable. Please resolve the merge conflicts. |
r? @oli-obk I want to make sure this is the right approach, as just exposing some internal APIs is a refactoring hazard |
I read this PR and the clippy PR and got a few questions now:
|
To answer your questions, I need to enumerate a couple of my assumptions. The way I use these results is to answer questions of the form: are there borrowers of a variable beyond some set of known borrowers? (Please see here for my details.) The assumptions are:
Assuming 1 and 2, yes.
I'm a little unsure about the "exactly" part. But assuming 1, yes.
Maybe? I know very little about Polonius, but there are a couple of things that concern me. First, the intent of this PR is to give Clippy access to the same borrow check results the compiler uses in deciding whether code is valid. If I understand correctly, Polonius is not currently used in production. So I would be concerned that using it in Clippy could inadvertently cause bugs in Clippy. A secondary concern is this:
I understand the compiler APIs provide no stability guarantees, generally. But I worry about Clippy relying on a feature that is explicitly unstable. |
@oli-obk Were there any other questions I could answer on this topic? As an aside, if you're concerned about exposing internal APIs, another idea could be the following: add a flag to the compiler configuration to clone the various borrow checker structures after they are created, and then expose the cloned structures through a query. |
Sorry, I read that but then must have dismissed the notification before replying So... about obtaining the right MIR: My recommendation would be to change For the This refactoring seems useful on its own, as it allows clippy to stop using The second part is obtaining the borrowing information you need. Since what you actually want is to access Do you think that plan could work? |
I don't think I understand. When you say "overwrite the Are you suggesting to somehow inline (I'm sorry if I'm being dense.)
rust/compiler/rustc_interface/src/passes.rs Line 841 in 22f247c
The most straightforward way I can see realize your idea would be to add something like a "middle" pass that runs just after this: rust/compiler/rustc_interface/src/passes.rs Lines 786 to 788 in 22f247c
And then to change Do you think that could work?
I think this part makes sense to me. |
Sorry, I got compiler-driver brain worms. Let me explain from the beginning: the rust/src/tools/clippy/src/driver.rs Line 131 in 01ffa51
Line 263 in 2625518
so... basically you make a new query, and within that query you can call rust/compiler/rustc_borrowck/src/lib.rs Line 136 in 7e2ecb3
Mode::Clippy .
This way you get the NllOutput and you can then run the logic of RedundantClone::check_fn directly, without needing to build a LintContext or similar. It's a different way of doing lints, for any lints that don't actually visit expressions but only work on MIR, there's no fundamental difference. There are some things you'll have to do manually, as you don't have a LintContext that gives you a |
@oli-obk Thanks very much for your patient response. I didn't know it was possible overwrite a query's function pointer. I think I'm going to have to try implement this idea to understand its implications. I'll reply once I've had a chance to do so. Thanks again. |
Oh one thing I just remembered: you can do the "run lints in borrowck query" part even without changing borrowck or rustc at all. This may be worthwile on its own and can done entirely within the clippy repo. Should probably bug some other clippy devs about whether that approach seems maintainable and generally good |
Switching to waiting on author to incorporate changes (by readin the latest comments). Feel free to request a review with @rustbot author |
Hi, @jyn514. Thanks for checking in. Yes, I still plan to work on this. I have simply had other priorities. I will try to get back to this soon. Thanks. |
Expose more information in `get_body_with_borrowck_facts` Verification tools for Rust such as, for example, Creusot or Prusti would benefit from having access to more information computed by the borrow checker. As a first step in that direction, rust-lang#86977 added the `get_body_with_borrowck_facts` API, allowing compiler consumers to obtain a `mir::Body` with accompanying borrow checker information. At RustVerify 2023, multiple people working on verification tools expressed their need for a more comprehensive API. While eventually borrow information could be part of Stable MIR, in the meantime, this PR proposes a more limited approach, extending the existing `get_body_with_borrowck_facts` API. In summary, we propose the following changes: - Permit obtaining the borrow-checked body without necessarily running Polonius - Return the `BorrowSet` and the `RegionInferenceContext` in `BodyWithBorrowckFacts` - Provide a way to compute the `borrows_out_of_scope_at_location` map - Make some helper methods public This is similar to rust-lang#108328 but smaller in scope. `@smoelius` Do you think these changes would also be sufficient for your needs? r? `@oli-obk` cc `@JonasAlaif`
By overriding the `mir_borrowck` query
@oli-obk I've been trying to implement your suggestion. (Here is the latest version of the associated Clippy PR.) However, I've run into some difficulties. TBH, even if these difficulties can be overcome, I am concerned that writing lints this way is too prohibitive. Is exposing The problems I still face include the following:
If exposing |
The job Click to see the possible cause of the failure (guessed by this bot)
|
It's not off the table, I would just like to figure out how to ensure we don't end up exposing everything in the compiler that isn't actually used within the compiler. I'd like to have a good way for clippy to just register its very own queries, but that's not possible at present. Stepping back from the exact implementation: What you want is a way to compute (and obtain!) things from inside the compiler. As a side constraint, we don't want to make the compiler internal APIs more complex. So, the options I see are:
So yea, I'm not sure where to go from here, but I guess as long as clippy is ok with that lint getting removed at a moments notice if borrowck makes a large enough refactoring so that we can't provide this information anymore easily, we can go with your original option. |
Thanks very much for your detailed response.
May I ask, can you see a point in the future when the borrow checker is more stable? My impression is that "keeping Clippy working" is a factor influencing API changes today. Put another way, API changes may require changes to a Clippy lint, but the information the lint needs will still be available in some form. "Keeping Clippy working" is one of the reasons why Clippy is mirrored in the Rust repo, for example. (Is this impression wrong?) Can you see a point in the future when a Clippy lint could rely on borrow checker results, and the same standard (assuming it's correct) could apply? |
yea, I'd assume once we get poloniusification (not switching to, but migrating to a similar scheme) done, we have a certain stability of the output for a decade or so. But that migration may be unduly burdened by having to support clippy. It may be reasonable to port, or it may require rewriting every lint using borrowck results from scratch. |
It sounds then like the most reasonable option would be to wait for the Polonius migration to complete, and revisit this issue thereafter. Would you agree? |
That seems reasonable, but you may be waiting over a year.
No, this impression is spot on. Maybe it would be ok for the lint to just not do anything for a bit while it gets patched up, instead of having to fix it immediately when a big refactoring happens? |
The lints affected by these changes would
Is that right? (Sorry if I'm misunderstanding.) |
Basically we'd just suddenly feed you empty borrowck data. This will break those lints, and it would then be up to the refactoring author and reviewer to decide whether to ask you to fix it or whether they can fix it in a follow up PR. This may end up in stable for a release or two if it can't be done fast enough |
@Jarcho Can I ask your opinion on having two versions of Based on #108328 (comment) and what follows, I see the implied subgoals as:
The easiest way I can see to achieve the above is:
Both options would need to be enabled to use the What you do think? (@oli-obk Please correct me if I've misinterpreted or contradicted anything you wrote.) |
I don't think we need cargo features. Just having borrowck return an option that we can turn into None at some point seems sufficient. |
The situation I was imagining was (say) the contents of the |
I expect changes to be either small enough for that to be ok, or massive enough for it to become unrecognizeable. We could keep the old datastructures, so no code breakage, but return Though at this point I'm feeling like this puts too much burden on the maintenance side of clippy. I'm sorry for making you go through all these hoops only to end up accepting that your original proposal is probably the best one. Sticking the data you need into a field that rustc sets to |
Maintaining two implementations of every lint that uses the borrowck results isn't a great option. Especially as more lints start to make use of it the maintenance burden will become untenable. Would it be reasonable to commit to a narrow interface wrapping the results? With the current two lints the only thing that needs to be queried is if a local has more than a single borrower as a given location. |
That seems like something any change to the borrow checker could keep around, yes |
That sounds like a good way forward then. As far as deciding whether or not to emit a lint, we should be fine with just three queries:
More might be needed to improve lint messages, but losing anything here temporarily wouldn't be too big of a deal. |
@Jarcho You're imagining something like the possible borrower adaptation I proposed, but in the compiler instead of in Clippy? Before going down that path, please let me clear up some confusion I may have caused. When I said "two versions" of those lints, I meant in an intuitive sense, e.g., there would be a handful of locations where those lints would choose between one of two code paths. To be a little more concrete, I imagined the
Adding to that, I wouldn't expect there to be much practical difference between having just a In short, I wouldn't expect Clippy's maintenance burden to be very high, either way. Does this affect your opinion at all, @Jarcho? |
For the lints that currently exist, the maintenance burden isn't all that high. With that said, both forms would still need to be fully tested. Even with testing it still runs the risk that switching over could still completely break the lints (very unlikely for the current lints, but could be more so for future lints). See #108944 as an example of completely breaking My main point here is having a limited interface that can, with reasonable ease, also be implemented by future borrowck implementations is better than picking through things on clippy's side. So the answer to your original question is yes, but limited to the minimum interface clippy actually needs. |
OK. I'll try to make it happen. |
☔ The latest upstream changes (presumably #113734) made this pull request unmergeable. Please resolve the merge conflicts. |
Allowing re-implementation of mir_drops_elaborated query For our use case of the rust compiler interface (a rust verifier called [Prusti](https://github.com/viperproject/prusti-dev/)), it would be extremely useful if we were able to "copy" the implementation of the `mir_drops_elaborated_and_const_checked` query to override it. This would mean that the following items would need to be made public: >https://github.com/rust-lang/rust/blob/6d55184d05c9bd3c46b294dcad3bfb1d0907e871/compiler/rustc_mir_transform/src/lib.rs#L434 >https://github.com/rust-lang/rust/blob/6d55184d05c9bd3c46b294dcad3bfb1d0907e871/compiler/rustc_mir_transform/src/inline.rs#L32 (for the latter its module needs to be public or it needs to be re-exported) To explain why (we think) this is necessary: I am currently working on a new feature, where we try to modify the generated executables by inserting certain additional checks, and potentially perform some optimizations based on verification results. We are using the rust compiler interface and most of our goals can be achieved by overriding queries, in our case this is currently `mir_drops_elaborated_and_const_checked`. However, at the moment this approach is somewhat limited. When overriding queries, we can call and steal the base-query and then modify the results before allocating and returning those. The problem is that the verification works with a copy of `mir_promoted`. For the modifications we want to make to the mir, we would often want to rely on results of the verifier that refer to Locations in the `mir_promoted`. We can not modify the `mir_promoted` query using these results, because to run the verification we also need the results of `mir_borrowck()`, which means `mir_promoted` will already be constructed and cached. The Locations we get from the verifier are also no longer usable to modify `mir_drops_elaborated_and_const_checked`, because the MIR obviously changes between those 2 phases. Tracking all Locations between the two seems to be pretty much unfeasible, and would also be extremely unstable. By being able to override the query with its original implementation, we could modify the MIR before drop elaboration and the various other passes are performed. I have spent quite a bit of time investigating other solutions, and didn't find any other way solving this problem. If I still missed something I would of course be happy to hear any suggestions that do not require exposing more internal compiler functionality. However, I think being able to re-implement certain queries could also benefit other use cases in the future, for example in PR rust-lang#108328 one of the approaches discussed involved doing the same thing for `mir_promoted`.
Allowing re-implementation of mir_drops_elaborated query For our use case of the rust compiler interface (a rust verifier called [Prusti](https://github.com/viperproject/prusti-dev/)), it would be extremely useful if we were able to "copy" the implementation of the `mir_drops_elaborated_and_const_checked` query to override it. This would mean that the following items would need to be made public: >https://github.com/rust-lang/rust/blob/6d55184d05c9bd3c46b294dcad3bfb1d0907e871/compiler/rustc_mir_transform/src/lib.rs#L434 >https://github.com/rust-lang/rust/blob/6d55184d05c9bd3c46b294dcad3bfb1d0907e871/compiler/rustc_mir_transform/src/inline.rs#L32 (for the latter its module needs to be public or it needs to be re-exported) To explain why (we think) this is necessary: I am currently working on a new feature, where we try to modify the generated executables by inserting certain additional checks, and potentially perform some optimizations based on verification results. We are using the rust compiler interface and most of our goals can be achieved by overriding queries, in our case this is currently `mir_drops_elaborated_and_const_checked`. However, at the moment this approach is somewhat limited. When overriding queries, we can call and steal the base-query and then modify the results before allocating and returning those. The problem is that the verification works with a copy of `mir_promoted`. For the modifications we want to make to the mir, we would often want to rely on results of the verifier that refer to Locations in the `mir_promoted`. We can not modify the `mir_promoted` query using these results, because to run the verification we also need the results of `mir_borrowck()`, which means `mir_promoted` will already be constructed and cached. The Locations we get from the verifier are also no longer usable to modify `mir_drops_elaborated_and_const_checked`, because the MIR obviously changes between those 2 phases. Tracking all Locations between the two seems to be pretty much unfeasible, and would also be extremely unstable. By being able to override the query with its original implementation, we could modify the MIR before drop elaboration and the various other passes are performed. I have spent quite a bit of time investigating other solutions, and didn't find any other way solving this problem. If I still missed something I would of course be happy to hear any suggestions that do not require exposing more internal compiler functionality. However, I think being able to re-implement certain queries could also benefit other use cases in the future, for example in PR rust-lang#108328 one of the approaches discussed involved doing the same thing for `mir_promoted`.
@smoelius any updates on this? |
@Dylan-DPC Not currently. But thank you for the ping. I will try to get back to this soon. |
@Jarcho Could you please elaborate on how you see these being used by Clippy's existing lints? EDIT: The |
I'm going to close this. I may revisit this at some point in the future. |
These changes allow Clippy to reconstruct the MIR on which the borrow checker is run, so that Clippy can reproduce its results.
Most of the changes are about making functions public. A notable exception is the addition of the
NllCtxt
structure. The purpose of this struct is to make it easier to callnll::compute_regions
. IMHO, the introduction of this struct also untangles the code a bit (i.e., helps to expose what depends on what), as an additional benefit.For context, Clippy now uses a MIR visitor called
PossibleBorrowers
. Intuitively, this visitor produces an approximation of the compiler's borrower check results. But its imprecision can lead to false negatives in certain circumstances (see rust-lang/rust-clippy#9386 (comment), for example). My previous attempt to makePossibleBorrowers
more precise did not end well: it had bad worst case complexity, which was exposed by some in-the-wild programs.The reviewer (@Jarcho) and I have discussed ways of improving
PossibleBorrowers
. But one could argue that the "right" approach is to use "ground truth" rather than an approximation, "ground truth" being the results of the compiler's borrow check analysis.You can see how these changes will be used in rust-lang/rust-clippy#10173. Relevant discussion starts at rust-lang/rust-clippy#10173 (comment).