-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP toward LLVM Code Coverage for Rust #70680
Conversation
rust-lang#34701 Prototype coverage implementation with partial support for the various branch types to be covered. cargo -vv rustc -- -Z instrument-coverage With the `-Z instrument-coverage` flag, the compiler injects calls to a new std::coverage module, adding counters at various points in the code. At the end of the compile, a list of coverage regions (`Span`s) and corresponding coverage retion IDs are printed to stdout. When executing the program, the counters increment statically, and upon normal exit, the counters are printed to stdout with the same corresponding coverage region IDs. The stdout results will be replaced by output files in a format compatible with LLVM coverage reports. Currently counts the following coverage regions: * All blocks ending in a semicolon or expression * break, return, and yield statements, with or without a value expression * continue statment Not yet supported are: * match pattern arms without a block * closures without a block * expressions ending in '?' (that return on Result::Err) * lazy boolean right-hand-side expressions that may not be invoked
rust-lang#34701 Prototype coverage implementation with partial support for the various branch types to be covered. cargo -vv rustc -- -Z instrument-coverage With the `-Z instrument-coverage` flag, the compiler injects calls to a new std::coverage module, adding counters at various points in the code. At the end of the compile, a list of coverage regions (`Span`s) and corresponding coverage retion IDs are printed to stdout. When executing the program, the counters increment statically, and upon normal exit, the counters are printed to stdout with the same corresponding coverage region IDs. The stdout results will be replaced by output files in a format compatible with LLVM coverage reports. Currently counts the following coverage regions: * All blocks ending in a semicolon or expression * break, return, and yield statements, with or without a value expression * continue statment Not yet supported are: * match pattern arms without a block * closures without a block * expressions ending in '?' (that return on Result::Err) * lazy boolean right-hand-side expressions that may not be invoked
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @varkor (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@@ -0,0 +1,83 @@ | |||
//! Code coverage counters and report |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being added to the standard library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to add the library of rustc-injected coverage APIs to a reserved namespace available only to rustc. "core" doesn't make sense to me. I'm not sure if there are any alternatives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One approach is a dedicated crate that is distributed along with the rustc and injected as a dependency when requested during compilation. For example, -Zprofile
injects profiler_builtins
crate, which you might potentially also find useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually started with "coverage_builtins" following that model, but wasn't sure if/how that would make it into the program's library dependencies, and since "coverage_builtins" is not a truly reserved namespace, it made me uneasy.
builtins::coverage would be great, perhaps, if the Rust community is open to reserving the "builtins" namespace for instrumentation APIs. (Or is it already?)
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using LLVM's runtime support in compiler_rt instead (along with the llvm.instrprof.increment intrinsic).
Relevant issue #34701 |
I'm surprised this is not using the LLVM intrinsics at all. Also, I don't think "statements" are that meaningful in Rust. Back in 2018 when a client wanted instrumentation-based code coverage (which didn't actually end up happening), I did a bit of research and had some plans. IIRC, I wanted to instrument Rust code at the MIR level, and have counters for each basic block and for each branch edge, that matched the respective features of LLVM coverage tooling. Either way, I would prefer not to instrument Rust code before MIR, as much as possible. |
The most recent recommendations in the discussion last year at #34701 was to use pre-lowered (and post-macro-expansion) ASTs, and so far I think it is working well. It gives a good mapping of regions to the actual source, so when users view the source with coverage annotations, it's clear what was and was not executed. The coverage regions will (eventually) be hashed representations of the source locations (file, function, and some best-attempt at identifying internal branches based on the original source code content). I'm no expert on MIR though. I just didn't invest a lot of time into it since that wasn't the recommendation. But based on this approach, I think that explains why LLVM intrinsics are not relevant to the instrumentation. The plan is to export the coverage regions and the counters to LLVM-compatible formats so LLVM coverage analysis tools can be used. |
That can be done from the MIR, using
Ah, nobody pointed me to #34701 so I had no idea there was discussion on this topic already.
|
@eddyb - Thanks for these recommendations. There's time to consider migrating the instrumentation to the MIR if it turns out to have some strong advantages. I would point out one other comment that particularly caught my eye since it was from @eggyal, who did attempt coverage instrumentation at the MIR level initially:
Reading this and @bob-wilson's other comments pushed me toward my current approach. |
Great timing! Since I hadn't seen any recent progress on #34701 , I just spent the last few days trying to start an implementation following the same approach that we used in Clang. I'll take a look at this soon. I still think the coverage should be based on the ASTs, not MIR or even HIR. |
That's great Bob! Following your recommendation, I looked at and drew inspiration form the Clang implementation. |
@bob-wilson I'm working with llvm developers our our team, and we discussed leveraging the existing llvm intrinsic counters (@eddyb, FYI, it does look like llvm intrinsics will help me here!):
I think this will fit well into my implementation. We discussed ways I might implement the parameters to this function, and had a question or two you might be able to answer. Per the docs, the How important this recommendation? Do any downstream coverage tools actually use the function name? (Especially since this is not C++ code?) If it is assumed to be a function name, and interpreted as such, what are the use cases for that? I'm asking because my assumption was that the downstream tools really only need the coverage regions (source file and span) and the counts. Let me know if that's not enough (if you know). Thanks! |
// add more visit_???() functions for language constructs that are branched statements without | ||
// blocks, such as: | ||
// visit match arm expr | ||
// if not block, wrap expr in block and inject counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we care much about blocks in particular at all (unless they happen to be the block which defines a function body). Looking directly for branch expressions seems like what we want.
In clang focusing on blocks makes a bit more sense, since blocks usually correspond to some kind of branch in C++. (Even then it doesn't seem like exactly the right thing to me, but maybe I'm missing something.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point and simplifies the solution by not making non-block branch expressions a special case. Cool!
I still need to support function blocks though, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I need to give that more thought ;-)
return (and yield), break, and continue may complicate the idea that this is simpler. I'll think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just following up on this: I've been experimenting with various ways to inject the counters, in a number of different (and some obscure) branch types, and as it turns out (as best I can tell), my initial approach seems to be the best way to handle block expressions.
Non-block expressions are straightforward enough, but I can't apply the same simplicity to blocks because of the possibility that the block might contain a return, break, or continue.
(If the block doesn't contain a return, break, or continue, I think the block can be handled just like a non-block expression, but intuitively I think it's better to handle all blocks the same and not worry about what might or might not be inside them.)
For example:
match var1 {
Pattern1 => expr(),
...
I can count the execution of the Pattern1 arm with:
match var1 {
Pattern1 => count(..., expr()),
...
But (if I interpretted your suggestion correctly), to ignore blocks and just inject counters based on expressions, I tried the following (which certainly looks tempting):
match var1 {
Pattern1 => count(..., {
do_one_thing();
get_result()
}),
...
No problem there, but:
match var1 {
Pattern1 => count(..., {
println!("Bad pattern");
return Err(...);
}),
...
This will not compile because it is clear (to the compiler) that count()
will never be executed.
So this is what led me to the current prototype solution for blocks, in this case:
match var1 {
Pattern1 => {
println!("Bad pattern");
return count(..., Err(...));
},
...
There are a few other edge cases like this that require handling blocks differently from non-block branched expressions. I'll try to cover these in future documentation, with my branch analysis.
Let me know if I missed something about your suggestion.
Thanks!
Definitely we should use the LLVM intrinsics! The runtime support in LLVM organizes the execution count data by function name. I'm sure there are other ways to do it, but I recommend following the precedent. If the Rust implementation fits into the existing usage, it will probably save some trouble in the initial implementation and should also be easier to maintain going forward. There are some other LLVM functions that we should also use. At least some of these are not currently exported in the LLVM C APIs, but it shouldn't be a problem to fix that:
One complication here is that I don't think the Rust compiler should be calling these LLVM functions outside of the librustc_codegen_llvm crate. I don't know the architecture of the Rust compiler, but calling those functions directly in a pass over the AST seems like a bad idea. I was thinking to insert the instrprof.increment intrinsic calls with a dummy argument for the variable and then swap in the real value when handling the intrinsic in librustc_codegen_llvm. It would make sense to generate the coverage mapping directly from the code walking the AST, and that will also need to use the same function naming scheme. If we can reimplement the equivalent of llvm::getPGOFuncName in Rust, that will be fine. I believe that's the only LLVM function that is needed outside of librustc_codegen_llvm. |
Yes! I suggested the same idea yesterday with my team. I'll pursue that idea and see where it goes. Thanks for the suggestions! |
Discussed in today's T-compiler ad-hoc triage meeting. We think that a change like this warrants at least a Major Change Proposal aka "MCP" (*) It may also warrant a design meeting, but for now we can start with an MCP since that is lighter weight and won't require waiting for a synchronous design meeting. (*) a protocol that is in the midst of being finalized via a proposed final comment period ("FCP"), but I think you can safely create such proposals now, before the FCP has finished. |
Sounds good. I was already planning to put together some kind of RFC, but will use the MCP format instead. Thanks for providing this guidance! |
@pnkfelix - Just a quick note to let you know I've completed a draft of the MCP. I'm circulating it to my internal team to make sure it addresses the requirements we need for Fuchsia's toolchain (the main reason I've been asked to work on this), and to ensure I'm addressing any concerns, before posting the MCP for the Rust compiler team's review. I expect to be able to post the MCP in the next couple of business days. Thanks! |
@richkadel Ping from triage, any updates here? Thanks! |
@crlf0710 - Thanks for the ping. Apologies if the status is not clear. I followed the steps required by the compiler team--as requested by @pnkfelix on this thread--and submitted an MCP (linked just above your comment): Implement LLVM-compatible source-based code coverage #278 I am waiting for someone on the compiler team to "second" the proposed feature. There are a variety of opinions on where coverage regions should be calculated, and coverage instrumentation should be integrated, both in the original issue, and in the Zulip thread associated with the above MCP. Some good arguments were made by one participant in the Zulip chat for moving the instrumentation to a MIR-MIR pass, so I am attempting to prototype a MIR->MIR solution (with guidance from @tmandry ). I've also started to implement the LLVM intrinsic integration. The MCP proposal was intentionally written to be open to recommended alternatives per feedback from the compiler team, if feasible. I acknowledged this in the Zulip chat, I said I am willing to try some of the alternatives proposed. But to make this even more clear, I will edit the MCP to refine it based on the recent Zulip chat. (The MCP process does not say if editing the MCP is required, in response to the Zulip discussions, but I want to make sure the compiler team sees the adjustments I agreed to make.) It may be counterproductive to update this PR until the compiler team seconds, reviews, and accepts the MCP. Thanks! |
I finished updating the MCP. Waiting for feedback on the MCP from the compiler team. Thanks! |
Quick update: The MCP was seconded by a member of the compiler team and has entered a 10 day waiting period before final approval (if there are objections). In the mean time, I continue to make changes based on the new approach described in the revised MCP, and will update the PR if there is anything significant to share. Thanks! |
Based on feedback during the MCP process, I'm changing the approach from an AST-based solution to an implementation on MIR. Since the new approach is significantly different, I'm abandoning this PR. Please follow the new changes, starting with PR #73011. |
… r=tmandry first stage of implementing LLVM code coverage This PR replaces rust-lang#70680 (WIP toward LLVM Code Coverage for Rust) since I am re-implementing the Rust LLVM code coverage feature in a different part of the compiler (in MIR pass(es) vs AST). This PR updates rustc with `-Zinstrument-coverage` option that injects the llvm intrinsic `instrprof.increment()` for code generation. This initial version only injects counters at the top of each function, and does not yet implement the required coverage map. Upcoming PRs will add the coverage map, and add more counters and/or counter expressions for each conditional code branch. Rust compiler MCP rust-lang/compiler-team#278 Relevant issue: rust-lang#34701 - Implement support for LLVMs code coverage instrumentation ***[I put together some development notes here, under a separate branch.](https://github.com/richkadel/rust/blob/cfa0b21d34ee64e4ebee226101bd2ef0c6757865/src/test/codegen/coverage-experiments/README-THIS-IS-TEMPORARY.md)***
… r=tmandry first stage of implementing LLVM code coverage This PR replaces rust-lang#70680 (WIP toward LLVM Code Coverage for Rust) since I am re-implementing the Rust LLVM code coverage feature in a different part of the compiler (in MIR pass(es) vs AST). This PR updates rustc with `-Zinstrument-coverage` option that injects the llvm intrinsic `instrprof.increment()` for code generation. This initial version only injects counters at the top of each function, and does not yet implement the required coverage map. Upcoming PRs will add the coverage map, and add more counters and/or counter expressions for each conditional code branch. Rust compiler MCP rust-lang/compiler-team#278 Relevant issue: rust-lang#34701 - Implement support for LLVMs code coverage instrumentation ***[I put together some development notes here, under a separate branch.](https://github.com/richkadel/rust/blob/cfa0b21d34ee64e4ebee226101bd2ef0c6757865/src/test/codegen/coverage-experiments/README-THIS-IS-TEMPORARY.md)***
… r=tmandry first stage of implementing LLVM code coverage This PR replaces rust-lang#70680 (WIP toward LLVM Code Coverage for Rust) since I am re-implementing the Rust LLVM code coverage feature in a different part of the compiler (in MIR pass(es) vs AST). This PR updates rustc with `-Zinstrument-coverage` option that injects the llvm intrinsic `instrprof.increment()` for code generation. This initial version only injects counters at the top of each function, and does not yet implement the required coverage map. Upcoming PRs will add the coverage map, and add more counters and/or counter expressions for each conditional code branch. Rust compiler MCP rust-lang/compiler-team#278 Relevant issue: rust-lang#34701 - Implement support for LLVMs code coverage instrumentation ***[I put together some development notes here, under a separate branch.](https://github.com/richkadel/rust/blob/cfa0b21d34ee64e4ebee226101bd2ef0c6757865/src/test/codegen/coverage-experiments/README-THIS-IS-TEMPORARY.md)***
I work on Google's Fuchsia OS, and was recently asked to implement code coverage for Rust with output to LLVM's coverage tooling.
I've completed a working prototype for instrumenting Rust programs with calls to coverage counters, and generating reports (coverage region map output from the compiler, and coverage counts upon exiting the instrumented executable.
If someone with more experience developing in rustc can give this an early look and send feedback, that would be really helpful.
Apologies in advance for the current lack of tests and limited documentation. I'll be working on this full time for a while, so it will improve rapidly.
Thanks!