-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable improvements to debug emission for optimized code #38894
Conversation
Reverts dotnet#33021, which was a reversion of dotnet#2107. Fix the IG boundary sensitivity for debug emission. Allow debug info to be summarized in a way that makes it amenable to analyze with our jitutils tooling (namely, `jit-analyze`).
Leaving as draft for now, until I can pass the cross-crossgen check. Updated jit-diffs tooling (will PR shortly) shows number of variables reported is strictly greater for the new change:
Number of debug clasuses increases by around 20%. Merging kicks fairly often an in some cases reduces the overall debug volume for a method. As noted in #33189 there are future improvements to merging possible which should result in substantial reductions and (when implemented) we may well end up with less debug volume than we had before we did any of this.
|
Also need to verify no-diffs for debuggable codegen. |
Have at least one failure to debug (linux arm64 crossgen):
|
No diffs in debug codegen. Debug debug gen is different (debug clauses re-ordered), but all methods in SPC have the same number of clauses and same number of variables reported. So semantically "no-diff". Note the prolog/body merging opportunities seen in opt codegen are not as prevalent as most incoming args are in regs and get spilled to the stack. This was measured with |
Example of "improved" optimized debug gen, from clause merging. Also note the debug range for var1 is a bit more extensive, as it can include parts of blocks (and a prolog-body merge opportunity for var 0):
becomes
|
Example of "rearranged" debug debug gen:
becomes
@BruceForstall was asking about the need for those prolog records, and I also wonder whether they are useful, especially here in debug codegen. IL offset 0 is at 37 here, so presumably a breakpoint at the open brace would used the body ranges, and the prolog ranges might only really be valid early on in the prolog and go stale part way through. Need to investigate this as a follow-up. Apparently at one time we generated fully accurate prolog debug, see the various runtime/src/coreclr/src/jit/scopeinfo.cpp Lines 1669 to 1682 in 239b0c4
|
The "rearranged" ranges seems potentially problematic depending on how the data is laid out and how the debugger looks for it: does it binary search for range starts, for example? i.e., does it need to be sorted by range start address? @cshung Looked into the output pre-/post- this new format, so might have some comments here. |
I believe the variable debug info is simply accessed through a linear search, so as long as the ranges do not overlap, the order should not matter. The runtime/src/coreclr/src/debug/di/module.cpp Lines 3678 to 3754 in 97e553f
@dotnet/dotnet-diag |
For the assert, seems like we're doing two updates to a live range.
First update happens with the first liveness change (via Seems like perhaps the update in |
Deferring the update when spilling allows crossgen to get past the point of failure above, but it nows hit a similar looking issue later on in the same method. Not sure I completely understand what we're trying to do here:
The So the variable life tracker honors the spill, and leaves an open live range. Then V10 is redefined by the call, and we go to start a new live range, but there's already a live range open from the spill, and we assert. Codegen is:
@CarolEidt this looks like a dead spill to me....? This is on Linux arm64, crossgenning |
The arm64 ABI guarantees the low half of some of the floating point registers are caller preserved, but the callee must preserve the upper half. This looks like code generated for that purpose. |
@sdmaclea thanks -- the upper restore before the call makes sense, but not the spill/kill sequence; seems like
|
I'd have to look in more detail at this, but I would guess that there's some interaction between identifying live vectors at the call (that need to be partially saved) and vectors that are passed into the call (and therefore, in some sense, live at the call) but are last use. It would seem that there must be some other subtlety here, otherwise this would seem like it would be a more common issue, but I can't venture a guess at what that might be. |
Ok, here's another oddity -- doesn't explain what goes wrong above, but does suggest why this method may be a fairly stressful case. runtime/src/libraries/System.Private.CoreLib/src/System/Numerics/Matrix4x4.cs Lines 1329 to 1343 in 6072e4d
We really should not be crossgenning an
Currently, failed "does not return" inline methods don't result in control flow dead ends . So these all look like normal calls in the root method that might return. As a result we have a ton of call-crossing vector lifetimes and are giving the upper save/restore logic a good workout. I don't know yet if crossgenning this method is an artifact of running cross crossgen (that is, I am running a crossgen hosted on x64 here) or a "feature". I wonder if the |
Yes, I believe it should. There are a few requirements/assumptions around checks that exist in S.P.Corelib, one of which I believe is that the method using a relevant intrinsic must also do the IsSupported check. I thought there might have been an exception for the baseline ISAs, but I might be misremembering. CC. @davidwrighton, do we have the rules documented somewhere? |
@CarolEidt I definitely see what look like dead spills, eg cases in I can save off a jit dump if you like... (edit:...trying, but getting timeouts creating a gist) |
The asserts we see here are essentially ones where the live range tracker sees two consecutive "start" or "end" events for some variable and so is not sure how to properly update live ranges. Seems like in general we may want to decouple the debug tracking from a strict reliance on getting consistent liveness updates; there's no reason why a dead spill (even if suboptimal) should lead to a debug assert. On the other hand we expect these cases of inconsistent liveness to be quite rare and should not be willing to accept them as a matter of routine. Also seems likely that the debug gen will need re-examination for multi-reg variables, and perhaps also for eh write thru; such variables can have multiple/duplicate locations. |
Here are the rules for HWIntrinsics and S.P.Corelib:
The rule is as I remembered:
However, maybe this is incorrect when viewing ARM64 vs SSE/SSE2? |
This statement is a bit confusing. It is not clear how that could be accomplished except for intrinsics that are enabled always on all platforms. |
Agree, it seems like any platform-dependent intrinsic use must always be guarded by an I did a quick scan through the code and this method is the only one I found not following the rules. But my scan wasn't comprehensive. |
@tannergooding @AndyAyersMS The rules as written are around correctness and I don't believe that one violates the rules I meant to write, as any platform that might implement SSE is also guaranteed to implement SSE. However, as you say it doesn't make all that much sense to compile the SSE special helper on Arm64, and it would likely be a bit cleaner if the SseImpl function had its own IsSupported check. OTOH, it seems to have found a lovely little corner case bug in the jit, so... yay? |
Though this method is only ever callable by code that's already done the right IsSupported check, add a redundant check to the method itself, so that when crossgenning SPC on non-xarch platforms we won't try and compile the xarch specific parts of this method. This should save some time and a bit of file size for non-xarch SPCs. See notes on dotnet#38894 for context.
Going to hold off on this until after .NET 5. |
Though this method is only ever callable by code that's already done the right IsSupported check, add a redundant check to the method itself, so that when crossgenning SPC on non-xarch platforms we won't try and compile the xarch specific parts of this method. This should save some time and a bit of file size for non-xarch SPCs. See notes on #38894 for context.
Though this method is only ever callable by code that's already done the right IsSupported check, add a redundant check to the method itself, so that when crossgenning SPC on non-xarch platforms we won't try and compile the xarch specific parts of this method. This should save some time and a bit of file size for non-xarch SPCs. See notes on dotnet#38894 for context.
Reverts #33021, which was a reversion of #2107.
Fix the IG boundary sensitivity for debug emission.
Allow debug info to be summarized in a way that makes it amenable to analyze
with our jitutils tooling (namely,
jit-analyze
).