-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Consolidate layout passes into one phase #112004
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 1 pipeline(s). |
This reverts commit ce79655.
/azp run runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 1 pipeline(s). |
Splitting this out of #112004 to simplify diff triage. The cost of executing a handler region should be dominated by the runtime overhead of exception handling, so I don't think we're losing anything meaningful in not reordering them. In the case of finally regions, if they are sufficiently hot, the JIT should've copied them into the main method region.
Split off from #112004. Excluding cold blocks from the loop-aware RPO simplifies how we compute the initial layout to feed into 3-opt, as it eliminates the need to manually move cold blocks out-of-line, instead allowing them to sink to the end of the method. This has the consequence of changing -- most likely worsening -- the layout of cold sections. However, our current threshold for "cold" is low enough that I don't think this churn matters: While BB_COLD_WEIGHT is 0.01, we compare this to normalized weights scaled by BB_UNITY_WEIGHT (100), so normalized weights must be below 0.0001 to be considered cold. In other words, profile data must suggest a block executes less than 0.01% (not 1%) of the time to be excluded from reordering.
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
if (!block->hasHndIndex() && (!block->isBBWeightCold(this) || block->IsFirst())) | ||
{ | ||
// Set the block's ordinal. | ||
block->bbPreorderNum = numHotBlocks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to repurpose bbPreorderNum
for the ordinal instead of bbPostorderNum
, since we can modify the former during the loop-aware RPO computation without breaking it. I probably should've split this out into a separate PR, though I've created enough review burden with this work as-is.
// | ||
void Compiler::fgRebuildEHRegions() | ||
void Compiler::fgFindTryRegionEnds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code isn't anything new. Since we no longer need to move cold EH blocks back in line after layout, we can bring back the EH repair logic I introduced in #108634. But now that we don't reorder handler regions, I've simplified this code to only reset try region entries in the main method body.
@@ -5345,7 +5232,8 @@ bool Compiler::ThreeOptLayout::CompactHotJumps() | |||
// If we aren't sure which successor is hotter, and we already fall into one of them, | |||
// do nothing. | |||
BasicBlock* const unlikelyTarget = unlikelyEdge->getDestinationBlock(); | |||
if ((unlikelyEdge->getLikelihood() == 0.5) && (unlikelyTarget->bbPostorderNum == (i + 1))) | |||
if ((unlikelyEdge->getLikelihood() == 0.5) && isCandidateBlock(unlikelyTarget) && | |||
(unlikelyTarget->bbPreorderNum == (i + 1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to check if the unlikely target is in the candidate span of blocks before checking its ordinal. This isn't a correctness issue per se, but switching ordinals from bbPostorderNum
to bbPreorderNum
created churn here.
// If we moved this region within another region, recompute the try region end blocks. | ||
if (parentIndex != EHblkDsc::NO_ENCLOSING_INDEX) | ||
{ | ||
compiler->fgFindTryRegionEnds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can probably do something cheaper than potentially iterating the whole main method body here, though this path seems cold enough to not noticeably affect TP diffs. I'll try something in a follow-up and see if it's worth it.
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs look like a PerfScore improvement overall, except for in |
Part of #107749. This replaces our RPO layout and cold code motion phases with a single loop-aware RPO computation (with cold blocks ignored) that we feed into 3-opt as its initial layout. This has the nice property of needing to reorder the block list only once, and never leaving it in a temporarily invalid state (i.e. with EH regions broken up).
My initial plan was for this PR to be zero-diff since I split out all the invariant changes (like not reordering cold/handler blocks at all) into separate PRs, but I found it necessary to implement moving try regions around to avoid regressing layout quality. Suppose we have some initial layout that looks like this:
When reordering the block list, we want to only reorder within try regions to avoid breaking their contiguity. Suppose 3-opt wants the block list to look like
BB01 -> BB03 -> BB04 -> BB05 -> BB02
. If we only reorder within regions, we cannot placeBB03
afterBB01
, so the layout remains as-is, with the cold block inline. And ifBB05
were somewhere else in the initial layout, we wouldn't be able to move it afterBB04
because they're in different regions.We can be a bit clever and remember the last hot block in each region, so that EH region boundaries don't trip us up. However, this only enables better movement within regions, and nested regions end up sinking down the method. We end up with the following layout:
The hot part of the main method body is compact, but the try region is interleaved with cold code. After adding support for try region movement, we can now move try regions up to their ideal successors, when doing so doesn't break EH nesting invariants:
Thus, this change has some churn, but it looks mostly like a PerfScore win.