Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Consolidate layout passes into one phase #112004

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Jan 30, 2025

Part of #107749. This replaces our RPO layout and cold code motion phases with a single loop-aware RPO computation (with cold blocks ignored) that we feed into 3-opt as its initial layout. This has the nice property of needing to reorder the block list only once, and never leaving it in a temporarily invalid state (i.e. with EH regions broken up).

My initial plan was for this PR to be zero-diff since I split out all the invariant changes (like not reordering cold/handler blocks at all) into separate PRs, but I found it necessary to implement moving try regions around to avoid regressing layout quality. Suppose we have some initial layout that looks like this:

[BB01 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]

When reordering the block list, we want to only reorder within try regions to avoid breaking their contiguity. Suppose 3-opt wants the block list to look like BB01 -> BB03 -> BB04 -> BB05 -> BB02. If we only reorder within regions, we cannot place BB03 after BB01, so the layout remains as-is, with the cold block inline. And if BB05 were somewhere else in the initial layout, we wouldn't be able to move it after BB04 because they're in different regions.

We can be a bit clever and remember the last hot block in each region, so that EH region boundaries don't trip us up. However, this only enables better movement within regions, and nested regions end up sinking down the method. We end up with the following layout:

[BB01 (hot)]
[BB05 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]

The hot part of the main method body is compact, but the try region is interleaved with cold code. After adding support for try region movement, we can now move try regions up to their ideal successors, when doing so doesn't break EH nesting invariants:

[BB01 (hot)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]
[BB02 (cold)]

Thus, this change has some churn, but it looks mostly like a PerfScore win.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 30, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid added a commit that referenced this pull request Feb 6, 2025
Follow-up to #111989. Now that we only run one pass of 3-opt, we can remove some cruft needed to maintain state across 3-opt passes. This is a meek attempt to reduce the size of #112004 by separating out some of the no-diff changes.
@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

amanasifkhalid added a commit that referenced this pull request Feb 10, 2025
Splitting this out of #112004 to simplify diff triage. The cost of executing a handler region should be dominated by the runtime overhead of exception handling, so I don't think we're losing anything meaningful in not reordering them. In the case of finally regions, if they are sufficiently hot, the JIT should've copied them into the main method region.
amanasifkhalid added a commit that referenced this pull request Feb 14, 2025
Split off from #112004. Excluding cold blocks from the loop-aware RPO simplifies how we compute the initial layout to feed into 3-opt, as it eliminates the need to manually move cold blocks out-of-line, instead allowing them to sink to the end of the method. This has the consequence of changing -- most likely worsening -- the layout of cold sections. However, our current threshold for "cold" is low enough that I don't think this churn matters: While BB_COLD_WEIGHT is 0.01, we compare this to normalized weights scaled by BB_UNITY_WEIGHT (100), so normalized weights must be below 0.0001 to be considered cold. In other words, profile data must suggest a block executes less than 0.01% (not 1%) of the time to be excluded from reordering.
@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid amanasifkhalid marked this pull request as ready for review February 21, 2025 16:13
@Copilot Copilot bot review requested due to automatic review settings February 21, 2025 16:13

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

if (!block->hasHndIndex() && (!block->isBBWeightCold(this) || block->IsFirst()))
{
// Set the block's ordinal.
block->bbPreorderNum = numHotBlocks;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to repurpose bbPreorderNum for the ordinal instead of bbPostorderNum, since we can modify the former during the loop-aware RPO computation without breaking it. I probably should've split this out into a separate PR, though I've created enough review burden with this work as-is.

//
void Compiler::fgRebuildEHRegions()
void Compiler::fgFindTryRegionEnds()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code isn't anything new. Since we no longer need to move cold EH blocks back in line after layout, we can bring back the EH repair logic I introduced in #108634. But now that we don't reorder handler regions, I've simplified this code to only reset try region entries in the main method body.

@@ -5345,7 +5232,8 @@ bool Compiler::ThreeOptLayout::CompactHotJumps()
// If we aren't sure which successor is hotter, and we already fall into one of them,
// do nothing.
BasicBlock* const unlikelyTarget = unlikelyEdge->getDestinationBlock();
if ((unlikelyEdge->getLikelihood() == 0.5) && (unlikelyTarget->bbPostorderNum == (i + 1)))
if ((unlikelyEdge->getLikelihood() == 0.5) && isCandidateBlock(unlikelyTarget) &&
(unlikelyTarget->bbPreorderNum == (i + 1)))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to check if the unlikely target is in the candidate span of blocks before checking its ordinal. This isn't a correctness issue per se, but switching ordinals from bbPostorderNum to bbPreorderNum created churn here.

// If we moved this region within another region, recompute the try region end blocks.
if (parentIndex != EHblkDsc::NO_ENCLOSING_INDEX)
{
compiler->fgFindTryRegionEnds();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably do something cheaper than potentially iterating the whole main method body here, though this path seems cold enough to not noticeably affect TP diffs. I'll try something in a follow-up and see if it's worth it.

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs look like a PerfScore improvement overall, except for in coreclr_tests and sometimes libraries_tests. We're getting a little bit of TP back, but I think there's further room for improvement by templating away some of the hot EH-specific checks. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant