JIT: Consolidate layout passes into one phase #112004

amanasifkhalid · 2025-01-30T16:30:13Z

Part of #107749. This replaces our RPO layout and cold code motion phases with a single loop-aware RPO computation (with cold blocks ignored) that we feed into 3-opt as its initial layout. The benefit is that we reorder the block list only once, and never leave it in a temporarily invalid state (i.e. with EH regions broken up).

My initial plan was for this PR to be zero-diff since I split out all the invariant changes (like not reordering cold/handler blocks at all) into separate PRs, but I found it necessary to implement moving try regions around to avoid regressing layout quality. Suppose we have some initial layout that looks like this:

[BB01 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]

When reordering the block list, we want to only reorder within try regions to avoid breaking their contiguity. Suppose 3-opt wants the block list to look like BB01 -> BB03 -> BB04 -> BB05 -> BB02. If we only reorder within regions, we cannot place BB03 after BB01, so the layout remains as-is, with the cold block inline. And if BB05 were somewhere else in the initial layout, we wouldn't be able to move it after BB04 because they're in different regions.

We can be a bit clever and remember the last hot block in each region, so that EH region boundaries don't trip us up. However, this only enables better movement within regions, and nested regions end up sinking down the method. We end up with the following layout:

[BB01 (hot)]
[BB05 (hot)]
[BB02 (cold)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]

The hot part of the main method body is compact, but the try region is interleaved with cold code. After adding support for try region movement, we can now move try regions up to their ideal successors, when doing so doesn't break EH nesting invariants:

[BB01 (hot)]
[BB03 (hot, try entry)]
[BB04 (hot, try exit)]
[BB05 (hot)]
[BB02 (cold)]

Thus, this change has some churn, but it looks mostly like a PerfScore win.
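
To make the last step of the example concrete, here is a minimal sketch of moving a contiguous try region up to its ideal predecessor. It is not the JIT's implementation: `Block`, `MoveTryRegionAfter`, and `Names` are hypothetical names, the layout is a plain vector rather than the JIT's block list, and the sketch deliberately skips the EH nesting checks the real change has to perform.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block; not the JIT's BasicBlock type.
struct Block
{
    std::string name;
    int         tryIndex; // -1 means "not in any try region"
};

// Moves the contiguous run of blocks in try region 'tryIndex' so that it
// immediately follows the block named 'predName'. Assumes the region is
// contiguous and currently sits somewhere after 'predName'.
void MoveTryRegionAfter(std::vector<Block>& layout, int tryIndex, const std::string& predName)
{
    auto pred   = std::find_if(layout.begin(), layout.end(),
                               [&](const Block& b) { return b.name == predName; });
    auto tryBeg = std::find_if(layout.begin(), layout.end(),
                               [&](const Block& b) { return b.tryIndex == tryIndex; });
    auto tryEnd = std::find_if(tryBeg, layout.end(),
                               [&](const Block& b) { return b.tryIndex != tryIndex; });

    if ((pred == layout.end()) || (tryBeg == layout.end()) || (tryBeg <= pred))
    {
        return; // Outside this sketch's assumptions; leave the layout alone.
    }

    // Rotate [pred + 1, tryEnd) so the try region becomes the first thing after 'pred'.
    std::rotate(pred + 1, tryBeg, tryEnd);
}

// Helper for inspecting the resulting order.
std::vector<std::string> Names(const std::vector<Block>& layout)
{
    std::vector<std::string> names;
    for (const Block& b : layout)
    {
        names.push_back(b.name);
    }
    return names;
}
```

Starting from the intermediate layout above (BB01, BB05, BB02, BB03, BB04) and moving the try region after BB01 yields BB01, BB03, BB04, BB05, BB02, which matches the final layout shown.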

dotnet-policy-service · 2025-01-30T16:30:48Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Follow-up to #111989. Now that we only run one pass of 3-opt, we can remove some cruft needed to maintain state across 3-opt passes. This is a meek attempt to reduce the size of #112004 by separating out some of the no-diff changes.

amanasifkhalid · 2025-02-07T17:36:57Z

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

azure-pipelines · 2025-02-07T17:37:15Z

Azure Pipelines successfully started running 2 pipeline(s).

amanasifkhalid · 2025-02-07T21:58:58Z

/azp run runtime-coreclr libraries-pgo

azure-pipelines · 2025-02-07T21:59:12Z

Azure Pipelines successfully started running 1 pipeline(s).

amanasifkhalid · 2025-02-07T22:33:36Z

/azp run runtime-coreclr libraries-pgo

azure-pipelines · 2025-02-07T22:33:45Z

Azure Pipelines successfully started running 1 pipeline(s).

Splitting this out of #112004 to simplify diff triage. The cost of executing a handler region should be dominated by the runtime overhead of exception handling, so I don't think we're losing anything meaningful in not reordering them. In the case of finally regions, if they are sufficiently hot, the JIT should've copied them into the main method region.

Split off from #112004. Excluding cold blocks from the loop-aware RPO simplifies how we compute the initial layout to feed into 3-opt, as it eliminates the need to manually move cold blocks out-of-line, instead allowing them to sink to the end of the method. This has the consequence of changing -- most likely worsening -- the layout of cold sections. However, our current threshold for "cold" is low enough that I don't think this churn matters: While BB_COLD_WEIGHT is 0.01, we compare this to normalized weights scaled by BB_UNITY_WEIGHT (100), so normalized weights must be below 0.0001 to be considered cold. In other words, profile data must suggest a block executes less than 0.01% (not 1%) of the time to be excluded from reordering.
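
The threshold arithmetic can be sketched as follows. The two constants carry the values quoted above; `IsColdForLayout` is a hypothetical helper, not a JIT function, and it assumes the comparison is simply "normalized weight scaled by `BB_UNITY_WEIGHT`, compared against `BB_COLD_WEIGHT`".

```cpp
#include <cassert>

// Values quoted in the discussion above.
constexpr double BB_UNITY_WEIGHT = 100.0;
constexpr double BB_COLD_WEIGHT  = 0.01;

// Hypothetical helper: 'normalizedWeight' is the block's weight relative to
// the method entry (1.0 means "runs once per call").
constexpr bool IsColdForLayout(double normalizedWeight)
{
    // The scaled weight must fall below BB_COLD_WEIGHT, so the normalized
    // weight must fall below 0.01 / 100 = 0.0001.
    return (normalizedWeight * BB_UNITY_WEIGHT) < BB_COLD_WEIGHT;
}
```

Under these assumptions, a block with normalized weight 0.00005 is cold, while one with normalized weight 0.001 is still a candidate for reordering.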

amanasifkhalid · 2025-02-21T03:50:01Z

/azp run runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo

azure-pipelines · 2025-02-21T03:50:17Z

Azure Pipelines successfully started running 2 pipeline(s).

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

amanasifkhalid · 2025-02-21T19:20:28Z

src/coreclr/jit/fgopt.cpp

+        if (!block->hasHndIndex() && (!block->isBBWeightCold(this) || block->IsFirst()))
+        {
+            // Set the block's ordinal.
+            block->bbPreorderNum          = numHotBlocks;


I decided to repurpose bbPreorderNum for the ordinal instead of bbPostorderNum, since we can modify the former during the loop-aware RPO computation without breaking it. I probably should've split this out into a separate PR, though I've created enough review burden with this work as-is.
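
A rough sketch of the ordinal assignment this hunk performs, under simplifying assumptions: blocks are visited in vector order (standing in for the loop-aware RPO), `Block` is a hypothetical struct rather than `BasicBlock`, and the `NO_ORDINAL` sentinel is invented here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BasicBlock.
struct Block
{
    bool     inHandler; // mirrors block->hasHndIndex()
    bool     isCold;    // mirrors block->isBBWeightCold(...)
    unsigned ordinal;   // plays the role of bbPreorderNum
};

// Sentinel for blocks that are not layout candidates (an assumption of this sketch).
constexpr unsigned NO_ORDINAL = ~0u;

// Assigns consecutive ordinals to candidate blocks and returns how many there are.
unsigned AssignOrdinals(std::vector<Block>& blocks)
{
    unsigned numHotBlocks = 0;
    for (std::size_t i = 0; i < blocks.size(); i++)
    {
        Block&     block   = blocks[i];
        const bool isFirst = (i == 0);

        // Same shape as the condition in the diff: skip handler blocks, and
        // skip cold blocks unless the block is the method entry.
        if (!block.inHandler && (!block.isCold || isFirst))
        {
            block.ordinal = numHotBlocks++;
        }
        else
        {
            block.ordinal = NO_ORDINAL;
        }
    }
    return numHotBlocks;
}
```

The point of the `isFirst` clause is that the method entry always receives ordinal 0, even when its profile weight would otherwise mark it cold.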

amanasifkhalid · 2025-02-21T19:23:05Z

src/coreclr/jit/jiteh.cpp

 //
-void Compiler::fgRebuildEHRegions()
+void Compiler::fgFindTryRegionEnds()


This code isn't anything new. Since we no longer need to move cold EH blocks back in line after layout, we can bring back the EH repair logic I introduced in #108634. But now that we don't reorder handler regions, I've simplified this code to only reset try region entries in the main method body.
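
For readers unfamiliar with the repair step, here is a hedged sketch of the general idea of recomputing try region ends from a finished layout. It is not the body of `fgFindTryRegionEnds`: the types and `FindTryRegionEnds` are hypothetical, the layout is a vector of indices, and handler regions are ignored entirely.

```cpp
#include <cassert>
#include <vector>

constexpr int NO_REGION = -1;

// Hypothetical, simplified EH table entry.
struct TryRegion
{
    int enclosingTry; // index of the enclosing try region, or NO_REGION
    int lastBlock;    // layout index of the region's last block, or NO_REGION
};

// Hypothetical, simplified block.
struct Block
{
    int tryIndex; // innermost try region containing the block, or NO_REGION
};

// Walks the layout once and records, for every try region, the last block
// that belongs to it (directly or through a nested region).
void FindTryRegionEnds(const std::vector<Block>& layout, std::vector<TryRegion>& regions)
{
    for (TryRegion& region : regions)
    {
        region.lastBlock = NO_REGION;
    }

    for (int i = 0; i < static_cast<int>(layout.size()); i++)
    {
        // A block in a nested region also extends every enclosing region.
        for (int r = layout[i].tryIndex; r != NO_REGION; r = regions[r].enclosingTry)
        {
            regions[r].lastBlock = i;
        }
    }
}
```

Because later blocks overwrite earlier ones, each region ends up pointing at its final block in layout order after a single pass.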

amanasifkhalid · 2025-02-21T19:26:06Z

src/coreclr/jit/fgopt.cpp

@@ -5345,7 +5232,8 @@ bool Compiler::ThreeOptLayout::CompactHotJumps()
            // If we aren't sure which successor is hotter, and we already fall into one of them,
            // do nothing.
            BasicBlock* const unlikelyTarget = unlikelyEdge->getDestinationBlock();
-            if ((unlikelyEdge->getLikelihood() == 0.5) && (unlikelyTarget->bbPostorderNum == (i + 1)))
+            if ((unlikelyEdge->getLikelihood() == 0.5) && isCandidateBlock(unlikelyTarget) &&
+                (unlikelyTarget->bbPreorderNum == (i + 1)))


I forgot to check if the unlikely target is in the candidate span of blocks before checking its ordinal. This isn't a correctness issue per se, but switching ordinals from bbPostorderNum to bbPreorderNum created churn here.
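
A small sketch of why the candidate check comes first. The assumption here is that a block outside the candidate span can still hold some leftover value in its ordinal field, so comparing ordinals alone can produce a false match. `Block`, `IsCandidateBlock`, and `AlreadyFallsInto` are hypothetical and only mimic the shape of the fixed condition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for BasicBlock; 'ordinal' plays the role of bbPreorderNum.
struct Block
{
    unsigned ordinal;
};

// Assumed definition: a block is a candidate if its ordinal is in range and
// the slot at that ordinal actually holds this block.
bool IsCandidateBlock(const std::vector<Block*>& blockOrder, const Block* block)
{
    return (block->ordinal < blockOrder.size()) && (blockOrder[block->ordinal] == block);
}

// Mirrors the fixed check: only trust the ordinal once we know the target is a candidate.
bool AlreadyFallsInto(const std::vector<Block*>& blockOrder, unsigned i, const Block* target)
{
    return IsCandidateBlock(blockOrder, target) && (target->ordinal == (i + 1));
}
```

With two candidates at ordinals 0 and 1, a third block that is not in the order but happens to carry ordinal 1 is rejected by `AlreadyFallsInto`, whereas the bare ordinal comparison would have accepted it.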

amanasifkhalid · 2025-02-21T19:28:51Z

src/coreclr/jit/fgopt.cpp

+        // If we moved this region within another region, recompute the try region end blocks.
+        if (parentIndex != EHblkDsc::NO_ENCLOSING_INDEX)
+        {
+            compiler->fgFindTryRegionEnds();


I can probably do something cheaper than potentially iterating the whole main method body here, though this path seems cold enough to not noticeably affect TP diffs. I'll try something in a follow-up and see if it's worth it.

amanasifkhalid · 2025-02-21T19:31:16Z

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs look like a PerfScore improvement overall, except in coreclr_tests and sometimes libraries_tests. We're getting a little bit of TP back, but I think there's further room for improvement by templating away some of the hot EH-specific checks. Thanks!

amanasifkhalid added 6 commits January 29, 2025 17:26

Reorder block list only once after 3-opt

0a70d2a

Fix block insertion

25389c9

Run 3-opt once for all regions

df25317

Remove some EH checks

46539d5

Remove unused layout methods

84600ec

Relax EH invariants; better stress mode

289134a

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 30, 2025

dotnet-policy-service bot assigned amanasifkhalid Jan 30, 2025

amanasifkhalid mentioned this pull request Jan 30, 2025

JIT: Run 3-opt once across all regions #111989

Merged

Place cold blocks in RPO

94d4f84

build-analysis bot mentioned this pull request Jan 30, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

amanasifkhalid added 2 commits February 5, 2025 15:57

Merge from main

38b3700

Clean up 3-opt driver a bit

a2774e0

amanasifkhalid mentioned this pull request Feb 5, 2025

JIT: Clean up 3-opt driver logic #112210

Merged

Clean up 3-opt driver a bit

1262dd6

amanasifkhalid added 4 commits February 6, 2025 11:37

Merge from main

918c557

Cold code motion working

1404f80

Move try regions

26359c4

Ensure method entry is considered hot by 3-opt

f062f0b

Always invalidate DFS tree after layout

ce79655

amanasifkhalid added 2 commits February 7, 2025 17:31

Revert "Always invalidate DFS tree after layout"

2520f7a

This reverts commit ce79655.

Fix DFS tree invalidation logic

e54dbb5

amanasifkhalid mentioned this pull request Feb 7, 2025

JIT: Don't reorder handler blocks #112292

Merged

build-analysis bot mentioned this pull request Feb 8, 2025

System.Numerics.Tensors.Tests.ConvertTests.ConvertChecked failing with System.OverflowException #112286

Open

amanasifkhalid mentioned this pull request Feb 11, 2025

JIT: Don't put cold blocks in RPO during layout #112448

Merged

amanasifkhalid mentioned this pull request Feb 20, 2025

JIT: Replace fgMoveHotJumps with 3-opt utility #112016

Merged

amanasifkhalid added 3 commits February 20, 2025 00:32

Merge from main

2f5f342

Style; use bbPreorderNum for ordinal

f08a756

Cleanup

447b2dd

build-analysis bot mentioned this pull request Feb 20, 2025

Intermittent build failure in AfterSourceBuild: "Could not write state file" #76488

Open

amanasifkhalid added 3 commits February 20, 2025 17:01

Fix call-finally motion

6f7ec57

fgFindEHRegionEnds -> fgFindTryRegionEnds

5d32277

Style

52536be

build-analysis bot mentioned this pull request Feb 21, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

3 tasks

amanasifkhalid added 2 commits February 21, 2025 11:13

Recompute try region ends only when necessary

60303ad

Merge branch 'main' into one-layout-phase

2497d6b

amanasifkhalid marked this pull request as ready for review February 21, 2025 16:13

Copilot bot review requested due to automatic review settings February 21, 2025 16:13

Copilot AI reviewed Feb 21, 2025

Merge branch 'main' into one-layout-phase

c3b78a3
