Change the default loop unroll limit to 4 #80353
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak

Issue Details

Today, the JIT defaults to 1 unless the upper limit is a Vector###<T>.Count property, in which case it will try to unroll provided the method body isn't "too large". This means that devs can write a slightly more obscure pattern to get loop unrolling if truly desired, but the more natural pattern will not light up.

The AMD and Intel software optimization manuals all recommend unrolling small loops. However, the limits and rules for when to unroll differ between them.

This PR updates the default for the JIT to 4. The reason 4 is chosen is because it gives some improvements to various functions and code paths without significantly increasing code size. Likewise, it has some level of significance to both the JIT and the common code users will write in perf-sensitive scenarios:
- a Vector4-like type (4x T) or an optimized Matrix4x4 (4x Vector4<T>), which are heavily used in graphics and UI stacks
- Vector128<T> (this is the common SIMD type and the natural size .NET uses for indexing and most other "sizes" in managed)

I ran perf numbers for the benchmarks that had PMI diffs and we have a few that are unchanged and a few that are faster. I expect this may require some tweaking based on what we see in the actual perf runs, but we are early in the cycle and have plenty of time to react or back out the change if necessary.
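As a rough illustration of the two loop shapes described above (a hedged sketch; the method names are placeholders, not code from this PR), the first loop is already an unroll candidate today because its bound is Vector128<int>.Count, while the second, more natural constant bound only becomes a candidate with the new default of 4:

```csharp
using System.Runtime.Intrinsics;

public static class UnrollShapes
{
    // Bounded by Vector128<int>.Count: today's JIT already considers unrolling this,
    // provided the resulting body isn't "too large".
    public static int SumLanesCountBound(Vector128<int> value)
    {
        int sum = 0;
        for (int i = 0; i < Vector128<int>.Count; i++)
        {
            sum += value.GetElement(i);
        }
        return sum;
    }

    // Bounded by a plain constant: the more natural way to write the same loop,
    // which only becomes an unroll candidate once the default limit is raised to 4.
    public static int SumLanesConstantBound(Vector128<int> value)
    {
        int sum = 0;
        for (int i = 0; i < 4; i++)
        {
            sum += value.GetElement(i);
        }
        return sum;
    }
}
```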
For reference, here are some diffs with other values of DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT:
- DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 2 (Frameworks, Benchmarks)
- DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 4 (Frameworks, Benchmarks)
- DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 6 (Frameworks, Benchmarks)
- DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 8 (Frameworks, Benchmarks)
/azp list
The SuperPMI diffs in CI are a lot larger than what I got locally, particularly in tests, and there is a small TP impact as well. CC @BruceForstall: I would appreciate some input on whether you think this is something worth moving forward on given where all the loop code is today. It is worth noting that most of the tests are HWIntrinsic tests where the code is explicitly doing constant-count loops that emulate the actual instruction functionality for minimal correctness verification.
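For context, here is a hedged sketch of the shape those HWIntrinsic test loops take; the names and the specific intrinsic (AdvSimd.Add) are illustrative, not taken from the actual generated tests:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class HwIntrinsicTestSketch
{
    // Emulates the instruction lane by lane with a constant-count loop so the
    // intrinsic's result can be checked for minimal correctness. With the default
    // unroll limit raised to 4, loops of exactly this shape now get unrolled.
    public static bool ValidateAdd(Vector128<float> left, Vector128<float> right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true; // nothing to validate on non-ARM64 hardware
        }

        Vector128<float> actual = AdvSimd.Add(left, right);

        for (int i = 0; i < 4; i++)
        {
            if (actual.GetElement(i) != (left.GetElement(i) + right.GetElement(i)))
            {
                return false;
            }
        }

        return true;
    }
}
```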
My summary: it only modestly kicks in on most collections, except for coreclr_tests where it is mostly in the templatized HW Intrinsics tests. E.g., it hits in […].

I'd like to see this change checked in, especially early in the cycle where we can get perf numbers early. Choosing […].

cc @dotnet/jit-contrib
Merging in latest main and rerunning CI since the general direction seems favorable/positive.

Definitely needs to run stress tests before the PR gets merged.
Are there examples where we know this extra unrolling will make a big impact on perf?
@AndyAyersMS not in our own code. Most of our own perf-critical code is explicitly using vectors over unknown-length inputs.
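For contrast, here is a hedged sketch (not actual framework code; the method is illustrative) of that pattern: an explicitly vectorized loop over an input of unknown length, where the trip count is not a JIT-time constant and so the new default unroll limit has no effect.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class VectorizedSumSketch
{
    // The trip count depends on span.Length, which is unknown at JIT time, so the
    // constant-count unroller never applies to the main loop.
    public static int Sum(ReadOnlySpan<int> span)
    {
        Vector128<int> acc = Vector128<int>.Zero;
        int i = 0;

        for (; i <= span.Length - Vector128<int>.Count; i += Vector128<int>.Count)
        {
            acc += Vector128.Create(span.Slice(i, Vector128<int>.Count));
        }

        int sum = Vector128.Sum(acc);

        // Scalar tail for the remaining elements.
        for (; i < span.Length; i++)
        {
            sum += span[i];
        }

        return sum;
    }
}
```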
… unrolling and large amounts of dead code elimination
Fixed the one failure, which was a change in the generated code that impacted the new "codegen verification" test feature. The particular tests were impacted since they are getting unrolled now, and that also leads to some large dead code elimination, which meant that verifying the codegen would have been very difficult.

That also calls out that there is an existing missing loop optimization around converting something like this to not loop and just assign:

for (var i = 0; i < 4; i++)
{
    result = AdvSimd.CompareEqual(left, asVar);
}

That is, recognizing when the loop is doing throwaway or non-side-effecting work if executed more than once.

@BruceForstall, do you know if we have an existing issue tracking that?
CC @dotnet/jit-contrib, this should be ready for review. If we can get it reviewed/merged before EOD tomorrow, it will make the first preview and give us the greatest amount of time for feedback, etc.
Not that I'm aware of. Looks like we should be able to hoist this loop-invariant code, leading to an empty loop.
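A hedged sketch, in C# source form, of the staged cleanup described above (the JIT would perform this on its own IR, not on source; the helper methods and types here are purely illustrative):

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class HoistSketch
{
    // Original shape from the test: the body is loop-invariant and has no side
    // effects, so every iteration computes and stores the same value.
    public static Vector128<float> Original(Vector128<float> left, Vector128<float> asVar)
    {
        Vector128<float> result = default;
        for (var i = 0; i < 4; i++)
        {
            result = AdvSimd.CompareEqual(left, asVar);
        }
        return result;
    }

    // Step 1: hoist the invariant computation out of the loop; an empty loop remains.
    public static Vector128<float> AfterHoisting(Vector128<float> left, Vector128<float> asVar)
    {
        Vector128<float> result = AdvSimd.CompareEqual(left, asVar);
        for (var i = 0; i < 4; i++)
        {
            // empty body
        }
        return result;
    }

    // Step 2: remove the now-empty loop; only the single assignment is left.
    public static Vector128<float> AfterLoopRemoval(Vector128<float> left, Vector128<float> asVar)
    {
        return AdvSimd.CompareEqual(left, asVar);
    }
}
```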
// 'asVar' should be propagated.
// 'AdvSimd.CompareEqual' should be hoisted out of the loops.
// ARM64-FULL-LINE: fcmeq {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, #0.0
// ARM64: blt
// ARM64-NOT: fcmeq
This was actually a test added quite a bit before DisasmCheck existed, and it covers specific containment functionality: #33972. The disasm checks were added later since it represents a good candidate for validating that the containment functionality is actually working as expected.
In this particular case, we could probably cover it as something like:
// ARM64-FULL-LINE: fcmeq {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, #0.0
// ARM64-NOT: blt
// ARM64-NOT: fcmeq
But the combination of loop unrolling + dead code optimization makes it a case that's a bit difficult to test and to reason about.
I think we're fine not having the coverage for these two subcases in particular, given all the other disasm coverage of the scenario in the same file, but I'd be fine with adding the above lines if people would prefer they remain.
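For reference, here is a hedged guess at the containment scenario the original test (#33972) covers, based on the fcmeq-against-#0.0 check above: comparing against Vector128<float>.Zero should keep the zero operand contained so a single compare-against-zero instruction is emitted. The method below is illustrative, not copied from the test file.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class ContainmentSketch
{
    // The zero operand should be contained, so ARM64 codegen can emit
    //     fcmeq v0.4s, v1.4s, #0.0
    // instead of materializing a zero vector in a register and comparing against it.
    public static Vector128<float> CompareToZero(Vector128<float> value)
    {
        return AdvSimd.CompareEqual(value, Vector128<float>.Zero);
    }
}
```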
HasDisasmCheck adds disasm checking in addition to execution. (In fact, as it is, it requires execution because we get the disasm via normal execution plus DOTNET_JitDisasm.)

Loop logic is difficult to test in a line-based system. The existence of fcmeq is probably covered reasonably by the other tests, so this one is a question of whether the trouble/brittleness is worth it to verify that intended loop transformations have occurred (and it sounds like the answer is 'no').
HasDisasmCheck adds disasm checking in addition to execution

That's good to hear. However, there's an unfortunate wrinkle: it appears that all HasDisasmCheck tests also must set:

DOTNET_TieredCompilation=0
DOTNET_JITMinOpts=0

so those tests don't get alternative stress mode configurations run.

(This is a subject for another issue, not this PR, of course.)
@@ -9424,7 +9424,7 @@
#define DEFAULT_MAX_LOOPSIZE_FOR_ALIGN DEFAULT_ALIGN_LOOP_BOUNDARY * 3

// By default only single iteration loops will be unrolled
Does this comment need updating?
Yes, oops. Totally missed the comment. Will get a fix up.