Change the default loop unroll limit to 4 #80353

Merged (5 commits) on Jan 27, 2023

tannergooding (Member) commented Jan 9, 2023

Today, the JIT's default loop unroll limit is 1 unless the loop's upper bound is a Vector###<T>.Count property, in which case it will try to unroll provided the method body isn't "too large". This means that devs can write a slightly more obscure pattern to get loop unrolling if truly desired, but the more natural pattern will not light up.
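
To make the transformation concrete, here is a minimal sketch of what fully unrolling a loop with a constant trip count of 4 means. It is written in Python purely for illustration: the JIT works on its own intermediate representation, not on source code, and the function names `sum4_loop` and `sum4_unrolled` are hypothetical.

```python
def sum4_loop(values):
    """Rolled form: each iteration pays for the add plus the loop overhead
    (compare, increment, branch)."""
    total = 0
    for i in range(4):  # constant trip count of 4
        total += values[i]
    return total


def sum4_unrolled(values):
    """Fully unrolled form: the body is repeated 4 times and the loop
    overhead disappears, at the cost of more code."""
    total = 0
    total += values[0]
    total += values[1]
    total += values[2]
    total += values[3]
    return total


print(sum4_loop([1, 2, 3, 4]), sum4_unrolled([1, 2, 3, 4]))
```

Both forms compute the same result. The unrolled one trades code size for the removed loop overhead, which is why the limit matters.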

The AMD and Intel software optimization manuals both recommend unrolling small loops. However, the limits and rules for when to unroll differ between them:

  • Intel recommends 16 or fewer, with a divisor based on the number of conditional branches it contains
    • Intel has other considerations, such as fitting in the instruction cache and using partial unrolling so that the loop overhead accounts for less than 10% of the execution time
  • AMD recommends unrolling for 10 or fewer iterations
    • AMD likewise has other considerations such as when the loop body is fewer than 10 instructions, factoring in number of branches, and number of micro-ops (ideally less than 40)
  • Arm64 has a software optimization guide but doesn't go in depth on loop unrolling
    • Arm64 does have a small section on optimizing memcpy via unrolling

This PR updates the JIT default to 4. This value was chosen because it improves various functions and code paths without significantly increasing code size. It is also a significant number for both the JIT and the code users commonly write in perf-sensitive scenarios:

  • It represents the maximum number of fields we will consider for promotion today
  • It represents the typical number of arguments that will be passed in registers across all ABIs
  • It represents the number of fields in a Vector4 (4x T) or an optimized Matrix4x4 (4x Vector4<T>) which are heavily used in graphics and UI stacks
  • It represents the number of 32-bit values in a Vector128<T> (this is the common SIMD type, and 32 bits is the natural size .NET uses for indexing and most other "sizes" in managed code)
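
The last point is simple arithmetic: the element count of a fixed-width vector is its width in bits divided by the element width in bits. A minimal sketch, where `lane_count` is a hypothetical helper and not a .NET API:

```python
def lane_count(vector_bits, element_bits):
    """Number of elements that fit in a fixed-width SIMD vector."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


print(lane_count(128, 32))  # 32-bit elements in a 128-bit vector -> 4
print(lane_count(128, 8))   # 8-bit elements in a 128-bit vector -> 16
print(lane_count(256, 8))   # 8-bit elements in a 256-bit vector -> 32
```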

I ran perf numbers for the benchmarks that had PMI diffs and we have a few that are unchanged and a few that are faster. I expect this may require some tweaking based on what we see in the actual perf runs, but we are early in the cycle and have plenty of time to react or back out the change if necessary.

// * Summary *

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.963)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=8.0.100-alpha.1.23056.8
  [Host]     : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
  Job-OFYLOX : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-UUJPBY : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Toolchain paths are abbreviated: both `\runtime` and `\runtime_base` continue with `\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe`. The two GC-count columns are unlabeled in the source header; they are shown here as Gen0 and Gen1, following BenchmarkDotNet's usual column order. Cells that were absent in the source are left empty.

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BenchEmFloat | Job-GJDPWC | \runtime | 2,023.0 ms | 10.83 ms | 9.04 ms | 2,018.8 ms | 2,016.0 ms | 2,046.0 ms | 1.00 | | 6000.0000 | 5000.0000 | 99.33 MB | |
| BenchEmFloat | Job-CPGMPI | \runtime_base | 2,030.8 ms | 6.68 ms | 6.24 ms | 2,028.0 ms | 2,025.2 ms | 2,042.0 ms | 1.00 | | 6000.0000 | 5000.0000 | 99.33 MB | |
| BenchEmFloatClass | Job-GJDPWC | \runtime | 406.8 ms | 1.17 ms | 0.98 ms | 407.0 ms | 404.7 ms | 408.1 ms | 1.00 | | 2000.0000 | 1000.0000 | 34.39 MB | |
| BenchEmFloatClass | Job-CPGMPI | \runtime_base | 408.7 ms | 1.39 ms | 1.23 ms | 408.7 ms | 406.7 ms | 410.8 ms | 1.00 | | 2000.0000 | 1000.0000 | 34.39 MB | |
| KNucleotide_1 | Job-HPETII | \runtime | 128.1 ms | 1.33 ms | 1.24 ms | 127.8 ms | 126.4 ms | 130.2 ms | 0.99 | 0.03 | 1000.0000 | 1000.0000 | | |
| KNucleotide_1 | Job-XXUDEF | \runtime_base | 129.6 ms | 2.53 ms | 2.92 ms | 129.1 ms | 126.2 ms | 135.8 ms | 1.00 | 0.00 | 1000.0000 | 1000.0000 | | |
| LLoops | Job-ZCIXOZ | \runtime | 321.8 ms | 2.17 ms | 1.92 ms | 322.7 ms | 318.3 ms | 325.3 ms | 0.96 | | | | 3.41 MB | 1.00 |
| LLoops | Job-NHLETH | \runtime_base | 336.1 ms | 1.62 ms | 1.43 ms | 335.5 ms | 334.4 ms | 339.0 ms | 1.00 | | | | 3.41 MB | 1.00 |
| MDLLoops | Job-OFYLOX | \runtime | 440.2 ms | 1.99 ms | 1.76 ms | 440.0 ms | 437.0 ms | 443.2 ms | 1.01 | | | | 3.39 MB | 1.00 |
| MDLLoops | Job-UUJPBY | \runtime_base | 434.4 ms | 2.81 ms | 2.49 ms | 434.5 ms | 431.2 ms | 438.7 ms | 1.00 | | | | 3.39 MB | 1.00 |
| MDPuzzle | Job-OFYLOX | \runtime | 227.8 ms | 3.69 ms | 3.45 ms | 228.5 ms | 219.6 ms | 234.1 ms | 0.99 | 0.02 | | | - | NA |
| MDPuzzle | Job-UUJPBY | \runtime_base | 229.3 ms | 1.49 ms | 1.32 ms | 229.4 ms | 226.9 ms | 231.4 ms | 1.00 | 0.00 | | | - | NA |
| Puzzle | Job-MXGOER | \runtime | 225.1 ms | 3.02 ms | 2.83 ms | 223.7 ms | 221.8 ms | 230.9 ms | 0.99 | 0.02 | | | - | NA |
| Puzzle | Job-CTVVXD | \runtime_base | 225.8 ms | 4.54 ms | 5.23 ms | 227.9 ms | 217.7 ms | 233.7 ms | 1.00 | 0.00 | | | - | NA |
| Richards | Job-ZUZZOX | \runtime | 45.63 us | 0.778 us | 0.728 us | 45.60 us | 44.53 us | 47.00 us | 0.99 | 0.02 | | | 1.34 KB | 1.00 |
| Richards | Job-KCTOAU | \runtime_base | 46.18 us | 0.584 us | 0.546 us | 46.25 us | 45.30 us | 47.34 us | 1.00 | 0.00 | | | 1.34 KB | 1.00 |
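
As a rough cross-check of the Ratio column, the sketch below divides the Mean of each `\runtime` run by the Mean of the matching `\runtime_base` run. It assumes the `\runtime_base` rows are the baseline, since they carry Ratio 1.00. BenchmarkDotNet may derive Ratio from the underlying measurements rather than from the two means, so treat this as an approximation; `ratio_of_means` is a hypothetical helper.

```python
# (mean for \runtime, mean for \runtime_base), in milliseconds, from the results above
means_ms = {
    "KNucleotide_1": (128.1, 129.6),
    "LLoops": (321.8, 336.1),
    "MDLLoops": (440.2, 434.4),
}


def ratio_of_means(current, baseline):
    """Approximate speed ratio; values below 1.0 mean the current run is faster."""
    return current / baseline


for name, (current, baseline) in means_ms.items():
    print(f"{name}: {ratio_of_means(current, baseline):.2f}")
```

On these numbers LLoops comes out a few percent faster and MDLLoops about one percent slower, in line with the reported ratios.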

ghost assigned tannergooding Jan 9, 2023
dotnet-issue-labeler (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 9, 2023

ghost commented Jan 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

tannergooding (Member, Author) commented

For reference, here are some PMI diffs for frameworks and benchmarks comparing limits of 2, 4, 6, and 8. I did not include 16 or 32 (the other two sizes the JIT will currently unroll given Vector###<T>.Count) as they are larger than the manuals typically recommend.

DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 2

Frameworks

Found 279 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59982060
Total bytes of diff: 59982829
Total bytes of delta: 769 (0.00 % of base)
Total relative delta: NaN
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
         529 : System.Security.Cryptography.dasm (0.06% of base)
         132 : System.Private.Xml.dasm (0.00% of base)
          84 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)
          68 : System.IO.Hashing.dasm (0.19% of base)

Top file improvements (bytes):
         -44 : System.Runtime.InteropServices.dasm (-1.31% of base)

5 total files with Code Size differences (1 improved, 4 regressed), 269 unchanged.

Top method regressions (bytes):
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
          67 (23.26% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.TraceEvent:XmlAttribHex(System.Text.StringBuilder,System.String,ulong):System.Text.StringBuilder
          42 (16.47% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.CngCommon:TrySignHash(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,byref):bool
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512(ulong,ulong,ulong)
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512Inlined(ulong,ulong,ulong)
          22 (17.74% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:EnsureBGCRevisitInfoAlloc():this

Top method improvements (bytes):
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
          -3 (-0.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLog+<>c__DisplayClass84_0:<SetupCallbacks>b__9(Microsoft.Diagnostics.Tracing.Parsers.Kernel.ImageLoadTraceData):this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this

Top method regressions (percentages):
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512(ulong,ulong,ulong)
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512Inlined(ulong,ulong,ulong)
          67 (23.26% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.TraceEvent:XmlAttribHex(System.Text.StringBuilder,System.String,ulong):System.Text.StringBuilder
          22 (17.74% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:EnsureBGCRevisitInfoAlloc():this
          42 (16.47% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.CngCommon:TrySignHash(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,byref):bool

Top method improvements (percentages):
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -3 (-0.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLog+<>c__DisplayClass84_0:<SetupCallbacks>b__9(Microsoft.Diagnostics.Tracing.Parsers.Kernel.ImageLoadTraceData):this

12 total methods with Code Size differences (4 improved, 8 regressed), 374703 unchanged.

Benchmarks

Found 103 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 503196
Total bytes of diff: 505161
Total bytes of delta: 1965 (0.39 % of base)
Total relative delta: 0.95
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        1191 : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm (38.71% of base)
         548 : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm (4.21% of base)
         187 : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm (4.64% of base)
          20 : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm (0.13% of base)
          19 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (0.48% of base)

5 total files with Code Size differences (0 improved, 5 regressed), 93 unchanged.

Top method regressions (bytes):
        1191 (60.95% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
         304 ( 3.04% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Main1(int):this
         244 (16.20% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
         187 (11.18% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          20 ( 1.51% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
          19 ( 2.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - BenchmarksGame.KNucleotide_1:Bench(System.IO.Stream,BenchmarksGame.TestHarnessHelpers,bool):bool

Top method regressions (percentages):
        1191 (60.95% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
         244 (16.20% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
         187 (11.18% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         304 ( 3.04% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Main1(int):this
          19 ( 2.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - BenchmarksGame.KNucleotide_1:Bench(System.IO.Stream,BenchmarksGame.TestHarnessHelpers,bool):bool
          20 ( 1.51% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this

6 total methods with Code Size differences (0 improved, 6 regressed), 2957 unchanged.
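
Each per-method line reports a byte delta and that delta as a percentage of the method's base size, so the base size can be recovered by dividing one by the other. A minimal sketch using two entries from the limit-2 diffs above; `base_size` is a hypothetical helper and the results are approximate because the percentages are rounded.

```python
def base_size(delta_bytes, percent_of_base):
    """Recover the approximate original method size from a diff line."""
    return delta_bytes / (percent_of_base / 100.0)


# Benchstone.MDBenchI.MDPuzzle:DoIt() -> 1191 (60.95% of base)
mdpuzzle_base = base_size(1191, 60.95)
# System.Security.Cryptography.RSACng:EncryptOrDecrypt(...) -> 320 (43.18% of base)
rsacng_base = base_size(320, 43.18)

print(round(mdpuzzle_base), round(mdpuzzle_base) + 1191)  # base bytes, bytes after unrolling
print(round(rsacng_base), round(rsacng_base) + 320)
```

This helps put the large percentages in context: a 40% to 60% regression on a method of a few hundred bytes is a small absolute cost.
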

DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 4

Frameworks

Found 289 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59982060
Total bytes of diff: 59988291
Total bytes of delta: 6231 (0.01 % of base)
Total relative delta: NaN
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3338 : System.Numerics.Tensors.dasm (1.11% of base)
         560 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.01% of base)
         529 : System.Security.Cryptography.dasm (0.06% of base)
         418 : System.DirectoryServices.dasm (0.09% of base)
         311 : System.Private.Xml.dasm (0.01% of base)
         185 : System.Runtime.Numerics.dasm (0.15% of base)
         169 : xunit.runner.reporters.netcoreapp10.dasm (0.29% of base)
         169 : xunit.runner.utility.netcoreapp10.dasm (0.09% of base)
         169 : xunit.console.dasm (0.17% of base)
         156 : System.Composition.Hosting.dasm (0.18% of base)
         136 : System.Private.CoreLib.dasm (0.00% of base)
          68 : System.IO.Hashing.dasm (0.19% of base)
          57 : System.Net.Security.dasm (0.03% of base)
          10 : System.Private.DataContractSerialization.dasm (0.00% of base)

Top file improvements (bytes):
         -44 : System.Runtime.InteropServices.dasm (-1.31% of base)

15 total files with Code Size differences (1 improved, 14 regressed), 259 unchanged.

Top method regressions (bytes):
         418 (23.87% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:GetArrayString(bool):System.String:this
         418 (23.84% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[long]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[short]:GetArrayString(bool):System.String:this
         418 (24.18% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.__Canon]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Nullable`1[int]]:GetArrayString(bool):System.String:this
         418 (23.47% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Numerics.Vector`1[float]]:GetArrayString(bool):System.String:this
         418 (23.16% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[ubyte]:GetArrayString(bool):System.String:this
         412 (23.00% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[double]:GetArrayString(bool):System.String:this
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
         185 (14.26% of base) : System.Runtime.Numerics.dasm - System.Numerics.BigInteger:.ctor(System.ReadOnlySpan`1[ubyte],bool,bool):this
         169 (12.36% of base) : xunit.runner.reporters.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.runner.utility.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.console.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
         101 (19.88% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:OnEnd(Microsoft.Diagnostics.Tracing.Analysis.TraceGarbageCollector):this
         100 (101.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Xslt.Compiler:MergeWhitespaceRules(System.Xml.Xsl.Xslt.Stylesheet):this
          96 (19.01% of base) : System.Composition.Hosting.dasm - System.Composition.Hosting.Util.SmallSparseInitonlyArray:Add(int,System.Object):this

Top method improvements (bytes):
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
          -3 (-0.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLog+<>c__DisplayClass84_0:<SetupCallbacks>b__9(Microsoft.Diagnostics.Tracing.Parsers.Kernel.ImageLoadTraceData):this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

Top method regressions (percentages):
          62 (140.91% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.BPerfEventSource:DecodeMod(byref):uint
         100 (101.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Xslt.Compiler:MergeWhitespaceRules(System.Xml.Xsl.Xslt.Stylesheet):this
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
          55 (76.39% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:GetHeapSizeBeforeMB(System.Collections.Generic.List`1[Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC],Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC):double
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
          20 (39.22% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_VerLanguage():System.String:this
          79 (39.11% of base) : System.Private.Xml.dasm - System.Xml.Xsl.XsltOld.Compiler:AddScript(int,System.String):this
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512(ulong,ulong,ulong)
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512Inlined(ulong,ulong,ulong)
          60 (35.71% of base) : System.Composition.Hosting.dasm - System.Composition.Hosting.Util.SmallSparseInitonlyArray:TryGetValue(int,byref):bool:this
          62 (25.51% of base) : System.Private.CoreLib.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt16X(System.ReadOnlySpan`1[ubyte],byref,byref):bool
          36 (24.83% of base) : System.Private.CoreLib.dasm - System.IO.BinaryReader:Read7BitEncodedInt():int:this
         418 (24.18% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.__Canon]:GetArrayString(bool):System.String:this
         418 (23.87% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:GetArrayString(bool):System.String:this
         418 (23.84% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[long]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[short]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Nullable`1[int]]:GetArrayString(bool):System.String:this

Top method improvements (percentages):
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -3 (-0.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLog+<>c__DisplayClass84_0:<SetupCallbacks>b__9(Microsoft.Diagnostics.Tracing.Parsers.Kernel.ImageLoadTraceData):this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

45 total methods with Code Size differences (5 improved, 40 regressed), 374670 unchanged.

Benchmarks

Found 105 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 503196
Total bytes of diff: 508557
Total bytes of delta: 5361 (1.07 % of base)
Total relative delta: 7.36
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3160 : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm (102.70% of base)
         839 : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm (6.44% of base)
         653 : Bytemark\Bytemark\Bytemark.dasm (0.89% of base)
         353 : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm (8.76% of base)
         224 : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm (1.51% of base)
         113 : V8\Richards\Richards\Richards.dasm (2.19% of base)
          19 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (0.48% of base)

7 total files with Code Size differences (0 improved, 7 regressed), 91 unchanged.

Top method regressions (bytes):
        3160 (161.72% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
         383 ( 3.83% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Main1(int):this
         353 (21.11% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
          55 ( 0.66% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          45 ( 5.85% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:MultiplyInternalFPF(byref,byref,byref)
          41 ( 5.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:MultiplyInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)
          21 (13.64% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:StickyShiftRightMant(EMFloatClass+InternalFPF,int)
          19 ( 2.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - BenchmarksGame.KNucleotide_1:Bench(System.IO.Stream,BenchmarksGame.TestHarnessHelpers,bool):bool

Top method regressions (percentages):
        3160 (161.72% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFInfinity(byref,ubyte)
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFZero(byref,ubyte)
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
         353 (21.11% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          14 (18.67% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFNaN(byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (13.64% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:StickyShiftRightMant(EMFloatClass+InternalFPF,int)
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
           7 ( 8.33% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFNaN(EMFloatClass+InternalFPF)
          45 ( 5.85% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:MultiplyInternalFPF(byref,byref,byref)

24 total methods with Code Size differences (0 improved, 24 regressed), 2939 unchanged.

DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 6

Frameworks

Found 292 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59982060
Total bytes of diff: 59989095
Total bytes of delta: 7035 (0.01 % of base)
Total relative delta: NaN
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3338 : System.Numerics.Tensors.dasm (1.11% of base)
         841 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.02% of base)
         649 : System.Security.Cryptography.dasm (0.07% of base)
         418 : System.DirectoryServices.dasm (0.09% of base)
         311 : System.Private.Xml.dasm (0.01% of base)
         271 : System.Private.CoreLib.dasm (0.00% of base)
         185 : System.Runtime.Numerics.dasm (0.15% of base)
         169 : xunit.console.dasm (0.17% of base)
         169 : xunit.runner.reporters.netcoreapp10.dasm (0.29% of base)
         169 : xunit.runner.utility.netcoreapp10.dasm (0.09% of base)
         156 : System.Composition.Hosting.dasm (0.18% of base)
         151 : System.IO.Hashing.dasm (0.42% of base)
          84 : System.Runtime.Caching.dasm (0.13% of base)
          63 : System.Diagnostics.TextWriterTraceListener.dasm (0.37% of base)
          57 : System.Net.Security.dasm (0.03% of base)
          38 : System.Reflection.MetadataLoadContext.dasm (0.02% of base)
          10 : System.Private.DataContractSerialization.dasm (0.00% of base)

Top file improvements (bytes):
         -44 : System.Runtime.InteropServices.dasm (-1.31% of base)

18 total files with Code Size differences (1 improved, 17 regressed), 256 unchanged.

Top method regressions (bytes):
         418 (23.87% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:GetArrayString(bool):System.String:this
         418 (23.84% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[long]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[short]:GetArrayString(bool):System.String:this
         418 (24.18% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.__Canon]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Nullable`1[int]]:GetArrayString(bool):System.String:this
         418 (23.47% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Numerics.Vector`1[float]]:GetArrayString(bool):System.String:this
         418 (23.16% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[ubyte]:GetArrayString(bool):System.String:this
         412 (23.00% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[double]:GetArrayString(bool):System.String:this
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
         185 (14.26% of base) : System.Runtime.Numerics.dasm - System.Numerics.BigInteger:.ctor(System.ReadOnlySpan`1[ubyte],bool,bool):this
         169 (12.36% of base) : xunit.console.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.runner.reporters.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.runner.utility.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         135 (13.33% of base) : System.Private.CoreLib.dasm - System.Buffers.Text.Utf8Formatter:TryFormatDateTimeO(System.DateTime,System.TimeSpan,System.Span`1[ubyte],byref):bool
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
         110 (20.75% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:PayloadValue(int):System.Object:this
         101 (19.88% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:OnEnd(Microsoft.Diagnostics.Tracing.Analysis.TraceGarbageCollector):this

Top method improvements (bytes):
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

Top method regressions (percentages):
          62 (140.91% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.BPerfEventSource:DecodeMod(byref):uint
         100 (101.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Xslt.Compiler:MergeWhitespaceRules(System.Xml.Xsl.Xslt.Stylesheet):this
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
          42 (82.35% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_CompanyName():System.String:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
          84 (81.55% of base) : System.Runtime.Caching.dasm - System.Runtime.Caching.MemoryMonitor:InitHistory():this
          83 (76.85% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:DeriveSecretFromSeed(ulong,ulong)
          55 (76.39% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:GetHeapSizeBeforeMB(System.Collections.Generic.List`1[Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC],Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC):double
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
          31 (60.78% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_ProductName():System.String:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
          20 (39.22% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_VerLanguage():System.String:this
          79 (39.11% of base) : System.Private.Xml.dasm - System.Xml.Xsl.XsltOld.Compiler:AddScript(int,System.String):this
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512(ulong,ulong,ulong)
          34 (39.08% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Accumulate512Inlined(ulong,ulong,ulong)
          60 (35.71% of base) : System.Composition.Hosting.dasm - System.Composition.Hosting.Util.SmallSparseInitonlyArray:TryGetValue(int,byref):bool:this
          62 (25.51% of base) : System.Private.CoreLib.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt16X(System.ReadOnlySpan`1[ubyte],byref,byref):bool
          28 (25.23% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.CapiHelper:WriteDSSSeed(System.Security.Cryptography.DSAParameters,System.IO.BinaryWriter)
          36 (24.83% of base) : System.Private.CoreLib.dasm - System.IO.BinaryReader:Read7BitEncodedInt():int:this

Top method improvements (percentages):
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

55 total methods with Code Size differences (4 improved, 51 regressed), 374660 unchanged.

Benchmarks

Found 106 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 503196
Total bytes of diff: 508799
Total bytes of delta: 5603 (1.11 % of base)
Total relative delta: 8.23
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3271 : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm (106.30% of base)
         839 : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm (6.44% of base)
         653 : Bytemark\Bytemark\Bytemark.dasm (0.89% of base)
         453 : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm (11.24% of base)
         224 : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm (1.51% of base)
         113 : V8\Richards\Richards\Richards.dasm (2.19% of base)
          31 : FractalPerf\FractalPerf\FractalPerf.dasm (1.96% of base)
          19 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (0.48% of base)

8 total files with Code Size differences (0 improved, 8 regressed), 90 unchanged.

Top method regressions (bytes):
        3271 (167.40% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
         453 (27.09% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         383 ( 3.83% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Main1(int):this
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
          55 ( 0.66% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          45 ( 5.85% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:MultiplyInternalFPF(byref,byref,byref)
          41 ( 5.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:MultiplyInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
          31 (75.61% of base) : FractalPerf\FractalPerf\FractalPerf.dasm - FractalPerf.Launch:TestBase():bool
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)
          21 (13.64% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:StickyShiftRightMant(EMFloatClass+InternalFPF,int)

Top method regressions (percentages):
        3271 (167.40% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          31 (75.61% of base) : FractalPerf\FractalPerf\FractalPerf.dasm - FractalPerf.Launch:TestBase():bool
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFInfinity(byref,ubyte)
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFZero(byref,ubyte)
         453 (27.09% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          14 (18.67% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFNaN(byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (13.64% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:StickyShiftRightMant(EMFloatClass+InternalFPF,int)
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
           7 ( 8.33% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFNaN(EMFloatClass+InternalFPF)

25 total methods with Code Size differences (0 improved, 25 regressed), 2938 unchanged.

DEFAULT_UNROLL_LOOP_MAX_ITERATION_COUNT: 8

Frameworks

Found 294 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 59982060
Total bytes of diff: 59990124
Total bytes of delta: 8064 (0.01 % of base)
Total relative delta: NaN
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3338 : System.Numerics.Tensors.dasm (1.11% of base)
        1365 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.04% of base)
         649 : System.Security.Cryptography.dasm (0.07% of base)
         576 : System.DirectoryServices.dasm (0.13% of base)
         350 : System.Private.CoreLib.dasm (0.01% of base)
         311 : System.Private.Xml.dasm (0.01% of base)
         230 : System.IO.Hashing.dasm (0.64% of base)
         185 : System.Runtime.Numerics.dasm (0.15% of base)
         169 : xunit.runner.utility.netcoreapp10.dasm (0.09% of base)
         169 : xunit.console.dasm (0.17% of base)
         169 : xunit.runner.reporters.netcoreapp10.dasm (0.29% of base)
         156 : System.Composition.Hosting.dasm (0.18% of base)
         116 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
          84 : System.Runtime.Caching.dasm (0.13% of base)
          73 : System.Formats.Asn1.dasm (0.08% of base)
          63 : System.Diagnostics.TextWriterTraceListener.dasm (0.37% of base)
          57 : System.Net.Security.dasm (0.03% of base)
          38 : System.Reflection.MetadataLoadContext.dasm (0.02% of base)
          10 : System.Private.DataContractSerialization.dasm (0.00% of base)

Top file improvements (bytes):
         -44 : System.Runtime.InteropServices.dasm (-1.31% of base)

20 total files with Code Size differences (1 improved, 19 regressed), 254 unchanged.

Top method regressions (bytes):
         418 (23.87% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:GetArrayString(bool):System.String:this
         418 (23.84% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[long]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[short]:GetArrayString(bool):System.String:this
         418 (24.18% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.__Canon]:GetArrayString(bool):System.String:this
         418 (23.83% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Nullable`1[int]]:GetArrayString(bool):System.String:this
         418 (23.47% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Numerics.Vector`1[float]]:GetArrayString(bool):System.String:this
         418 (23.16% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[ubyte]:GetArrayString(bool):System.String:this
         412 (23.00% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[double]:GetArrayString(bool):System.String:this
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this
         237 (44.72% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:PayloadValue(int):System.Object:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
         213 (10.88% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
         185 (14.26% of base) : System.Runtime.Numerics.dasm - System.Numerics.BigInteger:.ctor(System.ReadOnlySpan`1[ubyte],bool,bool):this
         169 (12.36% of base) : xunit.runner.utility.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.console.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         169 (12.36% of base) : xunit.runner.reporters.netcoreapp10.dasm - Xunit.JsonBuffer:ReadString():System.String:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         158 (190.36% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:SetDailySchedule(int,int,int,int):this
         135 (13.33% of base) : System.Private.CoreLib.dasm - System.Buffers.Text.Utf8Formatter:TryFormatDateTimeO(System.DateTime,System.TimeSpan,System.Span`1[ubyte],byref):bool

Top method improvements (bytes):
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

Top method regressions (percentages):
         116 (374.19% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Symbols.CRC32:CalcEntry(uint):uint
         158 (190.36% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:SetDailySchedule(int,int,int,int):this
          62 (140.91% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.BPerfEventSource:DecodeMod(byref):uint
          64 (125.49% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileId():System.String:this
          73 (114.06% of base) : System.Formats.Asn1.dasm - System.Formats.Asn1.AsnDecoder:InterpretNamedBitListReversed(System.ReadOnlySpan`1[ubyte]):long
          53 (103.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_ProductVersion():System.String:this
         100 (101.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.Xslt.Compiler:MergeWhitespaceRules(System.Xml.Xsl.Xslt.Stylesheet):this
          79 (98.75% of base) : System.Private.CoreLib.dasm - System.Numerics.Crc32ReflectedTable:Generate(uint):uint[]
          79 (98.75% of base) : System.IO.Hashing.dasm - System.Numerics.Crc32ReflectedTable:Generate(uint):uint[]
         204 (87.18% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:get_RawSchedule():bool[,,]:this
          42 (82.35% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_CompanyName():System.String:this
         214 (81.68% of base) : System.DirectoryServices.dasm - System.DirectoryServices.ActiveDirectory.ActiveDirectorySchedule:set_RawSchedule(bool[,,]):this
          84 (81.55% of base) : System.Runtime.Caching.dasm - System.Runtime.Caching.MemoryMonitor:InitHistory():this
          83 (76.85% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:DeriveSecretFromSeed(ulong,ulong)
          55 (76.39% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC:GetHeapSizeBeforeMB(System.Collections.Generic.List`1[Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC],Microsoft.Diagnostics.Tracing.Analysis.GC.TraceGC):double
         132 (61.40% of base) : System.Private.Xml.dasm - System.Xml.Schema.SchemaInfo:Finish():this
          31 (60.78% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_ProductName():System.String:this
         167 (46.65% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:TryEncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],System.Span`1[ubyte],int,ulong,bool,byref):bool
         237 (44.72% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:PayloadValue(int):System.Object:this
         320 (43.18% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.RSACng:EncryptOrDecrypt(Microsoft.Win32.SafeHandles.SafeNCryptKeyHandle,System.ReadOnlySpan`1[ubyte],int,ulong,bool):ubyte[]:this

Top method improvements (percentages):
         -22 (-6.32% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Remove():this
         -22 (-5.47% of base) : System.Runtime.InteropServices.dasm - System.Runtime.InteropServices.HandleCollector:Add():this
          -2 (-3.92% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:get_FileVersion():System.String:this
          -1 (-0.13% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.GC.GCCondemnedReasons:Decode(int):this

63 total methods with Code Size differences (4 improved, 59 regressed), 374652 unchanged.

Benchmarks

Found 107 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 503196
Total bytes of diff: 508872
Total bytes of delta: 5676 (1.13 % of base)
Total relative delta: 9.58
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
        3271 : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm (106.30% of base)
         839 : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm (6.44% of base)
         653 : Bytemark\Bytemark\Bytemark.dasm (0.89% of base)
         453 : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm (11.24% of base)
         224 : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm (1.51% of base)
         113 : V8\Richards\Richards\Richards.dasm (2.19% of base)
          73 : Benchstones\BenchI\Permutate\Permutate\Permutate.dasm (9.64% of base)
          31 : FractalPerf\FractalPerf\FractalPerf.dasm (1.96% of base)
          19 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (0.48% of base)

9 total files with Code Size differences (0 improved, 9 regressed), 89 unchanged.

Top method regressions (bytes):
        3271 (167.40% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
         453 (27.09% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         383 ( 3.83% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Main1(int):this
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          73 (135.19% of base) : Benchstones\BenchI\Permutate\Permutate\Permutate.dasm - Benchstone.BenchI.Permutate:Initialize():this
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
          55 ( 0.66% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          45 ( 5.85% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:MultiplyInternalFPF(byref,byref,byref)
          41 ( 5.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:MultiplyInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
          31 (75.61% of base) : FractalPerf\FractalPerf\FractalPerf.dasm - FractalPerf.Launch:TestBase():bool
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)

Top method regressions (percentages):
        3271 (167.40% of base) : Benchstones\MDBenchI\MDPuzzle\MDPuzzle\MDPuzzle.dasm - Benchstone.MDBenchI.MDPuzzle:DoIt():bool:this
          73 (135.19% of base) : Benchstones\BenchI\Permutate\Permutate\Permutate.dasm - Benchstone.BenchI.Permutate:Initialize():this
          76 (86.36% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:memmove(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)
          71 (85.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:memmove(byref,byref)
         113 (81.29% of base) : V8\Richards\Richards\Richards.dasm - V8.Richards.WorkerTask:run(V8.Richards.Packet):V8.Richards.TaskControlBlock:this
          31 (75.61% of base) : FractalPerf\FractalPerf\FractalPerf.dasm - FractalPerf.Launch:TestBase():bool
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFInfinity(EMFloatClass+InternalFPF,ubyte)
          21 (30.43% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:SetInternalFPFZero(EMFloatClass+InternalFPF,ubyte)
         456 (30.28% of base) : Benchstones\MDBenchF\MDLLoops\MDLLoops\MDLLoops.dasm - Benchstone.MDBenchF.MDLLoops:Init():this
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFInfinity(byref,ubyte)
          19 (27.54% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFZero(byref,ubyte)
         453 (27.09% of base) : Benchstones\BenchI\Puzzle\Puzzle\Puzzle.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          31 (21.23% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:LongToInternalFPF(int,EMFloatClass+InternalFPF)
          29 (19.73% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:LongToInternalFPF(int,byref)
          14 (18.67% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:SetInternalFPFNaN(byref)
          27 (17.88% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:StickyShiftRightMant(byref,int)
          21 (13.64% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:StickyShiftRightMant(EMFloatClass+InternalFPF,int)
         116 (13.21% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         169 (12.75% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
          95 (10.17% of base) : Bytemark\Bytemark\Bytemark.dasm - EMFloatClass:DivideInternalFPF(EMFloatClass+InternalFPF,EMFloatClass+InternalFPF,EMFloatClass+InternalFPF)

26 total methods with Code Size differences (0 improved, 26 regressed), 2937 unchanged.


@tannergooding
Member Author

tannergooding commented Jan 10, 2023

The SuperPMI diffs in CI are a lot larger than what I got locally, particularly in tests. There is a small throughput (TP) impact as well.

CC. @BruceForstall. Would appreciate some input on whether you think this is worth moving forward with, given where all the loop code is today.

It is worth noting most of the tests are HWIntrinsic tests where the code is explicitly doing constant count loops that are emulating the actual instruction functionality for minimal correctness verification.
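To make the "constant count loop" shape concrete, here is a minimal C++ sketch of a four-iteration loop next to the fully unrolled form an unrolling compiler could produce for it. It is not taken from the test suite, and the names `AddLooped` and `AddUnrolled` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: element-wise add written as a constant trip-count loop.
std::array<int32_t, 4> AddLooped(const std::array<int32_t, 4>& a,
                                 const std::array<int32_t, 4>& b)
{
    std::array<int32_t, 4> r{};
    for (int i = 0; i < 4; i++) // trip count is a compile-time constant
    {
        r[i] = a[i] + b[i];
    }
    return r;
}

// The same computation with the loop fully unrolled by hand:
// no induction variable, no compare, no backward branch.
std::array<int32_t, 4> AddUnrolled(const std::array<int32_t, 4>& a,
                                   const std::array<int32_t, 4>& b)
{
    std::array<int32_t, 4> r{};
    r[0] = a[0] + b[0];
    r[1] = a[1] + b[1];
    r[2] = a[2] + b[2];
    r[3] = a[3] + b[3];
    return r;
}
```

The unrolled form trades a few extra instructions for the removal of loop overhead, which is why the code size diffs above are all regressions.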

@BruceForstall
Member

Diffs

My summary: it only modestly kicks in on most collections, except for coreclr_tests, where it is mostly in the templatized HW Intrinsics tests. E.g., it hits in MDPuzzle:DoIt(), which is a perfect example of code where it should kick in. The TP impact is mostly on benchmarks and coreclr_tests -- presumably only in the HW intrinsics cases where it kicks in. It's possible, though unlikely, that many cases are going through the "sizing" loop, which might not be cheap, and then not unrolling.
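One way to picture the "sizing" concern is the rough C++ sketch below. It is not the JIT's implementation: `LoopInfo`, `ShouldUnroll`, and the cost budget are hypothetical, and only the iteration limit of 4 comes from this PR. The point is that the size estimate requires walking the loop body, so a loop that passes the cheap trip-count check but fails the size check still pays for the walk.

```cpp
#include <vector>

// Hypothetical model of a loop: a constant trip count plus a cost estimate per statement.
struct LoopInfo
{
    unsigned              tripCount;
    std::vector<unsigned> stmtCosts;
};

constexpr unsigned kMaxUnrollIterations = 4;   // the new default from this PR
constexpr unsigned kMaxUnrolledCost     = 300; // made-up budget, for illustration only

// Returns true if the loop should be unrolled. 'stmtsVisited' reports how much
// work the decision itself cost, which is what shows up as throughput impact.
bool ShouldUnroll(const LoopInfo& loop, unsigned* stmtsVisited)
{
    *stmtsVisited = 0;

    // Cheap rejection: no need to look at the body at all.
    if (loop.tripCount > kMaxUnrollIterations)
    {
        return false;
    }

    // "Sizing" pass: walk every statement to estimate the body cost.
    unsigned bodyCost = 0;
    for (unsigned cost : loop.stmtCosts)
    {
        bodyCost += cost;
        (*stmtsVisited)++;
    }

    // Reject if the fully unrolled body would exceed the budget.
    return bodyCost * loop.tripCount <= kMaxUnrolledCost;
}
```

Under this model, raising the iteration limit from 1 to 4 means more loops reach the sizing pass, whether or not they end up unrolled.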

I'd like to see this change checked in, especially early in the cycle where we can get perf numbers early. Choosing 4 as the max unroll count seems completely reasonable. The code size metric may need to be tuned, but since it doesn't kick in too much anyway, it's not clear how to tune it better than it already is.

cc @dotnet/jit-contrib

@tannergooding
Member Author

tannergooding commented Jan 18, 2023

Merging in latest main and rerunning CI since general direction seems favorable/positive

Definitely needs to run stress tests before PR gets merged

@AndyAyersMS
Member

Are there examples where we know this extra unrolling will make a big impact on perf?

@tannergooding
Member Author

> Are there examples where we know this extra unrolling will make a big impact on perf?

@AndyAyersMS not in our own code. Most of our own perf critical code is explicitly using vectors over unknown length inputs.

@tannergooding tannergooding marked this pull request as ready for review January 26, 2023 18:21
@tannergooding
Member Author

tannergooding commented Jan 26, 2023

Fixed the one failure, which was caused by a change in the generated code that affected the new "codegen verification" test feature.

The affected tests are now getting unrolled, which also enables a large amount of dead code elimination, so verifying the resulting codegen would have been very difficult.

That also calls out an existing missing loop optimization: a loop like the following could be converted to not loop at all and just assign result once:

for (var i = 0; i < 4; i++)
{
    result = AdvSimd.CompareEqual(left, asVar);
}

That is, recognizing when a loop is doing throwaway or non-side-effecting work if executed more than once.
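As a rough illustration of that equivalence, here is a C++ sketch in which a scalar `CompareEqual` stands in for the pure vector operation. All names are hypothetical and this is not the JIT's transformation, only the before and after shapes it would need to relate.

```cpp
#include <cstdint>

// Hypothetical stand-in for a pure, side-effect-free operation such as a vector compare.
constexpr uint32_t CompareEqual(uint32_t left, uint32_t right)
{
    return (left == right) ? 0xFFFFFFFFu : 0u;
}

// Shape of the original code: every iteration recomputes the same value
// from the same inputs and overwrites 'result'.
uint32_t ResultViaLoop(uint32_t left, uint32_t right)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++)
    {
        result = CompareEqual(left, right); // loop-invariant, no side effects
    }
    return result;
}

// What the missing optimization would reduce it to: one evaluation, no loop.
uint32_t ResultDirect(uint32_t left, uint32_t right)
{
    return CompareEqual(left, right);
}
```

Because the trip count is known to be at least one and the body has no side effects, iterations after the first cannot change the outcome.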

@BruceForstall, do you know if we have an existing issue tracking that?

@tannergooding
Member Author

tannergooding commented Jan 26, 2023

CC. @dotnet/jit-contrib, this should be ready for review.

If we can get it reviewed/merged before EOD tomorrow it will make the first preview and give us the greatest amount of time for feedback/etc.

@BruceForstall
Member

> @BruceForstall, do you know if we have an existing issue tracking that?

Not that I'm aware of. Looks like we should be able to hoist this loop-invariant code, leaving an empty loop.

// 'asVar' should be propagated.
// 'AdvSimd.CompareEqual' should be hoisted out of the loops.
// ARM64-FULL-LINE: fcmeq {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, #0.0
// ARM64: blt
// ARM64-NOT: fcmeq

Member

Is this test unused / uninteresting due to removing these lines? I thought that tests marked <HasDisasmCheck>true</HasDisasmCheck> are only used for disasm checking, and aren't actually run (but maybe that will change?). @TIHan @markples ?

Member Author

@tannergooding tannergooding Jan 27, 2023

This was actually a test added quite a bit before DisasmCheck existed and is covering specific containment functionality: #33972

The disasm checks were added later, since the test is a good candidate for validating that the containment functionality is actually working as expected.

In this particular case, we could probably cover it as something like:

// ARM64-FULL-LINE: fcmeq {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, #0.0
// ARM64-NOT: blt
// ARM64-NOT: fcmeq

But the combination of loop unrolling + dead code optimization makes this a case that's a bit difficult to test and to reason about.

I think we're fine not having the coverage for these two subcases in particular given all the other disasm coverage of the scenario in the same file, but I'd be fine with adding the above lines if people would prefer they remain.

Contributor

HasDisasmCheck adds disasm checking in addition to execution. (In fact, as it is, it requires execution because we get the disasm via normal execution plus DOTNET_JitDisasm.)

Loop logic is difficult to test in a line-based system. The existence of fcmeq is probably covered reasonably by the other tests, so this one is a question of whether the trouble/brittleness is worth it to verify that intended loop transformations have occurred. (and it sounds like the answer is 'no')

Member

> HasDisasmCheck adds disasm checking in addition to execution

That's good to hear. However, there's an unfortunate wrinkle: it appears that all HasDisasmCheck tests also must set:

DOTNET_TieredCompilation=0
DOTNET_JITMinOpts=0

so those tests don't get alternative stress mode configurations run.

(this is a subject for another issue, not this PR, of course)

@tannergooding tannergooding merged commit 7c2969e into dotnet:main Jan 27, 2023
@tannergooding tannergooding deleted the loop-unroll branch January 27, 2023 20:46
@@ -9424,7 +9424,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#define DEFAULT_MAX_LOOPSIZE_FOR_ALIGN DEFAULT_ALIGN_LOOP_BOUNDARY * 3

// By default only single iteration loops will be unrolled
Member

Does this comment need updating?

Member Author

Yes, oops. Totally missed the comment

Will get up a fix

@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI