Loop Alignment support for Arm64 #60135

Merged: 11 commits, Oct 20, 2021

kunalspathak
Member

Add loop alignment support for Arm64. This extends the adaptive loop alignment work done for xarch in #44370.

The loop alignment is adaptive: the padding amount is adjusted based on the number of 32-byte blocks needed to fit the loop. Since every Arm64 instruction is encoded in 4 bytes, padding is added in whole NOPs: up to 4 NOPs (16 bytes) for a loop that fits in a single 32-byte chunk, with the maximum padding decreasing as the loop size increases.

| Max pad (bytes) | Minimum 32B blocks needed to fit the loop |
|---|---|
| 16 | 1 (32 bytes) |
| 12 | 2 (64 bytes) |
| 8 | 3 (96 bytes) |
| 4 | 4 (128 bytes) |
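
The table above can be read as a small function of loop size. The following is a minimal sketch, not the JIT's actual implementation: the names `blocksNeeded`, `maxPadBytes`, and `padBytesFor` are hypothetical, and two behaviors are assumptions made only for illustration (loops needing more than four blocks get no padding, and a loop whose distance to the next 32-byte boundary exceeds the limit is left unaligned).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBlockBytes = 32; // alignment boundary used in the table
constexpr std::uint32_t kInsBytes   = 4;  // fixed Arm64 instruction size

// Minimum number of 32-byte blocks that can hold a loop of loopBytes bytes.
constexpr std::uint32_t blocksNeeded(std::uint32_t loopBytes)
{
    return (loopBytes + kBlockBytes - 1) / kBlockBytes;
}

// Maximum padding from the table: 16, 12, 8, 4 bytes for 1..4 blocks.
// Returning 0 for larger loops is an assumption; the table does not cover them.
constexpr std::uint32_t maxPadBytes(std::uint32_t blocks)
{
    return (blocks >= 1 && blocks <= 4) ? (5 - blocks) * kInsBytes : 0;
}

// Padding (in bytes) to place before a loop that would start at `offset`:
// the distance to the next 32-byte boundary, or 0 if that distance exceeds
// the limit for the loop's size (assumed simplification).
std::uint32_t padBytesFor(std::uint32_t offset, std::uint32_t loopBytes)
{
    assert(offset % kInsBytes == 0);
    assert(loopBytes > 0 && loopBytes % kInsBytes == 0);
    const std::uint32_t toBoundary = (kBlockBytes - offset % kBlockBytes) % kBlockBytes;
    return toBoundary <= maxPadBytes(blocksNeeded(loopBytes)) ? toBoundary : 0;
}
```

Under these assumptions, a 24-byte loop starting at offset 20 gets 12 bytes of padding (three NOPs), while a 64-byte loop starting at offset 8 would need 24 bytes, which exceeds its 12-byte limit, so it is left unaligned.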

Benchmarks windows/arm64 diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 221276
Total bytes of diff: 223752
Total bytes of delta: 2476 (1.12% of base)
Total relative delta: 9.88
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
          48 : 12587.dasm (3.28% of base)
          32 : 2381.dasm (8.89% of base)
          32 : 2757.dasm (2.69% of base)
          24 : 1201.dasm (2.37% of base)
          24 : 18571.dasm (1.14% of base)
          20 : 9146.dasm (1.71% of base)
          16 : 11289.dasm (19.05% of base)
          16 : 13071.dasm (4.17% of base)
          16 : 14407.dasm (4.71% of base)
          16 : 11879.dasm (4.71% of base)
          16 : 21920.dasm (4.71% of base)
          16 : 265.dasm (1.38% of base)
          16 : 12059.dasm (1.02% of base)
          16 : 4135.dasm (2.41% of base)
          16 : 13231.dasm (5.80% of base)
          16 : 17237.dasm (5.80% of base)
          16 : 19061.dasm (4.71% of base)
          16 : 1602.dasm (2.48% of base)
          16 : 21369.dasm (4.71% of base)
          16 : 22459.dasm (4.88% of base)

293 total files with Code Size differences (0 improved, 293 regressed), 1 unchanged.

Top method regressions (bytes):
          48 ( 3.28% of base) : 12587.dasm - BenchmarksGame.FannkuchRedux_9:Run(int,int)
          32 ( 2.69% of base) : 2757.dasm - System.Number:TryParseInt64IntegerStyle(System.ReadOnlySpan`1[Char],int,System.Globalization.NumberFormatInfo,byref):int
          32 ( 8.89% of base) : 2381.dasm - System.Xml.XmlStreamNodeWriter:UnsafeGetUTF8Chars(long,int,System.Byte[],int):int:this
          24 ( 2.37% of base) : 1201.dasm - System.Text.RegularExpressions.RegexBoyerMoore:.ctor(System.String,bool,bool,System.Globalization.CultureInfo):this
          24 ( 1.14% of base) : 18571.dasm - System.Threading.ReaderWriterLockSlim:TryEnterWriteLockCore(TimeoutTracker):bool:this
          20 ( 1.71% of base) : 9146.dasm - System.Number:TryParseUInt64IntegerStyle(System.ReadOnlySpan`1[Char],int,System.Globalization.NumberFormatInfo,byref):int
          16 ( 1.02% of base) : 12059.dasm - BenchmarksGame.FannkuchRedux_5:run(int,int,int)
          16 ( 2.37% of base) : 21241.dasm - BenchmarksGame.SpectralNorm_1:Bench(int):double:this
          16 ( 0.76% of base) : 23328.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          16 ( 1.47% of base) : 10517.dasm - DecCalc:VarDecMul(byref,byref)
          16 ( 5.80% of base) : 13231.dasm - System.Buffers.Text.Utf8Parser:TryParseByteD(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 ( 4.17% of base) : 13071.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt16D(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 ( 2.41% of base) : 4135.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt32D(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 (19.05% of base) : 11289.dasm - System.Collections.IndexerSet`1[Int32][System.Int32]:Span():int:this
          16 ( 4.71% of base) : 14822.dasm - System.MathBenchmarks.Double:AbsTest()
          16 ( 4.71% of base) : 21369.dasm - System.MathBenchmarks.Double:CeilingTest()
          16 ( 4.71% of base) : 21920.dasm - System.MathBenchmarks.Double:FloorTest()
          16 ( 4.88% of base) : 22459.dasm - System.MathBenchmarks.Double:ILogBTest()
          16 ( 4.71% of base) : 11879.dasm - System.MathBenchmarks.Double:RoundTest()
          16 ( 4.71% of base) : 13485.dasm - System.MathBenchmarks.Double:SqrtTest()

Top method regressions (percentages):
          12 (21.43% of base) : 18878.dasm - System.Diagnostics.Tracing.EventSource:GetDispatcher(System.Diagnostics.Tracing.EventListener):System.Diagnostics.Tracing.EventDispatcher:this
          16 (19.05% of base) : 11289.dasm - System.Collections.IndexerSet`1[Int32][System.Int32]:Span():int:this
          12 (17.65% of base) : 12576.dasm - System.Xml.Linq.XElement:Attribute(System.Xml.Linq.XName):System.Xml.Linq.XAttribute:this
           8 (16.67% of base) : 19693.dasm - System.Collections.Concurrent.ConcurrentStack`1[__Canon][System.__Canon]:get_Count():int:this
           8 (16.67% of base) : 14274.dasm - System.Collections.Concurrent.ConcurrentStack`1[Int32][System.Int32]:get_Count():int:this
          12 (13.64% of base) : 22676.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:ReadOnlySpan():int:this
          12 (13.64% of base) : 22412.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:Span():int:this
          12 (13.64% of base) : 19362.dasm - System.Collections.IterateForEach`1[Int32][System.Int32]:ReadOnlySpan():int:this
          12 (13.64% of base) : 18995.dasm - System.Collections.IterateForEach`1[Int32][System.Int32]:Span():int:this
           8 (13.33% of base) : 22034.dasm - Benchstone.BenchF.DMath:Fact(double):double
           8 (13.33% of base) : 22035.dasm - Benchstone.BenchF.DMath:Power(double,double):double
          12 (13.04% of base) : 17238.dasm - System.Reflection.BlobUtilities:WriteBytes(System.Byte[],int,ubyte,int)
           8 (12.50% of base) : 12681.dasm - Span.IndexerBench:TestIndexer1(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 13060.dasm - Span.IndexerBench:TestIndexer2(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 13330.dasm - Span.IndexerBench:TestIndexer3(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 18057.dasm - Span.IndexerBench:TestReadOnlyIndexer1(System.ReadOnlySpan`1[Byte]):ubyte
           8 (12.50% of base) : 18954.dasm - Span.IndexerBench:TestReadOnlyIndexer2(System.ReadOnlySpan`1[Byte]):ubyte
           8 (12.50% of base) : 11224.dasm - Span.IndexerBench:TestRef(System.Span`1[Byte]):ubyte
           8 (11.76% of base) : 14723.dasm - Span.IndexerBench:TestIndexer6(System.Span`1[Byte],Span.Sink):ubyte
          12 (11.54% of base) : 21117.dasm - System.Collections.IndexerSetReverse`1[Int32][System.Int32]:Span():int:this

293 total methods with Code Size differences (0 improved, 293 regressed), 1 unchanged.


Libraries windows/arm64 diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 547924
Total bytes of diff: 555328
Total bytes of delta: 7404 (1.35% of base)
Total relative delta: 41.93
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
          64 : 6462.dasm (1.75% of base)
          32 : 116552.dasm (8.89% of base)
          32 : 145813.dasm (7.92% of base)
          28 : 213124.dasm (1.77% of base)
          28 : 110271.dasm (6.14% of base)
          28 : 204842.dasm (7.69% of base)
          24 : 100302.dasm (1.50% of base)
          24 : 98371.dasm (9.68% of base)
          24 : 100294.dasm (8.57% of base)
          24 : 99802.dasm (10.71% of base)
          24 : 90757.dasm (0.93% of base)
          24 : 92581.dasm (9.38% of base)
          24 : 98133.dasm (9.68% of base)
          24 : 99568.dasm (10.71% of base)
          24 : 101605.dasm (1.32% of base)
          24 : 222436.dasm (2.37% of base)
          20 : 90768.dasm (0.75% of base)
          20 : 170368.dasm (1.38% of base)
          20 : 172208.dasm (0.98% of base)
          20 : 21798.dasm (6.58% of base)

956 total files with Code Size differences (0 improved, 956 regressed), 2 unchanged.

Top method regressions (bytes):
          64 ( 1.75% of base) : 6462.dasm - <StartupCode$FSharp-Core>.$Quotations:eq@197(Microsoft.FSharp.Quotations.Tree,Microsoft.FSharp.Quotations.Tree):bool
          32 ( 7.92% of base) : 145813.dasm - Microsoft.CSharp.RuntimeBinder.SymbolTable:FindMethodFromMemberInfo(System.Reflection.MemberInfo):Microsoft.CSharp.RuntimeBinder.Semantics.MethodSymbol
          32 ( 8.89% of base) : 116552.dasm - System.Xml.XmlStreamNodeWriter:UnsafeGetUTF8Chars(long,int,System.Byte[],int):int:this
          28 ( 6.14% of base) : 110271.dasm - System.Data.SqlTypes.SqlBinary:PerformCompareByte(System.Byte[],System.Byte[]):int
          28 ( 7.69% of base) : 204842.dasm - System.Net.WebSockets.ManagedWebSocket:ApplyMask(System.Span`1[Byte],int,int):int
          28 ( 1.77% of base) : 213124.dasm - System.Numerics.BigInteger:.ctor(System.ReadOnlySpan`1[Byte],bool,bool):this
          24 ( 1.32% of base) : 101605.dasm - <>c__DisplayClass32_0:<SetupCallbacks>b__46(Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.MulticoreJitPrivateTraceData):this
          24 (10.71% of base) : 99568.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata42TemplateATraceData:PayloadValue(int):System.Object:this
          24 (10.71% of base) : 99802.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata79TemplateATraceData:PayloadValue(int):System.Object:this
          24 ( 9.68% of base) : 98371.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.DynamicTypeUseStringAndIntPrivateTraceData:PayloadValue(int):System.Object:this
          24 ( 9.68% of base) : 98133.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.ModuleTransparencyCalculationTraceData:PayloadValue(int):System.Object:this
          24 ( 9.38% of base) : 92581.dasm - Microsoft.Diagnostics.Tracing.Parsers.LinuxKernel.ProcessStartTraceData:PayloadValue(int):System.Object:this
          24 ( 8.57% of base) : 100294.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.MessageUfsScanStart32Args_V1TraceData:PayloadValue(int):System.Object:this
          24 ( 1.50% of base) : 100302.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.MetaStoreTaskMetaStoreActionArgsTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          24 ( 0.93% of base) : 90757.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftWindowsTCPIP.IpDadFailedArgs:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          24 ( 2.37% of base) : 222436.dasm - System.Text.RegularExpressions.RegexBoyerMoore:.ctor(System.String,bool,bool,System.Globalization.CultureInfo):this
          20 ( 1.63% of base) : 235147.dasm - ArraySerializer:Deserialize(Xunit.Abstractions.IXunitSerializationInfo):this
          20 ( 6.58% of base) : 21798.dasm - Microsoft.CodeAnalysis.CSharp.Binder:CreateSourceIndicesArray(int,int):System.Int32[]
          20 ( 0.70% of base) : 97675.dasm - Microsoft.Diagnostics.Tracing.Parsers.Clr.ResolutionAttemptedTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          20 ( 1.27% of base) : 100328.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.PersistedStoreTaskPersistedStoreAnalyzeFileArgsTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this

Top method regressions (percentages):
          16 (36.36% of base) : 229123.dasm - Internal.JitInterface.MemoryHelper:FillMemory(long,ubyte,int)
          16 (36.36% of base) : 166457.dasm - System.ComponentModel.EventHandlerList:Find(System.Object):ListEntry:this
          16 (33.33% of base) : 82923.dasm - Microsoft.Diagnostics.Tracing.EventPipeEventMetaDataHeader:ClearMemory(long,int)
          16 (30.77% of base) : 174245.dasm - FilterAndTransform:Reverse(TransformSpec):TransformSpec
          16 (26.67% of base) : 130330.dasm - System.Xml.Xsl.XsltOld.DocumentScope:ResolveAtom(System.String):System.String:this
          12 (25.00% of base) : 174742.dasm - System.Diagnostics.Metrics.Counter`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Add(System.Numerics.Vector`1[Single],System.ReadOnlySpan`1[[System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):this
          12 (25.00% of base) : 174809.dasm - System.Diagnostics.Metrics.Instrument`1[Vector`1][System.Numerics.Vector`1[System.Single]]:RecordMeasurement(System.Numerics.Vector`1[Single],System.ReadOnlySpan`1[[System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):this
          16 (23.53% of base) : 146733.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.Symbol:LookupNext(long):Microsoft.CSharp.RuntimeBinder.Semantics.Symbol:this
          16 (22.22% of base) : 92551.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftWindowsNDISPacketCapture.PacketFragmentArgs:IsPrintable(long,long):bool
          12 (21.43% of base) : 74506.dasm - Microsoft.CodeAnalysis.RealParser:CountSignificantBits(ubyte):int
          12 (21.43% of base) : 206494.dasm - System.Xml.Linq.XElement:RemoveAttributesSkipNotify():this
          12 (20.00% of base) : 798.dasm - Microsoft.FSharp.Collections.ListDebugView`1[__Canon][System.__Canon]:get__FullList():System.__Canon[]:this
          12 (20.00% of base) : 797.dasm - Microsoft.FSharp.Collections.ListDebugView`1[__Canon][System.__Canon]:get_Items():System.__Canon[]:this
          12 (20.00% of base) : 803.dasm - Microsoft.FSharp.Collections.ListDebugView`1[Byte][System.Byte]:get__FullList():System.Byte[]:this
          12 (20.00% of base) : 802.dasm - Microsoft.FSharp.Collections.ListDebugView`1[Byte][System.Byte]:get_Items():System.Byte[]:this
          12 (18.75% of base) : 15261.dasm - System.MemoryExtensions:ClampStart(System.ReadOnlySpan`1[Int32],int):int
          12 (18.75% of base) : 15264.dasm - System.MemoryExtensions:ClampStart(System.ReadOnlySpan`1[Int64],long):int
          12 (17.65% of base) : 206572.dasm - System.Xml.Linq.XElement:Attribute(System.Xml.Linq.XName):System.Xml.Linq.XAttribute:this
          16 (16.67% of base) : 82945.dasm - Microsoft.Diagnostics.Tracing.TraceEventSource:GetContainerID(long):System.String:this
           8 (16.67% of base) : 162472.dasm - NodeKeyValueCollection:System.Collections.ICollection.get_Count():int:this

956 total methods with Code Size differences (0 improved, 956 regressed), 2 unchanged.


It is worth noting that because arm64 uses a fixed-size instruction encoding, fewer methods are affected by loop alignment than on xarch. For comparison, the numbers for x64 and arm64 are shown below.

x64:

| Collection | Methods affected | Total padding (in bytes) |
|---|---|---|
| Benchmarks | 470 | 4908 |
| Libraries | 1637 | 14967 |

arm64:

| Collection | Methods affected | Total padding (in bytes) |
|---|---|---|
| Benchmarks | 293 | 2476 |
| Libraries | 956 | 7404 |
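
To make the x64/arm64 comparison concrete, the two tables can be reduced to an average padding per affected method. This is only a worked-arithmetic sketch over the numbers quoted above; the `PaddingStats` type, the constant names, and `avgPadPerMethod` are hypothetical and not part of the JIT or the diff tooling.

```cpp
#include <cassert>

// One row of the tables above: affected method count and total padding bytes.
struct PaddingStats
{
    int methodsAffected;
    int totalPadBytes;
};

// Values copied from the x64 and arm64 tables in this PR description.
const PaddingStats kX64Benchmarks{470, 4908};
const PaddingStats kX64Libraries{1637, 14967};
const PaddingStats kArm64Benchmarks{293, 2476};
const PaddingStats kArm64Libraries{956, 7404};

// Average padding, in bytes, per method that received any padding.
double avgPadPerMethod(const PaddingStats& s)
{
    assert(s.methodsAffected > 0);
    return static_cast<double>(s.totalPadBytes) / s.methodsAffected;
}

// On arm64 all padding consists of 4-byte NOPs, so the total converts to a NOP count.
int nopCount(const PaddingStats& arm64Stats)
{
    assert(arm64Stats.totalPadBytes % 4 == 0);
    return arm64Stats.totalPadBytes / 4;
}
```

Read this way, arm64 averages roughly 8.5 bytes of padding per affected Benchmarks method against about 10.4 on x64, and the arm64 totals correspond to 619 NOPs (Benchmarks) and 1851 NOPs (Libraries).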

Finally, as expected, the allocation size regression is the same as the code size regression shown above. There is no mismatch because we do not over-estimate instruction sizes for arm64.

I measured performance for some of the benchmarks that asmdiff pointed out and didn't notice any significant difference in performance or stability (cc @TamarChristinaArm ). The plan is to merge this and monitor the perf lab to see whether it impacts benchmarks over time. If we see that it adversely affects the benchmarks, we will revert this feature for arm64.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 7, 2021
@ghost

ghost commented Oct 7, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: kunalspathak
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

@kunalspathak kunalspathak changed the title from "Loopalignment arm64" to "Loop Alignment support for Arm64" Oct 7, 2021
@kunalspathak kunalspathak marked this pull request as ready for review October 18, 2021 15:24
@kunalspathak
Member Author

@dotnet/jit-contrib

@BruceForstall BruceForstall self-requested a review October 18, 2021 23:40
INS_OPTS_D_TO_H // Double to Half

#if FEATURE_LOOP_ALIGN
, INS_OPTS_ALIGN // Align instruction
Member

It looks like you can have an INS_align with INS_OPTS_NONE that indicates the align instruction is ignored? In which case, shouldn't the comment say, "Align instruction that will be used (not ignored)"?

Member

Couldn't you just use INS_align without any special INS_OPTS to mean "align 4 bytes", and munge it to INS_nop if you decide not to use it for alignment?

Member Author

The problem is that arm64 has a real INS_nop that we use occasionally, and in a few places it would be hard to tell whether an INS_nop came from alignment or from something else. With INS_OPTS it becomes easier because the instruction remains INS_align all the way through.
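
The distinction discussed in this thread can be sketched roughly as follows. This is an illustration only: `Ins`, `InsOpts`, `InstrDesc`, and the helper functions are hypothetical, simplified stand-ins and not the JIT's real types. The sketch assumes, as the review comments describe, that an align instruction carrying the align option is used for padding, while one with no option is ignored.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's instruction and option enums.
enum class Ins { Nop, Align };
enum class InsOpts { None, Align };

struct InstrDesc
{
    Ins     ins;
    InsOpts opts;
};

// An align instruction that will actually be emitted as padding.
bool isActiveAlign(const InstrDesc& id)
{
    return id.ins == Ins::Align && id.opts == InsOpts::Align;
}

// An align instruction that was recorded but later judged unnecessary.
bool isIgnoredAlign(const InstrDesc& id)
{
    return id.ins == Ins::Align && id.opts == InsOpts::None;
}

// A NOP emitted for a reason unrelated to loop alignment.
bool isRealNop(const InstrDesc& id)
{
    return id.ins == Ins::Nop;
}

// Deciding not to pad clears only the option; the instruction kind is untouched,
// so it can never be confused with a genuine NOP later on.
void ignoreAlign(InstrDesc& id)
{
    assert(id.ins == Ins::Align);
    id.opts = InsOpts::None;
}
```

The design point is that the instruction kind never changes after creation; only the option does. Code that walks the instruction stream can therefore still tell alignment placeholders apart from ordinary NOPs, which rewriting the instruction to a NOP would not allow.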

@kunalspathak
Member Author

Addressed all your feedback. Please take another look.

@kunalspathak kunalspathak merged commit 7bd68df into dotnet:main Oct 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 19, 2021