Loop Alignment support for Arm64 #60135

kunalspathak · 2021-10-07T18:35:24Z

Add support for loop alignment for Arm64. This is an extension of the adaptive loop alignment work done for xarch in #44370.

Loop alignment is adaptive: the padding amount is adjusted based on the number of 32-byte blocks needed to fit the loop. Since every Arm64 instruction is encoded in 4 bytes, up to 4 NOPs (16 bytes) are added for a loop that fits in a single 32-byte chunk, and the maximum padding decreases as the loop size increases.

Max Pad (bytes)	Minimum 32B blocks needed to fit the loop
16	1 (32 bytes)
12	2 (64 bytes)
8	3 (96 bytes)
4	4 (128 bytes)
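
The table above can be read as a small function from loop size to padding budget. The following is a minimal sketch of that mapping, not the JIT's actual code: the names `MinBlocksNeeded`, `MaxPadding` and `PaddingToEmit` are hypothetical, and two things are assumptions that the description does not state, namely that loops needing more than four blocks are left unaligned and that the alignment boundary is the same 32 bytes as the block size.

```cpp
#include <cstdint>

// Hypothetical constants mirroring the table above.
constexpr uint32_t kBlockSize = 32; // size of one chunk, in bytes
constexpr uint32_t kInsSize   = 4;  // fixed Arm64 instruction size, in bytes

// Minimum number of 32-byte blocks needed to hold a loop of loopSize bytes.
constexpr uint32_t MinBlocksNeeded(uint32_t loopSize)
{
    return (loopSize + kBlockSize - 1) / kBlockSize;
}

// Maximum padding (in bytes) allowed in front of the loop.
// Assumption: loops needing more than 4 blocks get no padding at all.
constexpr uint32_t MaxPadding(uint32_t loopSize)
{
    const uint32_t blocks = MinBlocksNeeded(loopSize);
    if (blocks == 0 || blocks > 4)
    {
        return 0;
    }
    return (5 - blocks) * kInsSize; // 1 -> 16, 2 -> 12, 3 -> 8, 4 -> 4
}

// Bytes of NOP padding to emit for a loop starting at `offset`.
// Assumption: the loop head is moved to the next 32-byte boundary only if
// the padding required fits within the budget; otherwise nothing is emitted.
constexpr uint32_t PaddingToEmit(uint32_t offset, uint32_t loopSize)
{
    const uint32_t needed = (kBlockSize - (offset % kBlockSize)) % kBlockSize;
    return (needed <= MaxPadding(loopSize)) ? needed : 0;
}

static_assert(MaxPadding(32) == 16, "a one-block loop may get up to 4 NOPs");
static_assert(MaxPadding(128) == 4, "a four-block loop may get at most 1 NOP");
```

Because offsets and padding are always multiples of 4 on Arm64, every value returned here corresponds to a whole number of NOP instructions.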

Benchmarks windows/arm64 diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 221276
Total bytes of diff: 223752
Total bytes of delta: 2476 (1.12% of base)
Total relative delta: 9.88
    diff is a regression.
    relative diff is a regression.

Detail diffs



Top file regressions (bytes):
          48 : 12587.dasm (3.28% of base)
          32 : 2381.dasm (8.89% of base)
          32 : 2757.dasm (2.69% of base)
          24 : 1201.dasm (2.37% of base)
          24 : 18571.dasm (1.14% of base)
          20 : 9146.dasm (1.71% of base)
          16 : 11289.dasm (19.05% of base)
          16 : 13071.dasm (4.17% of base)
          16 : 14407.dasm (4.71% of base)
          16 : 11879.dasm (4.71% of base)
          16 : 21920.dasm (4.71% of base)
          16 : 265.dasm (1.38% of base)
          16 : 12059.dasm (1.02% of base)
          16 : 4135.dasm (2.41% of base)
          16 : 13231.dasm (5.80% of base)
          16 : 17237.dasm (5.80% of base)
          16 : 19061.dasm (4.71% of base)
          16 : 1602.dasm (2.48% of base)
          16 : 21369.dasm (4.71% of base)
          16 : 22459.dasm (4.88% of base)

293 total files with Code Size differences (0 improved, 293 regressed), 1 unchanged.

Top method regressions (bytes):
          48 ( 3.28% of base) : 12587.dasm - BenchmarksGame.FannkuchRedux_9:Run(int,int)
          32 ( 2.69% of base) : 2757.dasm - System.Number:TryParseInt64IntegerStyle(System.ReadOnlySpan`1[Char],int,System.Globalization.NumberFormatInfo,byref):int
          32 ( 8.89% of base) : 2381.dasm - System.Xml.XmlStreamNodeWriter:UnsafeGetUTF8Chars(long,int,System.Byte[],int):int:this
          24 ( 2.37% of base) : 1201.dasm - System.Text.RegularExpressions.RegexBoyerMoore:.ctor(System.String,bool,bool,System.Globalization.CultureInfo):this
          24 ( 1.14% of base) : 18571.dasm - System.Threading.ReaderWriterLockSlim:TryEnterWriteLockCore(TimeoutTracker):bool:this
          20 ( 1.71% of base) : 9146.dasm - System.Number:TryParseUInt64IntegerStyle(System.ReadOnlySpan`1[Char],int,System.Globalization.NumberFormatInfo,byref):int
          16 ( 1.02% of base) : 12059.dasm - BenchmarksGame.FannkuchRedux_5:run(int,int,int)
          16 ( 2.37% of base) : 21241.dasm - BenchmarksGame.SpectralNorm_1:Bench(int):double:this
          16 ( 0.76% of base) : 23328.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          16 ( 1.47% of base) : 10517.dasm - DecCalc:VarDecMul(byref,byref)
          16 ( 5.80% of base) : 13231.dasm - System.Buffers.Text.Utf8Parser:TryParseByteD(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 ( 4.17% of base) : 13071.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt16D(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 ( 2.41% of base) : 4135.dasm - System.Buffers.Text.Utf8Parser:TryParseUInt32D(System.ReadOnlySpan`1[Byte],byref,byref):bool
          16 (19.05% of base) : 11289.dasm - System.Collections.IndexerSet`1[Int32][System.Int32]:Span():int:this
          16 ( 4.71% of base) : 14822.dasm - System.MathBenchmarks.Double:AbsTest()
          16 ( 4.71% of base) : 21369.dasm - System.MathBenchmarks.Double:CeilingTest()
          16 ( 4.71% of base) : 21920.dasm - System.MathBenchmarks.Double:FloorTest()
          16 ( 4.88% of base) : 22459.dasm - System.MathBenchmarks.Double:ILogBTest()
          16 ( 4.71% of base) : 11879.dasm - System.MathBenchmarks.Double:RoundTest()
          16 ( 4.71% of base) : 13485.dasm - System.MathBenchmarks.Double:SqrtTest()

Top method regressions (percentages):
          12 (21.43% of base) : 18878.dasm - System.Diagnostics.Tracing.EventSource:GetDispatcher(System.Diagnostics.Tracing.EventListener):System.Diagnostics.Tracing.EventDispatcher:this
          16 (19.05% of base) : 11289.dasm - System.Collections.IndexerSet`1[Int32][System.Int32]:Span():int:this
          12 (17.65% of base) : 12576.dasm - System.Xml.Linq.XElement:Attribute(System.Xml.Linq.XName):System.Xml.Linq.XAttribute:this
           8 (16.67% of base) : 19693.dasm - System.Collections.Concurrent.ConcurrentStack`1[__Canon][System.__Canon]:get_Count():int:this
           8 (16.67% of base) : 14274.dasm - System.Collections.Concurrent.ConcurrentStack`1[Int32][System.Int32]:get_Count():int:this
          12 (13.64% of base) : 22676.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:ReadOnlySpan():int:this
          12 (13.64% of base) : 22412.dasm - System.Collections.IterateFor`1[Int32][System.Int32]:Span():int:this
          12 (13.64% of base) : 19362.dasm - System.Collections.IterateForEach`1[Int32][System.Int32]:ReadOnlySpan():int:this
          12 (13.64% of base) : 18995.dasm - System.Collections.IterateForEach`1[Int32][System.Int32]:Span():int:this
           8 (13.33% of base) : 22034.dasm - Benchstone.BenchF.DMath:Fact(double):double
           8 (13.33% of base) : 22035.dasm - Benchstone.BenchF.DMath:Power(double,double):double
          12 (13.04% of base) : 17238.dasm - System.Reflection.BlobUtilities:WriteBytes(System.Byte[],int,ubyte,int)
           8 (12.50% of base) : 12681.dasm - Span.IndexerBench:TestIndexer1(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 13060.dasm - Span.IndexerBench:TestIndexer2(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 13330.dasm - Span.IndexerBench:TestIndexer3(System.Span`1[Byte]):ubyte
           8 (12.50% of base) : 18057.dasm - Span.IndexerBench:TestReadOnlyIndexer1(System.ReadOnlySpan`1[Byte]):ubyte
           8 (12.50% of base) : 18954.dasm - Span.IndexerBench:TestReadOnlyIndexer2(System.ReadOnlySpan`1[Byte]):ubyte
           8 (12.50% of base) : 11224.dasm - Span.IndexerBench:TestRef(System.Span`1[Byte]):ubyte
           8 (11.76% of base) : 14723.dasm - Span.IndexerBench:TestIndexer6(System.Span`1[Byte],Span.Sink):ubyte
          12 (11.54% of base) : 21117.dasm - System.Collections.IndexerSetReverse`1[Int32][System.Int32]:Span():int:this

293 total methods with Code Size differences (0 improved, 293 regressed), 1 unchanged.

Libraries windows/arm64 diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 547924
Total bytes of diff: 555328
Total bytes of delta: 7404 (1.35% of base)
Total relative delta: 41.93
    diff is a regression.
    relative diff is a regression.

Detail diffs



Top file regressions (bytes):
          64 : 6462.dasm (1.75% of base)
          32 : 116552.dasm (8.89% of base)
          32 : 145813.dasm (7.92% of base)
          28 : 213124.dasm (1.77% of base)
          28 : 110271.dasm (6.14% of base)
          28 : 204842.dasm (7.69% of base)
          24 : 100302.dasm (1.50% of base)
          24 : 98371.dasm (9.68% of base)
          24 : 100294.dasm (8.57% of base)
          24 : 99802.dasm (10.71% of base)
          24 : 90757.dasm (0.93% of base)
          24 : 92581.dasm (9.38% of base)
          24 : 98133.dasm (9.68% of base)
          24 : 99568.dasm (10.71% of base)
          24 : 101605.dasm (1.32% of base)
          24 : 222436.dasm (2.37% of base)
          20 : 90768.dasm (0.75% of base)
          20 : 170368.dasm (1.38% of base)
          20 : 172208.dasm (0.98% of base)
          20 : 21798.dasm (6.58% of base)

956 total files with Code Size differences (0 improved, 956 regressed), 2 unchanged.

Top method regressions (bytes):
          64 ( 1.75% of base) : 6462.dasm - <StartupCode$FSharp-Core>.$Quotations:eq@197(Microsoft.FSharp.Quotations.Tree,Microsoft.FSharp.Quotations.Tree):bool
          32 ( 7.92% of base) : 145813.dasm - Microsoft.CSharp.RuntimeBinder.SymbolTable:FindMethodFromMemberInfo(System.Reflection.MemberInfo):Microsoft.CSharp.RuntimeBinder.Semantics.MethodSymbol
          32 ( 8.89% of base) : 116552.dasm - System.Xml.XmlStreamNodeWriter:UnsafeGetUTF8Chars(long,int,System.Byte[],int):int:this
          28 ( 6.14% of base) : 110271.dasm - System.Data.SqlTypes.SqlBinary:PerformCompareByte(System.Byte[],System.Byte[]):int
          28 ( 7.69% of base) : 204842.dasm - System.Net.WebSockets.ManagedWebSocket:ApplyMask(System.Span`1[Byte],int,int):int
          28 ( 1.77% of base) : 213124.dasm - System.Numerics.BigInteger:.ctor(System.ReadOnlySpan`1[Byte],bool,bool):this
          24 ( 1.32% of base) : 101605.dasm - <>c__DisplayClass32_0:<SetupCallbacks>b__46(Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.MulticoreJitPrivateTraceData):this
          24 (10.71% of base) : 99568.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata42TemplateATraceData:PayloadValue(int):System.Object:this
          24 (10.71% of base) : 99802.dasm - Microsoft.Diagnostics.Tracing.Parsers.ApplicationServer.Multidata79TemplateATraceData:PayloadValue(int):System.Object:this
          24 ( 9.68% of base) : 98371.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.DynamicTypeUseStringAndIntPrivateTraceData:PayloadValue(int):System.Object:this
          24 ( 9.68% of base) : 98133.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivate.ModuleTransparencyCalculationTraceData:PayloadValue(int):System.Object:this
          24 ( 9.38% of base) : 92581.dasm - Microsoft.Diagnostics.Tracing.Parsers.LinuxKernel.ProcessStartTraceData:PayloadValue(int):System.Object:this
          24 ( 8.57% of base) : 100294.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.MessageUfsScanStart32Args_V1TraceData:PayloadValue(int):System.Object:this
          24 ( 1.50% of base) : 100302.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.MetaStoreTaskMetaStoreActionArgsTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          24 ( 0.93% of base) : 90757.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftWindowsTCPIP.IpDadFailedArgs:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          24 ( 2.37% of base) : 222436.dasm - System.Text.RegularExpressions.RegexBoyerMoore:.ctor(System.String,bool,bool,System.Globalization.CultureInfo):this
          20 ( 1.63% of base) : 235147.dasm - ArraySerializer:Deserialize(Xunit.Abstractions.IXunitSerializationInfo):this
          20 ( 6.58% of base) : 21798.dasm - Microsoft.CodeAnalysis.CSharp.Binder:CreateSourceIndicesArray(int,int):System.Int32[]
          20 ( 0.70% of base) : 97675.dasm - Microsoft.Diagnostics.Tracing.Parsers.Clr.ResolutionAttemptedTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          20 ( 1.27% of base) : 100328.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftAntimalwareEngine.PersistedStoreTaskPersistedStoreAnalyzeFileArgsTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this

Top method regressions (percentages):
          16 (36.36% of base) : 229123.dasm - Internal.JitInterface.MemoryHelper:FillMemory(long,ubyte,int)
          16 (36.36% of base) : 166457.dasm - System.ComponentModel.EventHandlerList:Find(System.Object):ListEntry:this
          16 (33.33% of base) : 82923.dasm - Microsoft.Diagnostics.Tracing.EventPipeEventMetaDataHeader:ClearMemory(long,int)
          16 (30.77% of base) : 174245.dasm - FilterAndTransform:Reverse(TransformSpec):TransformSpec
          16 (26.67% of base) : 130330.dasm - System.Xml.Xsl.XsltOld.DocumentScope:ResolveAtom(System.String):System.String:this
          12 (25.00% of base) : 174742.dasm - System.Diagnostics.Metrics.Counter`1[Vector`1][System.Numerics.Vector`1[System.Single]]:Add(System.Numerics.Vector`1[Single],System.ReadOnlySpan`1[[System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):this
          12 (25.00% of base) : 174809.dasm - System.Diagnostics.Metrics.Instrument`1[Vector`1][System.Numerics.Vector`1[System.Single]]:RecordMeasurement(System.Numerics.Vector`1[Single],System.ReadOnlySpan`1[[System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):this
          16 (23.53% of base) : 146733.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.Symbol:LookupNext(long):Microsoft.CSharp.RuntimeBinder.Semantics.Symbol:this
          16 (22.22% of base) : 92551.dasm - Microsoft.Diagnostics.Tracing.Parsers.MicrosoftWindowsNDISPacketCapture.PacketFragmentArgs:IsPrintable(long,long):bool
          12 (21.43% of base) : 74506.dasm - Microsoft.CodeAnalysis.RealParser:CountSignificantBits(ubyte):int
          12 (21.43% of base) : 206494.dasm - System.Xml.Linq.XElement:RemoveAttributesSkipNotify():this
          12 (20.00% of base) : 798.dasm - Microsoft.FSharp.Collections.ListDebugView`1[__Canon][System.__Canon]:get__FullList():System.__Canon[]:this
          12 (20.00% of base) : 797.dasm - Microsoft.FSharp.Collections.ListDebugView`1[__Canon][System.__Canon]:get_Items():System.__Canon[]:this
          12 (20.00% of base) : 803.dasm - Microsoft.FSharp.Collections.ListDebugView`1[Byte][System.Byte]:get__FullList():System.Byte[]:this
          12 (20.00% of base) : 802.dasm - Microsoft.FSharp.Collections.ListDebugView`1[Byte][System.Byte]:get_Items():System.Byte[]:this
          12 (18.75% of base) : 15261.dasm - System.MemoryExtensions:ClampStart(System.ReadOnlySpan`1[Int32],int):int
          12 (18.75% of base) : 15264.dasm - System.MemoryExtensions:ClampStart(System.ReadOnlySpan`1[Int64],long):int
          12 (17.65% of base) : 206572.dasm - System.Xml.Linq.XElement:Attribute(System.Xml.Linq.XName):System.Xml.Linq.XAttribute:this
          16 (16.67% of base) : 82945.dasm - Microsoft.Diagnostics.Tracing.TraceEventSource:GetContainerID(long):System.String:this
           8 (16.67% of base) : 162472.dasm - NodeKeyValueCollection:System.Collections.ICollection.get_Count():int:this

956 total methods with Code Size differences (0 improved, 956 regressed), 2 unchanged.

It is worth noting that, because of the fixed-size encoding of arm64, fewer methods are affected by loop alignment than on xarch. For comparison, the numbers for x64 and arm64 are below.

x64:

Collection	Methods affected	Total padding (in bytes)
Benchmarks	470	4908
Libraries	1637	14967

arm64:

Collection	Methods affected	Total padding (in bytes)
Benchmarks	293	2476
Libraries	956	7404
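
One way to compare the two tables is the average padding per affected method. The sketch below is plain arithmetic on the numbers reported above; the struct and function names are hypothetical and only serve the illustration.

```cpp
#include <cstdio>

// One row of the comparison tables above.
struct PaddingStats
{
    const char* arch;
    const char* collection;
    int         methodsAffected;
    int         totalPaddingBytes;
};

// Average number of padding bytes added per affected method.
constexpr double AveragePadding(const PaddingStats& s)
{
    return static_cast<double>(s.totalPaddingBytes) / s.methodsAffected;
}

// Figures copied from the x64 and arm64 tables above.
constexpr PaddingStats kStats[] = {
    {"x64", "Benchmarks", 470, 4908},
    {"x64", "Libraries", 1637, 14967},
    {"arm64", "Benchmarks", 293, 2476},
    {"arm64", "Libraries", 956, 7404},
};

// Prints one line per table row with the derived average.
void PrintAveragePadding()
{
    for (const PaddingStats& s : kStats)
    {
        std::printf("%-6s %-10s %.2f bytes/method\n", s.arch, s.collection, AveragePadding(s));
    }
}
```

With these figures the arm64 averages come out lower than the x64 ones for both collections (roughly 8.5 versus 10.4 bytes per method for Benchmarks, and 7.7 versus 9.1 for Libraries), in addition to fewer methods being affected.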

Finally, as expected, the allocation size regression is the same as the code size regression seen above. There is no mismatch because we do not over-estimate instruction sizes for arm64.

I tried to measure performance for some of the benchmarks that asmdiff pointed out and did not notice any significant difference in performance or stability (cc @TamarChristinaArm ), but the plan is to merge this in and monitor the perf lab to see if it impacts benchmarks over time. If we see that it adversely affects the benchmarks, we will revert this feature for arm64.

ghost · 2021-10-07T18:35:32Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	kunalspathak
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

kunalspathak · 2021-10-18T15:24:36Z

@dotnet/jit-contrib

kunalspathak · 2021-10-18T17:32:16Z

From the private perf run (thanks to @DrewScoggins), all the benchmarks flagged are noisy and there is not an obvious regression from this work. Once this PR is merged, we will continue monitoring the stability of benchmarks.

Windows/arm64 - https://pvscmdupload.blob.core.windows.net/drewtest/reports/report_Default_ca%3DARM64_cb%3Drefs-heads-loopaligntest_co%3DWindows1019041_cr%3Ddotnetruntime_cc%3DCompilationMode%3Dtiered-RunKind%3Dmicro_bb%3Drefs-heads-main_2021-10-18.html

Ubuntu/arm64 - https://pvscmdupload.blob.core.windows.net/drewtest/reports/report_Default_ca%3DARM64_cb%3Drefs-heads-loopaligntest_co%3DUbuntu1804ARM_cr%3Ddotnetruntime_cc%3DCompilationMode%3Dtiered-RunKind%3Dmicro_bb%3Drefs-heads-main_2021-10-18.html

BruceForstall · 2021-10-19T00:10:50Z

src/coreclr/jit/instr.h

+    INS_OPTS_D_TO_H       // Double to Half
+
+#if FEATURE_LOOP_ALIGN
+    , INS_OPTS_ALIGN      // Align instruction


It looks like you can have an INS_align with INS_OPTS_NONE that indicates the align instruction is ignored? In which case, shouldn't the comment say, "Align instruction that will be used (not ignored)"?

Couldn't you just use INS_align without any special INS_OPTS to mean "align 4 bytes", and munge it to INS_nop if you decide not to use it for alignment?

The problem is that there is a real INS_nop on arm64 that we use occasionally, and in a few places it would be hard to tell whether an INS_nop came from alignment or from something else. With INS_OPTS this becomes easier, because the instruction remains INS_align all the way through.
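
The trade-off discussed in this thread can be illustrated with a small sketch. It is not the JIT's code: `Instruction`, `InsOpts` and `InstrDesc` are simplified, hypothetical stand-ins that reuse the identifier names from the diff, and the rule that an ignored align placeholder occupies zero bytes is an assumption made for the example.

```cpp
#include <cstdint>

// Simplified, hypothetical stand-ins for the JIT's enums.
enum Instruction : uint8_t
{
    INS_nop,  // a real NOP requested by the code generator
    INS_align // a loop-alignment placeholder
};

enum InsOpts : uint8_t
{
    INS_OPTS_NONE, // on INS_align: the alignment was ignored
    INS_OPTS_ALIGN // on INS_align: the alignment is used
};

struct InstrDesc
{
    Instruction ins;
    InsOpts     opt;
};

// The opcode alone identifies alignment padding, so a real INS_nop is
// never confused with padding, whether or not the padding is used.
constexpr bool IsAlignmentPadding(const InstrDesc& id)
{
    return id.ins == INS_align;
}

// Bytes the descriptor contributes to the method body.
// Assumption: an ignored align placeholder contributes nothing.
constexpr uint32_t EncodedSize(const InstrDesc& id)
{
    if (id.ins == INS_align)
    {
        return (id.opt == INS_OPTS_ALIGN) ? 4 : 0;
    }
    return 4; // every other Arm64 instruction is 4 bytes
}
```

Had the placeholder been rewritten to `INS_nop` instead, `IsAlignmentPadding` would have no way to tell the two kinds of NOP apart, which is the difficulty the reply describes.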

kunalspathak · 2021-10-20T06:12:59Z

Addressed all your feedback. Please take another look.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Oct 7, 2021

kunalspathak changed the title ~~Loopalignment arm64~~ Loop Alignment support for Arm64 Oct 7, 2021

kunalspathak added 8 commits October 13, 2021 16:25

Enable FEATURE_LOOP_ALIGN for Arm64

1d7fc85

basic loop alignment for arm64

ee831bf

misc changes

044febc

perf score should account for align

14c0756

some fixes

d904a05

updated some asserts

946af3a

jit format

478be27

Fix test cases

1569b59

kunalspathak force-pushed the loopalignment-arm64 branch from 025c2fe to 1569b59 Compare October 13, 2021 23:26

kunalspathak added 2 commits October 14, 2021 22:15

Misc changes

655ba11

jit format

ef36174

kunalspathak marked this pull request as ready for review October 18, 2021 15:24

BruceForstall self-requested a review October 18, 2021 23:40

BruceForstall requested changes Oct 19, 2021

Review comments

ee3a529

BruceForstall approved these changes Oct 20, 2021

kunalspathak merged commit 7bd68df into dotnet:main Oct 20, 2021

BruceForstall mentioned this pull request Oct 26, 2021

Test failure JIT/Regression/CLR-x86-JIT/V1-M13-RTM/b98958/b98958/b98958.sh #60820

Closed

ghost locked as resolved and limited conversation to collaborators Nov 19, 2021
