Re-enable improvements to debug emission for optimized code #38894

Closed

AndyAyersMS
Member

Reverts #33021, which was a reversion of #2107.

Fix the IG boundary sensitivity for debug emission.

Allow debug info to be summarized in a way that makes it amenable to analyze
with our jitutils tooling (namely, jit-analyze).
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 7, 2020
@AndyAyersMS
Member Author

Leaving as draft for now, until I can pass the cross-crossgen check.

Updated jit-diffs tooling (will PR shortly) shows that the number of variables reported is strictly greater with the new change:

Found 191 files with textual diffs.

Summary of Debug Var Count diffs:
(Lower is better)

Total Number of Variabless of diff: 58506 (7.776% of base)
    diff is a regression.

Top file regressions (Number of Variabless):
        8998 : Microsoft.CodeAnalysis.VisualBasic.dasm (9.635% of base)
        6345 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (12.971% of base)
        5711 : System.Private.CoreLib.dasm (6.699% of base)
        4007 : System.Private.Xml.dasm (9.036% of base)
        3142 : Microsoft.CodeAnalysis.CSharp.dasm (4.000% of base)
        1769 : Microsoft.CodeAnalysis.dasm (6.154% of base)
        1751 : System.Linq.dasm (11.175% of base)
        1734 : System.Linq.Parallel.dasm (5.510% of base)
        1720 : System.Collections.Immutable.dasm (8.217% of base)
        1439 : System.Data.Common.dasm (7.860% of base)
         978 : Microsoft.VisualBasic.Core.dasm (17.233% of base)
         881 : System.Private.DataContractSerialization.dasm (6.381% of base)
         861 : System.Linq.Expressions.dasm (6.734% of base)
         781 : System.Threading.Tasks.Dataflow.dasm (5.537% of base)
         771 : Newtonsoft.Json.dasm (6.180% of base)
         730 : Microsoft.CSharp.dasm (12.074% of base)
         657 : System.Text.Json.dasm (9.716% of base)
         543 : System.Reflection.Metadata.dasm (6.977% of base)
         541 : System.Net.Http.dasm (6.901% of base)
         524 : System.DirectoryServices.dasm (11.359% of base)

178 total files with Debug Var Count differences (0 improved, 178 regressed), 86 unchanged.

Top method regressions (Number of Variabless):
          98 (33.333% of base) : System.Linq.Parallel.dasm - ConcatQueryOperator`1:WrapHelper(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,QuerySettings,bool):this (98 methods)
          98 (40.000% of base) : System.Linq.Parallel.dasm - QueryOperator`1:ExecuteAndCollectResults(PartitionedStream`2,int,bool,bool,QuerySettings):ListQueryResults`1 (98 methods)
          98 (40.000% of base) : System.Linq.Parallel.dasm - TakeOrSkipQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          98 (40.000% of base) : System.Linq.Parallel.dasm - TakeOrSkipWhileQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          85 (22.078% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (98 methods)
          76 (107.042% of base) : System.Collections.Immutable.dasm - Node:NodeTreeFromList(IOrderedCollection`1,int,int):Node (42 methods)
          70 (77.778% of base) : System.Private.CoreLib.dasm - Enumerator:MoveNext():bool:this (132 methods)
          56 (266.667% of base) : CommandLine.dasm - <RenderUsageTextAsLines>d__56`1:MoveNext():bool:this (14 methods)
          56 (66.667% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (154 methods)
          56 (33.333% of base) : System.ComponentModel.Composition.Registration.dasm - PartBuilder`1:ExportProperty(Expression`1,Action`1):PartBuilder`1:this (112 methods)
          56 (33.333% of base) : System.ComponentModel.Composition.Registration.dasm - PartBuilder`1:ImportProperty(Expression`1,Action`1):PartBuilder`1:this (112 methods)
          56 (33.333% of base) : System.Composition.Convention.dasm - PartConventionBuilder`1:ExportProperty(Expression`1,Action`1):PartConventionBuilder`1:this (112 methods)
          56 (33.333% of base) : System.Composition.Convention.dasm - PartConventionBuilder`1:ImportProperty(Expression`1,Action`1):PartConventionBuilder`1:this (112 methods)
          56 (16.667% of base) : System.Linq.Parallel.dasm - ConcatQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          49 (50.000% of base) : Microsoft.CodeAnalysis.dasm - UnionCollection`1:Create(ImmutableArray`1,Func`2):ICollection`1 (98 methods)
          49 (20.588% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:SelectDistinct(Func`2):ImmutableArray`1:this (98 methods)
          49 (20.000% of base) : System.Linq.Parallel.dasm - FirstQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          49 (11.111% of base) : System.Linq.Parallel.dasm - IndexedSelectQueryOperator`2:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          49 (11.111% of base) : System.Linq.Parallel.dasm - IndexedWhereQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)
          49 (20.000% of base) : System.Linq.Parallel.dasm - LastQueryOperator`1:WrapPartitionedStream(PartitionedStream`2,IPartitionedStreamRecipient`1,bool,QuerySettings):this (98 methods)

Top method regressions (percentages):
           1 (     ∞ of base) : Microsoft.CodeAnalysis.dasm - StackTrace:GetString():String (2 methods)
           1 (     ∞ of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CreatePool():ObjectPool`1 (2 methods)
           1 (     ∞ of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Page:CreatePool():ObjectPool`1 (2 methods)
           2 (     ∞ of base) : Microsoft.CSharp.dasm - UnsafeMethods:Create_IUnknownRelease():IUnknownReleaseDelegate (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - DiaLoader:GetDiaSourceObject():IDiaDataSource3 (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - SymbolPath:get_SymbolPathFromEnvironment():String (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventNativeMethods:GetHRForLastWin32Error():int (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:get_NonOSKeywords():int (2 methods)
           1 (     ∞ of base) : Microsoft.Extensions.Caching.Memory.dasm - CacheEntryHelper:get_Current():CacheEntry (2 methods)
           1 (     ∞ of base) : Microsoft.Extensions.Caching.Memory.dasm - CacheEntryHelper:GetOrCreateScopes():CacheEntryStack (2 methods)
           2 (     ∞ of base) : Microsoft.Extensions.DependencyModel.dasm - DependencyContextPaths:GetCurrent():DependencyContextPaths (2 methods)
           1 (     ∞ of base) : Microsoft.Extensions.DependencyModel.dasm - ApplicationEnvironment:GetApplicationBasePath():String (2 methods)
           3 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - VBMath:Randomize() (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - ProjectData:GetProjectData():ProjectData (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - Utils:get_VBRuntimeAssembly():Assembly (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - DateAndTime:get_DateString():String (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - Information:Erl():int (2 methods)
           6 (     ∞ of base) : Newtonsoft.Json.dasm - BinderWrapper:CreateMemberCalls() (2 methods)
           1 (     ∞ of base) : Newtonsoft.Json.dasm - JsonTypeReflector:get_FullyTrusted():bool (2 methods)
           5 (     ∞ of base) : OSExtensions.dasm - ETWKernelControl:LoadKernelTraceControl() (2 methods)

31706 total methods with Debug Var Count differences (0 improved, 31706 regressed), 211626 unchanged.

The number of debug clauses increases by around 20%. Merging kicks in fairly often, and in some cases it reduces the overall debug volume for a method.

As noted in #33189 there are future improvements to merging possible which should result in substantial reductions and (when implemented) we may well end up with less debug volume than we had before we did any of this.

Found 191 files with textual diffs.

Summary of Debug Clause Count diffs:
(Lower is better)

Total Number of Live Ranges of diff: 298737 (20.260% of base)
    diff is a regression.

Top file regressions (Number of Live Rangess):
       50091 : Microsoft.CodeAnalysis.VisualBasic.dasm (27.784% of base)
       33056 : Microsoft.CodeAnalysis.CSharp.dasm (21.729% of base)
       24484 : System.Private.CoreLib.dasm (14.113% of base)
       20480 : System.Private.Xml.dasm (22.189% of base)
       17501 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (18.505% of base)
        9477 : Microsoft.CodeAnalysis.dasm (17.956% of base)
        8713 : System.Linq.Parallel.dasm (14.858% of base)
        8501 : System.Data.Common.dasm (22.532% of base)
        8401 : Microsoft.VisualBasic.Core.dasm (56.933% of base)
        6704 : System.Threading.Tasks.Dataflow.dasm (26.951% of base)
        5396 : System.Collections.Immutable.dasm (12.149% of base)
        4582 : System.Linq.Expressions.dasm (17.933% of base)
        4379 : Newtonsoft.Json.dasm (17.270% of base)
        4018 : System.Private.DataContractSerialization.dasm (14.149% of base)
        4008 : System.Linq.dasm (12.389% of base)
        3671 : Microsoft.CSharp.dasm (29.894% of base)
        3067 : System.Net.Http.dasm (19.677% of base)
        2965 : System.Configuration.ConfigurationManager.dasm (29.757% of base)
        2626 : System.Text.Json.dasm (17.114% of base)
        2504 : System.Data.OleDb.dasm (33.382% of base)

Top file improvements (Number of Live Rangess):
         -15 : System.Reflection.TypeExtensions.dasm (-6.250% of base)

191 total files with Debug Clause Count differences (1 improved, 190 regressed), 73 unchanged.

Top method regressions (Number of Live Rangess):
         771 (244.762% of base) : Microsoft.VisualBasic.Core.dasm - VBBinder:BindToMethod(int,ref,byref,ref,CultureInfo,ref,byref):MethodBase:this (2 methods)
         755 (153.144% of base) : System.Linq.Parallel.dasm - SortHelper`2:MergeSortCooperatively():this (14 methods)
         641 (220.275% of base) : System.Data.Common.dasm - BinaryNode:EvalBinaryOp(int,ExpressionNode,ExpressionNode,DataRow,int,ref):Object:this (2 methods)
         461 (307.333% of base) : System.Private.CoreLib.dasm - DefaultBinder:BindToMethod(int,ref,byref,ref,CultureInfo,ref,byref):MethodBase:this (2 methods)
         392 (60.308% of base) : System.Diagnostics.DiagnosticSource.dasm - SynchronizedList`1:EnumWithFunc(Function`2,byref,byref):this (98 methods)
         374 (314.286% of base) : System.Data.Common.dasm - XmlTreeGen:HandleTable(DataTable,XmlDocument,XmlElement,bool):XmlElement:this (2 methods)
         335 (183.060% of base) : Microsoft.CSharp.dasm - ExpressionBinder:bindUserDefinedConversion(Expr,CType,CType,bool,byref,bool):bool:this (2 methods)
         335 (304.545% of base) : System.Runtime.Numerics.dasm - Number:NumberToStringFormat(byref,byref,ReadOnlySpan`1,NumberFormatInfo) (2 methods)
         306 (332.609% of base) : Microsoft.VisualBasic.Core.dasm - VBBinder:GetMostSpecific(MethodBase,MethodBase,ref,ref,bool,int,int,ref):int:this (2 methods)
         306 (382.500% of base) : System.Data.Odbc.dasm - MultipartIdentifier:ParseMultipartIdentifier(String,String,String,ushort,int,bool,String,bool):ref (2 methods)
         303 (378.750% of base) : System.Data.OleDb.dasm - MultipartIdentifier:ParseMultipartIdentifier(String,String,String,ushort,int,bool,String,bool):ref (2 methods)
         296 (161.749% of base) : System.Data.Common.dasm - RBTree`1:RBDeleteX(int,int,int):int:this (14 methods)
         287 (260.909% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,ReadOnlySpan`1,NumberFormatInfo) (2 methods)
         277 (24.171% of base) : System.Linq.Parallel.dasm - UnionQueryOperator`1:WrapPartitionedStreamFixedBothTypes(PartitionedStream`2,PartitionedStream`2,IPartitionedStreamRecipient`1,int,CancellationToken):this (98 methods)
         257 (25,700.000% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntaxFacts:GetText(ushort):String (2 methods)
         223 (455.102% of base) : Microsoft.CodeAnalysis.CSharp.dasm - OverloadResolution:IsApplicable(Symbol,EffectiveParameters,AnalyzedArguments,ImmutableArray`1,bool,bool,bool,byref):MemberAnalysisResult:this (2 methods)
         223 (193.913% of base) : System.Configuration.ConfigurationManager.dasm - MgmtConfigurationRecord:CopyConfigDefinitionsRecursive(ConfigDefinitionUpdates,XmlUtil,XmlUtilWriter,bool,LocationUpdates,SectionUpdates,bool,String,int,int):bool:this (2 methods)
         220 (271.605% of base) : System.Security.AccessControl.dasm - CommonAcl:RemoveQualifiedAces(SecurityIdentifier,int,int,ubyte,bool,int,Guid,Guid):bool:this (2 methods)
         210 (142.857% of base) : System.Numerics.Tensors.dasm - Tensor`1:GetTriangle(int,bool):Tensor`1:this (14 methods)
         209 (31.523% of base) : System.Linq.Parallel.dasm - FirstQueryOperator`1:WrapHelper(PartitionedStream`2,IPartitionedStreamRecipient`1,QuerySettings):this (98 methods)

Top method improvements (Number of Live Rangess):
        -232 (-28.642% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`3,Object,TaskScheduler,CancellationToken,int):Task`1:this (98 methods)
        -209 (-33.333% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`3,Object,int):Task`1:this (98 methods)
        -196 (-29.563% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`2,TaskScheduler,CancellationToken,int):Task`1:this (98 methods)
        -182 (-27.744% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAll(ref,Func`2,int):Task`1:this (112 methods)
        -182 (-27.744% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAny(ref,Func`2,int):Task`1:this (112 methods)
        -160 (-33.333% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`2,int):Task`1:this (98 methods)
        -160 (-33.333% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`3,Object):Task`1:this (98 methods)
        -160 (-30.246% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`3,Object,CancellationToken):Task`1:this (98 methods)
        -126 (-25.820% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAll(ref,Func`2):Task`1:this (112 methods)
        -126 (-23.162% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAll(ref,Func`2,CancellationToken):Task`1:this (112 methods)
        -126 (-25.820% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAny(ref,Func`2):Task`1:this (112 methods)
        -126 (-23.162% of base) : System.Private.CoreLib.dasm - TaskFactory`1:ContinueWhenAny(ref,Func`2,CancellationToken):Task`1:this (112 methods)
        -111 (-33.333% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`2):Task`1:this (98 methods)
        -111 (-29.058% of base) : System.Private.CoreLib.dasm - Task`1:ContinueWith(Func`2,CancellationToken):Task`1:this (98 methods)
         -98 (-14.627% of base) : System.Data.Common.dasm - EnumerableRowCollection`1:AddSortExpression(Func`2,IComparer`1,bool,bool):this (98 methods)
         -98 (-30.625% of base) : System.Linq.dasm - EnumerablePartition`1:Select(Func`2):IEnumerable`1:this (98 methods)
         -98 (-30.625% of base) : System.Linq.dasm - RepeatIterator`1:Select(Func`2):IEnumerable`1:this (98 methods)
         -98 (-30.625% of base) : System.Linq.dasm - Iterator`1:Select(Func`2):IEnumerable`1:this (98 methods)
         -98 (-30.625% of base) : System.Linq.dasm - Lookup`2:ApplyResultSelector(Func`3):IEnumerable`1:this (98 methods)
         -70 (-33.333% of base) : System.Private.DataContractSerialization.dasm - XmlDictionaryAsyncCheckWriter:WriteArray(String,String,String,ref,int,int):this (20 methods)

Top method regressions (percentages):
           1 (     ∞ of base) : Microsoft.CodeAnalysis.dasm - StackTrace:GetString():String (2 methods)
           1 (     ∞ of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LookupResult:CreatePool():ObjectPool`1 (2 methods)
           1 (     ∞ of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Page:CreatePool():ObjectPool`1 (2 methods)
           2 (     ∞ of base) : Microsoft.CSharp.dasm - UnsafeMethods:Create_IUnknownRelease():IUnknownReleaseDelegate (2 methods)
           2 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - SymbolPath:get_SymbolPathFromEnvironment():String (2 methods)
           2 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventNativeMethods:GetHRForLastWin32Error():int (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - DiaLoader:GetDiaSourceObject():IDiaDataSource3 (2 methods)
           1 (     ∞ of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:get_NonOSKeywords():int (2 methods)
           2 (     ∞ of base) : Microsoft.Extensions.Caching.Memory.dasm - CacheEntryHelper:GetOrCreateScopes():CacheEntryStack (2 methods)
           1 (     ∞ of base) : Microsoft.Extensions.Caching.Memory.dasm - CacheEntryHelper:get_Current():CacheEntry (2 methods)
           2 (     ∞ of base) : Microsoft.Extensions.DependencyModel.dasm - DependencyContextPaths:GetCurrent():DependencyContextPaths (2 methods)
           1 (     ∞ of base) : Microsoft.Extensions.DependencyModel.dasm - ApplicationEnvironment:GetApplicationBasePath():String (2 methods)
           5 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - VBMath:Randomize() (2 methods)
           2 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - ProjectData:GetProjectData():ProjectData (2 methods)
           2 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - Utils:get_VBRuntimeAssembly():Assembly (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - DateAndTime:get_DateString():String (2 methods)
           1 (     ∞ of base) : Microsoft.VisualBasic.Core.dasm - Information:Erl():int (2 methods)
           6 (     ∞ of base) : Newtonsoft.Json.dasm - BinderWrapper:CreateMemberCalls() (2 methods)
           1 (     ∞ of base) : Newtonsoft.Json.dasm - JsonTypeReflector:get_FullyTrusted():bool (2 methods)
           5 (     ∞ of base) : OSExtensions.dasm - ETWKernelControl:LoadKernelTraceControl() (2 methods)

Top method improvements (percentages):
         -43 (-78.182% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LanguageParser:IsTerminator():bool:this (2 methods)
         -10 (-76.923% of base) : System.Private.Xml.dasm - XmlAttributes:get_XmlFlags():int:this (2 methods)
          -9 (-75.000% of base) : System.Private.Uri.dasm - GenericUriParser:MapGenericParserOptions(int):int (2 methods)
         -17 (-70.833% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventSession:SetStackTraceIds(int,long,int):int (2 methods)
          -6 (-66.667% of base) : System.Private.Xml.dasm - XPathConvert:CbitZeroLeft(int):int (2 methods)
          -8 (-66.667% of base) : System.Reflection.Metadata.dasm - CustomAttribute:ProjectAttributeTargetValue(int):int (2 methods)
         -42 (-65.625% of base) : System.Data.Common.dasm - RBTree`1:GetIntValueFromBitMap(int):int (14 methods)
          -7 (-63.636% of base) : System.Security.Cryptography.X509Certificates.dasm - X509Pal:MapNameToStrFlag(int):int (2 methods)
          -5 (-62.500% of base) : System.Private.CoreLib.dasm - CompareInfo:GetNativeCompareFlags(int):int (2 methods)
         -13 (-61.905% of base) : System.Security.Claims.dasm - Claim:WriteTo(BinaryWriter,ref):this (2 methods)
          -6 (-60.000% of base) : System.Runtime.Numerics.dasm - BigIntegerCalculator:LeadingZeros(int):int (2 methods)
          -6 (-60.000% of base) : System.Runtime.Numerics.dasm - NumericsHelpers:CbitHighZero(int):int (2 methods)
         -19 (-55.882% of base) : System.Private.Uri.dasm - UncNameHelper:IsValid(long,int,byref,bool):bool (2 methods)
         -21 (-55.263% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberSemanticModel:GetEnclosingBinderInternal(Binder,VisualBasicSyntaxNode,VisualBasicSyntaxNode,int):Binder:this (2 methods)
         -12 (-54.545% of base) : System.Security.Claims.dasm - ClaimsIdentity:WriteTo(BinaryWriter,ref):this (2 methods)
          -2 (-50.000% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SyntaxFacts:IsFixedStatementExpression(SyntaxNode):bool (2 methods)
          -3 (-50.000% of base) : Microsoft.CodeAnalysis.dasm - ModulePropertiesForSerialization:GetCorHeaderFlags():int:this (2 methods)
          -3 (-50.000% of base) : System.Data.Common.dasm - SqlString:CompareOptionsFromSqlCompareOptions(int):int (2 methods)
          -3 (-50.000% of base) : System.DirectoryServices.AccountManagement.dasm - SDSUtils:MapOptionsToAuthTypes(int):int (2 methods)
         -11 (-50.000% of base) : System.Private.DataContractSerialization.dasm - UniqueId:UnsafeParse(long,int):this (2 methods)

141834 total methods with Debug Clause Count differences (13300 improved, 128534 regressed), 101498 unchanged.

@AndyAyersMS
Member Author

Also need to verify no-diffs for debuggable codegen.

@AndyAyersMS
Member Author

Have at least one failure to debug (linux arm64 crossgen):

Assert failure(PID 19535 [0x00004c4f], Thread: 19535 [0x4c4f]): Assertion failed '!m_VariableLiveRanges->back().m_EndEmitLocation.Valid()' in 'System.Numerics.Matrix4x4:<Invert>g__SseImpl|59_0(System.Numerics.Matrix4x4,byref):bool' during 'Generate code' (IL size 1451)

    File: /__w/1/s/src/coreclr/src/jit/codegencommon.cpp Line: 12403
    Image: /__w/1/s/artifacts/bin/coreclr/Linux.arm64.Checked/x64/crossgen

@AndyAyersMS
Member Author

No diffs in debug codegen. The debug info generated for debug codegen is different (debug clauses are re-ordered), but all methods in SPC have the same number of clauses and the same number of variables reported. So semantically "no-diff". Note the prolog/body merging opportunities seen in optimized codegen are not as prevalent here, as most incoming args are in regs and get spilled to the stack.

This was measured with COMPlus_JitDebuggable=1 and by running jit-diffs using a checked core_root so that the setting kicked in; it is ignored by release runtimes.

@AndyAyersMS
Member Author

AndyAyersMS commented Jul 8, 2020

Example of "improved" optimized debug gen, from clause merging. Also note the debug range for var 1 is a bit more extensive, as it can include parts of blocks (and there is a prolog-body merge opportunity for var 0):

; Variable debug info: 8 live ranges, 2 vars for method CompareInfo:GetNativeCompareFlags(int):int
  0(   UNKNOWN) : From 00000000h to 00000001h, in rcx
  1(   UNKNOWN) : From 0000000Fh to 00000014h, in rax
  1(   UNKNOWN) : From 00000019h to 0000001Eh, in rax
  1(   UNKNOWN) : From 00000021h to 00000026h, in rax
  1(   UNKNOWN) : From 00000029h to 0000002Eh, in rax
  1(   UNKNOWN) : From 00000033h to 0000003Bh, in rax
  0(   UNKNOWN) : From 00000000h to 00000040h, in rcx
  1(   UNKNOWN) : From 00000040h to 00000048h, in rax

becomes

; Variable debug info: 3 live ranges, 2 vars for method CompareInfo:GetNativeCompareFlags(int):int
  0(   UNKNOWN) : From 00000000h to 00000001h, in rcx
  0(   UNKNOWN) : From 00000000h to 00000040h, in rcx
  1(   UNKNOWN) : From 00000005h to 0000004Dh, in rax
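
The prolog-body merge opportunity for var 0 can be illustrated with a small sketch. This is not the JIT's merging code: `LiveRange` and `MergeLiveRanges` are hypothetical names, and the rule used here (coalesce ranges of the same variable that share a home and touch or overlap) is an assumption chosen only to show the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not the JIT's own data structure.
struct LiveRange
{
    unsigned    varNum; // variable number
    uint32_t    start;  // first native offset at which the home is valid
    uint32_t    end;    // first native offset at which the home is no longer valid
    std::string home;   // e.g. "rcx" or "rbp[16]"
};

// Coalesce ranges of the same variable that share a home and touch or overlap.
std::vector<LiveRange> MergeLiveRanges(std::vector<LiveRange> ranges)
{
    for (const LiveRange& r : ranges)
    {
        assert(r.start <= r.end); // each range must be well formed
    }

    // Group by variable and home, then order by start offset.
    std::sort(ranges.begin(), ranges.end(), [](const LiveRange& a, const LiveRange& b) {
        if (a.varNum != b.varNum) return a.varNum < b.varNum;
        if (a.home != b.home) return a.home < b.home;
        return a.start < b.start;
    });

    std::vector<LiveRange> merged;
    for (const LiveRange& r : ranges)
    {
        const bool sameSlot = !merged.empty() && merged.back().varNum == r.varNum &&
                              merged.back().home == r.home;
        if (sameSlot && r.start <= merged.back().end)
        {
            // Touching or overlapping: extend the previous range.
            merged.back().end = std::max(merged.back().end, r.end);
        }
        else
        {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Applied to the three ranges in the second listing, this would fold the two `rcx` ranges for var 0 into one and leave the `rax` range for var 1 alone. Ranges separated by a gap, such as the `rax` ranges in the first listing, are left as they are.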

@AndyAyersMS
Member Author

Example of "rearranged" debug info for debug codegen:

; Variable debug info: 7 live ranges, 4 vars for method System.AppContext:Setup(long,long,int)
  1(   UNKNOWN) : From 00000000h to 00000029h, in rdx
  2(   UNKNOWN) : From 00000000h to 00000029h, in r8
  0(   UNKNOWN) : From 00000000h to 00000029h, in rcx
  1(   UNKNOWN) : From 00000037h to 00000117h, in rbp[24] (1 slot)
  2(   UNKNOWN) : From 00000037h to 00000117h, in rbp[32] (1 slot)
  3(   UNKNOWN) : From 00000037h to 00000117h, in rbp[-4] (1 slot)
  0(   UNKNOWN) : From 00000037h to 00000117h, in rbp[16] (1 slot)

becomes

; Variable debug info: 7 live ranges, 4 vars for method System.AppContext:Setup(long,long,int)
  0(   UNKNOWN) : From 00000000h to 00000029h, in rcx
  0(   UNKNOWN) : From 00000037h to 00000117h, in rbp[16] (1 slot)
  1(   UNKNOWN) : From 00000000h to 00000029h, in rdx
  1(   UNKNOWN) : From 00000037h to 00000117h, in rbp[24] (1 slot)
  2(   UNKNOWN) : From 00000000h to 00000029h, in r8
  2(   UNKNOWN) : From 00000037h to 00000117h, in rbp[32] (1 slot)
  3(   UNKNOWN) : From 00000037h to 00000117h, in rbp[-4] (1 slot)
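
In the second listing the records appear grouped by variable number and then by start offset. A minimal sketch of that ordering follows; `DebugRecord` and `GroupByVariable` are hypothetical names, and the JIT may well produce this order simply by emitting ranges per variable rather than by sorting.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical record type for illustration only.
struct DebugRecord
{
    unsigned varNum; // variable number
    uint32_t start;  // start native offset
    uint32_t end;    // end native offset
};

// Order records by variable number, then by start offset.
std::vector<DebugRecord> GroupByVariable(std::vector<DebugRecord> records)
{
    std::stable_sort(records.begin(), records.end(), [](const DebugRecord& a, const DebugRecord& b) {
        return std::tie(a.varNum, a.start) < std::tie(b.varNum, b.start);
    });
    return records;
}
```

Feeding the seven records of the first listing through `GroupByVariable` yields the order of the second listing. The set of records is unchanged; only their position in the list differs.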

@BruceForstall was asking about the need for those prolog records, and I also wonder whether they are useful, especially here in debug codegen. IL offset 0 is at 37 here, so presumably a breakpoint at the open brace would use the body ranges, and the prolog ranges might only really be valid early on in the prolog and go stale part way through.

Need to investigate this as a follow-up. Apparently at one time we generated fully accurate prolog debug, see the various psi* methods (eg psiBegProlog) and this comment:

/*****************************************************************************
Enable this macro to get accurate prolog information for every instruction
in the prolog. However, this is overkill as nobody steps through the
disassembly of the prolog. Even if they do they will not expect rich debug info.
We still report all the arguments at the very start of the method so that
the user can see the arguments at the very start of the method (offset=0).
Disabling this decreased the debug maps in mscorlib by 10% (01/2003)
*/
#if 0
#define ACCURATE_PROLOG_DEBUG_INFO
#endif

@BruceForstall
Member

The "rearranged" ranges seem potentially problematic depending on how the data is laid out and how the debugger looks for it: does it binary search for range starts, for example? i.e., does it need to be sorted by range start address?

@cshung Looked into the output pre-/post- this new format, so might have some comments here.

@cshung
Member

cshung commented Jul 8, 2020

> The "rearranged" ranges seem potentially problematic depending on how the data is laid out and how the debugger looks for it: does it binary search for range starts, for example? i.e., does it need to be sorted by range start address?
>
> @cshung Looked into the output pre-/post- this new format so might have some comments here.

I believe the variable debug info is simply accessed through a linear search, so as long as the ranges do not overlap, the order should not matter. The lastGoodOne is suspicious though.

// FindNativeInfoInILVariableArray
// Linear search through an array of NativeVarInfos, to find the variable of index dwIndex, valid
// at the given ip. Returns CORDBG_E_IL_VAR_NOT_AVAILABLE if the variable isn't valid at the given ip.
// Arguments:
//    input:  dwIndex        - variable number
//            ip             - IP
//            nativeInfoList - list of instances of NativeVarInfo
//    output: ppNativeInfo   - the element of nativeInfoList that corresponds to the IP and variable number
//                             if we find such an element or NULL otherwise
// Return value: HRESULT: returns S_OK or CORDBG_E_IL_VAR_NOT_AVAILABLE if the variable isn't found
//
HRESULT FindNativeInfoInILVariableArray(DWORD dwIndex,
                                        SIZE_T ip,
                                        const DacDbiArrayList<ICorDebugInfo::NativeVarInfo> * nativeInfoList,
                                        const ICorDebugInfo::NativeVarInfo ** ppNativeInfo)
{
    _ASSERTE(ppNativeInfo != NULL);
    *ppNativeInfo = NULL;

    // A few words about this search: it must be linear, and the
    // comparison of startOffset and endOffset to ip must be
    // <=/>. startOffset points to the first instruction that will
    // make the variable's home valid. endOffset points to the first
    // instruction at which the variable's home invalid.
    int lastGoodOne = -1;
    for (unsigned int i = 0; i < (unsigned)nativeInfoList->Count(); i++)
    {
        if ((*nativeInfoList)[i].varNumber == dwIndex)
        {
            if ( (lastGoodOne == -1) ||
                 ((*nativeInfoList)[lastGoodOne].startOffset < (*nativeInfoList)[i].startOffset) )
            {
                lastGoodOne = i;
            }

            if (((*nativeInfoList)[i].startOffset <= ip) &&
                ((*nativeInfoList)[i].endOffset > ip))
            {
                *ppNativeInfo = &((*nativeInfoList)[i]);
                return S_OK;
            }
        }
    }

    // workaround:
    //
    // We didn't find the variable. Was the endOffset of the last range for this variable
    // equal to the current IP? If so, go ahead and "lie" and report that as the
    // variable's home for now.
    //
    // Rationale:
    //
    // * See TODO comment in code:Compiler::siUpdate (jit\scopeinfo.cpp). In optimized
    //   code, the JIT can report var lifetimes as being one instruction too short.
    //   This workaround makes up for that. Example code:
    //
    //      static void foo(int x)
    //      {
    //          int b = x; // Value of "x" would not be reported in optimized code without the workaround
    //          bar(ref b);
    //      }
    //
    // * Since this is the first instruction after the last range a variable was alive,
    //   we're essentially assuming that since that instruction hasn't been executed
    //   yet, and since there isn't a new home for the variable, that the last home is
    //   still good. This actually turns out to be true 99.9% of the time, so we'll go
    //   with it for now.
    // * We've been lying like this since 1999, so surely it's safe.
    if ((lastGoodOne > -1) && ((*nativeInfoList)[lastGoodOne].endOffset == ip))
    {
        *ppNativeInfo = &((*nativeInfoList)[lastGoodOne]);
        return S_OK;
    }

    return CORDBG_E_IL_VAR_NOT_AVAILABLE;
} // FindNativeInfoInILVariableArray
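
To check the order-independence claim, here is a stripped-down model of the same search using standard containers. `VarInfo` and `FindVarInfo` are simplified stand-ins introduced for illustration, not the debugger's types, and a null pointer takes the place of `CORDBG_E_IL_VAR_NOT_AVAILABLE`. In this model `lastGoodOne` ends up at the matching record with the greatest `startOffset` whenever the main search misses, so the fallback also does not appear to depend on record order, assuming ranges for a variable do not share a start offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for ICorDebugInfo::NativeVarInfo (illustration only).
struct VarInfo
{
    uint32_t startOffset; // first offset at which the home is valid
    uint32_t endOffset;   // first offset at which the home is invalid
    uint32_t varNumber;   // variable number
};

// Returns the matching record, or nullptr if the variable has no home at ip.
const VarInfo* FindVarInfo(const std::vector<VarInfo>& list, uint32_t varNumber, size_t ip)
{
    int lastGoodOne = -1;
    for (size_t i = 0; i < list.size(); i++)
    {
        if (list[i].varNumber != varNumber)
        {
            continue;
        }

        // Track the record for this variable with the greatest start offset.
        if (lastGoodOne == -1 || list[lastGoodOne].startOffset < list[i].startOffset)
        {
            lastGoodOne = static_cast<int>(i);
        }

        if (list[i].startOffset <= ip && list[i].endOffset > ip)
        {
            return &list[i];
        }
    }

    // Same workaround as above: accept an ip equal to the end of the last range.
    if (lastGoodOne > -1 && list[lastGoodOne].endOffset == ip)
    {
        return &list[lastGoodOne];
    }
    return nullptr;
}
```

Running the lookup over the `System.AppContext:Setup` records in both the old and the new order should return the same record for any variable and ip, since each record is tested independently of its position.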

@dotnet/dotnet-diag

@AndyAyersMS
Member Author

For the assert, seems like we're doing two updates to a live range.

Generating: N2853 (???,???) [003801] ------------                 IL_OFFSET void   IL offset: 0x377 REG NA
Generating: N2855 (  1,  1) [000779] -------N---Z       t779 =    LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V04 loc2         u:2 d12 (last use) REG d12 $1e2
Generating: N001 (  1,  1) [004661] ------------      t4661 =    LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V04 loc2          d12 REG d12
                                                              /--*  t4661  simd16 
Generating: N002 (  2,  2) [004662] -----------z      t4662 = *  SIMD      simd16 float UpperRestore Internal REG d17
IN0349:                           ldr     d17, [fp,#584]	// [V04 loc2+0x08]
IN034a:                           mov     v12.d[1], v17.d[0]
                                                              /--*  t779   simd16 
Generating: N2857 (  1,  3) [002433] DA----------              *  STORE_LCL_VAR simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V131 tmp98       d:1 d12 REG d12
							V04 in reg d12 is becoming dead  [000779]
							Live regs: 980000 {x19 x20 x23 d8 d10 d12 d14} => 980000 {x19 x20 x23 d8 d10 d14}
							Live vars: {V01 V02 V04 V10 V11 V12 V13 V14 V15 V17 V18 V21 V22 V23 V24 V190 V219} => {V01 V02 V10 V11 V12 V13 V14 V15 V17 V18 V21 V22 V23 V24 V190 V219}
IN034b:                           str     q12, [fp,#576]	// [V04 loc2]
							V04 in reg d12 is becoming dead  [000779]
							Live regs: (unchanged) 980000 {x19 x20 x23 d8 d10 d14}

The first update happens with the first liveness change (via TreeLifeUpdater::UpdateLiveVar). Since this is a spill case, we then do a second update in genSpillVar. The latter invokes siUpdateVariableLiveRange, which asserts that the current live range is not yet closed out, but it already is.

Seems like perhaps the update in UpdateLiveVar should be skipped, if we have a spill case (basically, defer to genSpillVar)?
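
A rough C++ sketch of that idea, to show why the second update trips the assert and how deferring avoids it. Everything here is a hypothetical model (`VarLiveTracker`, `OnVarDying`, `OnSpill` are made-up names), not the JIT's actual variable live range code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical variable home: a register number or a frame offset.
struct VarLoc
{
    bool inReg;
    int  regOrOffset;
};

struct LiveRange
{
    uint32_t start;
    uint32_t end;
    VarLoc   loc;
    bool     open;
};

// Hypothetical per-variable tracker that asserts on inconsistent events.
class VarLiveTracker
{
public:
    void Start(uint32_t offset, VarLoc loc)
    {
        assert(!HasOpenRange()); // two "start" events in a row would fire here
        ranges.push_back({offset, 0, loc, true});
    }
    void End(uint32_t offset)
    {
        assert(HasOpenRange()); // two "end" events in a row would fire here
        ranges.back().end  = offset;
        ranges.back().open = false;
    }
    // A spill changes the home: close the current range, open one for the new home.
    void Update(uint32_t offset, VarLoc newLoc)
    {
        End(offset);
        Start(offset, newLoc);
    }
    bool HasOpenRange() const
    {
        return !ranges.empty() && ranges.back().open;
    }
    std::vector<LiveRange> ranges;
};

// Liveness-change hook. When the node is also being spilled, leave the
// reporting to the spill path so the range is only updated once.
void OnVarDying(VarLiveTracker& tracker, uint32_t offset, bool isSpilling)
{
    if (isSpilling)
    {
        return; // defer to OnSpill
    }
    tracker.End(offset);
}

void OnSpill(VarLiveTracker& tracker, uint32_t offset, int frameOffset)
{
    tracker.Update(offset, VarLoc{false, frameOffset});
}
```

Without the `isSpilling` early-out, `OnVarDying` would close the range and the `End` inside `Update` would then assert. Note that in this model the stack range stays open after the spill.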

@AndyAyersMS
Member Author

Deferring the update when spilling allows crossgen to get past the point of failure above, but it now hits a similar looking issue later on in the same method. Not sure I completely understand what we're trying to do here:

N3053 (  1,  1) [000809] -----------Z       t809 =    LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V10 loc8         u:5 d10 (last use) REG d10 $61f
N001 (  1,  1) [004669] ------------      t4669 =    LCL_VAR   simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V10 loc8          d10 REG d10
                                                  /--*  t4669  simd16 
N002 (  2,  2) [004670] -----------z      t4670 = *  SIMD      simd16 float UpperRestore Internal REG d17
                                                  /--*  t809   simd16 
N3055 (???,???) [004364] ------------      t4364 = *  PUTARG_REG simd16 REG d0
N3057 (  1,  1) [003340] ------------      t3340 =    CNS_INT(h) long   0x7f811dafb370 ftn REG x11 $184
                                                  /--*  t3340  long   
N3059 (???,???) [004365] ------------      t4365 = *  PUTARG_REG long   REG x11
                                                  /--*  t4363  simd16 arg2 in d1
                                                  +--*  t4364  simd16 arg1 in d0
                                                  +--*  t4365  long   arg0 in x11
N3061 ( 17,  8) [000811] --CXG-------       t811 = *  CALL r2r_ind simd16 System.Runtime.Intrinsics.X86.Sse.Multiply REG d0 $69e
                                                  /--*  t811   simd16 
N3063 ( 17,  8) [000819] DA-XG-------              *  STORE_LCL_VAR simd16<System.Runtime.Intrinsics.Vector128`1[Single]> V10 loc8         d:6 d10 REG d10

The Z on [809] means this local is being spilled, but is also last use. The upper half is already spilled, and it's an argument to a call.

So the variable life tracker honors the spill, and leaves an open live range. Then V10 is redefined by the call, and we go to start a new live range, but there's already a live range open from the spill, and we assert.

Codegen is:

IN037b:                           ldr     d17, [fp,#488]	// [V10 loc8+0x08]       ;; restore upper half
IN037c:                           mov     v10.d[1], v17.d[0]
IN037d:                           str     q10, [fp,#480]	// [V10 loc8]            ;; spill the whole thing
IN037e:                           mov     v0.16b, v10.16b
IN037f:                           adrp    x11, [RELOC #0x7f811dafb370]
IN0380:                           add     x11, x11, #0
IN0381:                           ldr     x0, [x11]
IN0382:                           blr     x0
IN0383:                           mov     v10.16b, v0.16b                                ;; new def (same local V10)

@CarolEidt this looks like a dead spill to me....?

This is on Linux arm64, crossgenning System.Numerics.Matrix4x4:<Invert>g__SseImpl|59_0

@sdmaclea
Contributor

sdmaclea commented Jul 8, 2020

The arm64 ABI guarantees that the low half of some of the floating point registers (v8-v15) is preserved by the callee, but the caller must preserve the upper half. This looks like code generated for that purpose.
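
For readers less familiar with this part of the ABI, a small C++ simulation of the partial save/restore. Under AAPCS64 only the bottom 64 bits of v8-v15 survive a call, so a 128-bit value kept in one of those registers needs its upper half saved and restored by the caller. The types and functions below (`Vec128`, `SimulatedCall`, `CallPreservingUpper`) are hypothetical and only model the register file; they are not JIT code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of a 128-bit vector register as two 64-bit halves.
struct Vec128
{
    uint64_t lo;
    uint64_t hi;
};

using VecRegs = std::array<Vec128, 32>;

// Simulated callee: it keeps the low 64 bits of v8..v15 intact, as the ABI
// requires, but is free to trash the upper 64 bits.
void SimulatedCall(VecRegs& regs)
{
    for (int r = 8; r <= 15; r++)
    {
        regs[r].hi = 0xDEADDEADDEADDEADull;
    }
}

// Caller-side sequence for a 128-bit value living in a callee-saved vector
// register across a call: save the upper half, call, restore the upper half.
void CallPreservingUpper(VecRegs& regs, int reg)
{
    assert((reg >= 8) && (reg <= 15));
    uint64_t savedUpper = regs[reg].hi; // "upper save" to a stack slot
    SimulatedCall(regs);
    regs[reg].hi = savedUpper;          // "upper restore" before the next use
}
```

A register that is not given this treatment keeps its low half but loses its upper half across the call, which is why the restore shows up in the dump before the value is used again.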

@safern safern mentioned this pull request Jul 8, 2020
@AndyAyersMS
Member Author

@sdmaclea thanks -- the upper restore before the call makes sense, but not the spill/kill sequence; seems like IN037d could just be dropped...? I'll look at the codegen without this change to see if the spilled value is indeed dead.

IN037d:     str     q10, [fp,#480]	// [V10 loc8]      ;; spill the whole thing
...
IN0383:     mov     v10.16b, v0.16b                        ;; and then (presumably) never look at the spilled bits

@CarolEidt
Contributor

I'd have to look in more detail at this, but I would guess that there's some interaction between identifying live vectors at the call (that need to be partially saved) and vectors that are passed into the call (and therefore, in some sense, live at the call) but are last use. It would seem that there must be some other subtlety here, otherwise this would seem like it would be a more common issue, but I can't venture a guess at what that might be.

@AndyAyersMS
Member Author

Ok, here's another oddity -- doesn't explain what goes wrong above, but does suggest why this method may be a fairly stressful case.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static unsafe bool Invert(Matrix4x4 matrix, out Matrix4x4 result)
{
    if (Sse.IsSupported)
    {
        return SseImpl(matrix, out result);
    }

    return SoftwareFallback(matrix, out result);

    static unsafe bool SseImpl(Matrix4x4 matrix, out Matrix4x4 result)
    {
        // This implementation is based on the DirectX Math Library XMMInverse method
        // https://github.com/microsoft/DirectXMath/blob/master/Inc/DirectXMathMatrix.inl

We really should not be crossgenning an SseImpl method for arm64 as it can never be called on that platform, and the method assumes SSE support. But we do anyways. Because we're compiling this SSE method for arm64 most of the basic inlines fail as the inliner sees these methods will throw:

**************** Inline Tree
Inlines into 06001C4B System.Numerics.Matrix4x4:<Invert>g__SseImpl|59_0(System.Numerics.Matrix4x4,byref):bool
  [0 IL=0008 TR=000005 06003F53] [FAILED: does not return] System.Runtime.Intrinsics.X86.Sse:LoadVector128(long):System.Runtime.Intrinsics.Vector128`1[Single]
  [0 IL=0022 TR=000015 06003F53] [FAILED: does not return] System.Runtime.Intrinsics.X86.Sse:LoadVector128(long):System.Runtime.Intrinsics.Vector128`1[Single]
  [0 IL=0036 TR=000025 06003F53] [FAILED: does not return] System.Runtime.Intrinsics.X86.Sse:LoadVector128(long):System.Runtime.Intrinsics.Vector128`1[Single]
  [0 IL=0050 TR=000035 06003F53] [FAILED: does not return] System.Runtime.Intrinsics.X86.Sse:LoadVector128(long):System.Runtime.Intrinsics.Vector128`1[Single]
...

Currently, failed "does not return" inline methods don't result in control flow dead ends. So these all look like normal calls in the root method that might return. As a result we have a ton of call-crossing vector lifetimes and are giving the upper save/restore logic a good workout.
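
A toy C++ model of why this matters, counting how many (variable, call) pairs have the variable live across the call. This is a sketch on a straight-line instruction sequence with hypothetical types (`Lifetime`, `Call`, `CountCallCrossings`); it is not how the JIT computes liveness.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Lifetime of a variable, as instruction indices in straight-line code.
struct Lifetime
{
    int def;
    int lastUse;
};

struct Call
{
    int  index;
    bool doesNotReturn;
};

// Counts (variable, call) pairs where the variable is live across the call.
// If honorNoReturn is true, the first no-return call ends control flow, so
// nothing is live across it or across anything after it.
int CountCallCrossings(const std::vector<Lifetime>& vars,
                       const std::vector<Call>&     calls,
                       bool                         honorNoReturn)
{
    int flowEnd = INT_MAX;
    if (honorNoReturn)
    {
        for (const Call& c : calls)
        {
            if (c.doesNotReturn)
            {
                flowEnd = std::min(flowEnd, c.index);
            }
        }
    }

    int crossings = 0;
    for (const Lifetime& v : vars)
    {
        assert(v.def <= v.lastUse);
        for (const Call& c : calls)
        {
            if (c.index >= flowEnd)
            {
                continue; // uses after a dead end are unreachable
            }
            if ((v.def < c.index) && (c.index < v.lastUse))
            {
                crossings++;
            }
        }
    }
    return crossings;
}
```

In this model, treating the throwing calls as ordinary calls leaves every later vector lifetime crossing them, while treating the first one as a dead end removes the crossings entirely.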

I don't know yet if crossgenning this method is an artifact of running cross crossgen (that is, I am running a crossgen hosted on x64 here) or a "feature".

I wonder if the SseImpl version should have another IsSupported check guarding some sort of not supported on this platform throw...?

cc @tannergooding

@tannergooding
Member

Yes, I believe it should. There are a few requirements/assumptions around checks that exist in S.P.Corelib, one of which I believe is that the method using a relevant intrinsic must also do the IsSupported check.

I thought there might have been an exception for the baseline ISAs, but I might be misremembering.

CC. @davidwrighton, do we have the rules documented somewhere?

@AndyAyersMS
Member Author

AndyAyersMS commented Jul 9, 2020

@CarolEidt I definitely see what look like dead spills, e.g. cases in UpdateLiveVar where both spill and isDying are true; this is seemingly always just after some upper vector restore, so I wonder if handling an upper restore at a last use confuses the spill logic somehow?

I can save off a jit dump if you like... (edit:...trying, but getting timeouts creating a gist)

@AndyAyersMS
Member Author

The asserts we see here are essentially ones where the live range tracker sees two consecutive "start" or "end" events for some variable and so is not sure how to properly update live ranges.

Seems like in general we may want to decouple the debug tracking from a strict reliance on getting consistent liveness updates; there's no reason why a dead spill (even if suboptimal) should lead to a debug assert. On the other hand we expect these cases of inconsistent liveness to be quite rare and should not be willing to accept them as a matter of routine.
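
One way such decoupling could look, as a minimal C++ sketch: a tracker that repairs a duplicate "start" or "end" instead of asserting, and counts how often it had to, so the anomalies stay visible. `TolerantTracker` and its recovery policy are assumptions for illustration, not a proposal for the actual JIT data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Range
{
    uint32_t start;
    uint32_t end;
    bool     open;
};

// Hypothetical tracker that tolerates inconsistent liveness events.
class TolerantTracker
{
public:
    // Returns false if the event was inconsistent with the tracker state.
    bool Start(uint32_t offset)
    {
        bool consistent = true;
        if (IsOpen())
        {
            // Duplicate start (e.g. a redefinition after a dead spill):
            // close the stale range here rather than asserting.
            Close(offset);
            anomalies++;
            consistent = false;
        }
        ranges.push_back({offset, offset, true});
        return consistent;
    }

    bool End(uint32_t offset)
    {
        if (!IsOpen())
        {
            anomalies++; // duplicate end: nothing to close, just record it
            return false;
        }
        Close(offset);
        return true;
    }

    bool IsOpen() const
    {
        return !ranges.empty() && ranges.back().open;
    }

    std::vector<Range> ranges;
    unsigned           anomalies = 0; // expected to stay at or near zero

private:
    void Close(uint32_t offset)
    {
        assert(offset >= ranges.back().start);
        ranges.back().end  = offset;
        ranges.back().open = false;
    }
};
```

The `anomalies` counter is the part that keeps these cases from becoming routine: it could feed a diagnostic or a stress-mode assert while release behavior stays robust.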

Also seems likely that the debug gen will need re-examination for multi-reg variables, and perhaps also for EH write-thru; such variables can have multiple/duplicate locations.

@tannergooding
Member

Here are the rules for HWIntrinsics and S.P.Corelib:

The rule is as I remembered:

Any use of a platform intrinsic in the codebase MUST be wrapped with a call to the associated IsSupported property. This wrapping MUST be done within the same function that uses the hardware intrinsic, and MUST NOT be in a wrapper function unless it is one of the intrinsics that are enabled by default for crossgen compilation of System.Private.CoreLib

However, maybe this is incorrect when viewing ARM64 vs SSE/SSE2?

@sdmaclea
Contributor

sdmaclea commented Jul 9, 2020

MUST NOT be in a wrapper function unless it is one of the intrinsics that are enabled by default for crossgen compilation of System.Private.CoreLib

This statement is a bit confusing. It is not clear how that could be accomplished except for intrinsics that are enabled always on all platforms.

@AndyAyersMS
Member Author

Agree, it seems like any platform-dependent intrinsic use must always be guarded by an IsSupported check in the same method, unless perhaps the code is conditionally compiled.

I did a quick scan through the code and this method is the only one I found not following the rules. But my scan wasn't comprehensive.

@davidwrighton
Member

@tannergooding @AndyAyersMS The rules as written are around correctness and I don't believe that one violates the rules I meant to write, as any platform that might implement SSE is also guaranteed to implement SSE. However, as you say it doesn't make all that much sense to compile the SSE special helper on Arm64, and it would likely be a bit cleaner if the SseImpl function had its own IsSupported check. OTOH, it seems to have found a lovely little corner case bug in the jit, so... yay?

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jul 14, 2020
Though this method is only ever callable by code that's already done the right
IsSupported check, add a redundant check to the method itself, so that when
crossgenning SPC on non-xarch platforms we won't try and compile the xarch
specific parts of this method.  This should save some time and a bit of file
size for non-xarch SPCs.

See notes on dotnet#38894 for context.
@AndyAyersMS
Member Author

Going to hold off on this until after .NET 5.

AndyAyersMS added a commit that referenced this pull request Jul 17, 2020
Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI