ARM64 CG2 compilation of System.Text.Json crashes on JIT assert 'block->bbWeight > BB_ZERO_WEIGHT' #52785

Closed
trylek opened this issue May 14, 2021 · 14 comments · Fixed by #53096
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

@trylek
Member

trylek commented May 14, 2021

OS: Windows
Architecture: arm64
Example run: https://dev.azure.com/dnceng/public/_build/results?buildId=1136542&view=logs&j=438f2a33-0bac-577f-c1e5-b7956f9ac284&t=437a67a3-60d6-56bd-4640-6cece310571b

Diagnostic info:

16 / 257 (6%, 1 failed): failed in 9686 msecs, exit code -2147483645 = 0x80000003, expected 0: dotnet.exe D:\workspace\_work\1\s\artifacts\bin\coreclr\windows.arm64.Checked\x64\crossgen2\crossgen2.dll @D:\workspace\_work\1\s\artifacts\tests\coreclr\obj\windows.arm64.Checked\crossgen.out\System.Text.Json.dll.rsp
20 / 257 (6%, 1 failed): launching: D:\workspace\_work\1\s\.dotnet\dotnet.exe D:\workspace\_work\1\s\artifacts\bin\coreclr\windows.arm64.Checked\x64\crossgen2\crossgen2.dll @D:\workspace\_work\1\s\artifacts\tests\coreclr\obj\windows.arm64.Checked\crossgen.out\System.Data.OleDb.dll.rsp
  D:\workspace\_work\1\s\src\coreclr\jit\fgdiagnostic.cpp:2700
  Assertion failed 'block->bbWeight > BB_ZERO_WEIGHT' in 'System.Text.Json.JsonReaderHelper:IndexOfOrLessThan(byref,ubyte,ubyte,ubyte,int):int' during 'Optimize layout' (IL size 1021)
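
For readers decoding the exit code in the log above: -2147483645 and 0x80000003 are the same 32-bit value viewed as signed and unsigned. 0x80000003 is the Windows STATUS_BREAKPOINT code, which is consistent with an assert firing in a Checked build. A minimal sketch of the conversion follows; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed 32-bit process exit code as the unsigned value
// that Windows tooling usually prints in hex.
std::uint32_t exitCodeToUnsigned(std::int32_t exitCode)
{
    return static_cast<std::uint32_t>(exitCode);
}

// Format the exit code as "0x" followed by eight upper-case hex digits.
std::string exitCodeToHex(std::int32_t exitCode)
{
    char buffer[11]; // "0x" + 8 hex digits + terminating NUL
    int written = std::snprintf(buffer, sizeof(buffer), "0x%08X",
                                static_cast<unsigned int>(exitCodeToUnsigned(exitCode)));
    assert(written == 10);
    (void)written; // keep release builds (NDEBUG) warning-free
    return std::string(buffer);
}
```

Calling `exitCodeToHex(-2147483645)` yields the string `0x80000003` shown in the log.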

/cc @dotnet/crossgen-contrib @dotnet/jit-contrib

@trylek trylek added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 14, 2021
@trylek trylek added this to the 6.0.0 milestone May 14, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 14, 2021
@trylek trylek removed the untriaged New issue has not been triaged by the area owner label May 14, 2021
@mangod9
Member

mangod9 commented May 14, 2021

I assume this is a recent regression?

@trylek
Member Author

trylek commented May 14, 2021

FWIW, on OSX ARM64 there's another, non-deterministic issue: the CG2 compiler sometimes crashes with a NullReferenceException while internally sorting dependency nodes. This shows up in the OSX arm64 leg of the same run as linked in the issue description:

      Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
       ---> System.NullReferenceException: Object reference not set to an instance of an object.
         at ILCompiler.Sorting.Implementation.MergeSortCore`5.ParallelSort(TDataStructure arrayToSort, Int32 index, Int32 length, TComparer comparer)
         --- End of inner exception stack trace ---
         at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
         at ILCompiler.Sorting.Implementation.MergeSortCore`5.ParallelSortApi(TDataStructure arrayToSort, TComparer comparer)
         at ILCompiler.MetadataManager.GetCompiledMethods(EcmaModule moduleToEnumerate, CompiledMethodCategory methodCategory)
         at ILCompiler.DependencyAnalysis.NodeFactory.EnumerateCompiledMethods(EcmaModule moduleToEnumerate, CompiledMethodCategory methodCategory)+MoveNext()
         at ILCompiler.DependencyAnalysis.ReadyToRun.ExceptionInfoLookupTableNode.LayoutMethodsWithEHInfo()
         at ILCompiler.DependencyAnalysis.ReadyToRun.ExceptionInfoLookupTableNode.ShouldSkipEmittingObjectNode(NodeFactory factory)
         at ILCompiler.DependencyAnalysis.ReadyToRun.HeaderNode.GetData(NodeFactory factory, Boolean relocsOnly)
         at ILCompiler.DependencyAnalysis.ReadyToRunObjectWriter.EmitPortableExecutable()
         at ILCompiler.ReadyToRunCodegenCompilation.Compile(String outputFile)
         at ILCompiler.Program.RunSingleCompilation(Dictionary`2 inFilePaths, InstructionSetSupport instructionSetSupport, String compositeRootPath, Dictionary`2 unrootedInputFilePaths, HashSet`1 versionBubbleModulesHash, CompilerTypeSystemContext typeSystemContext)
         at ILCompiler.Program.Run(String[] args)
         at ILCompiler.Program.Main(String[] args)

@mangod9
Member

mangod9 commented May 14, 2021

I think the osx arm64 NRE failure is a known issue with the runtime, not specifically related to cg2 itself.

@trylek
Member Author

trylek commented May 14, 2021

I see the first repro of the OSX arm64 bug in Steve MacLean's run from May 5 (last Wednesday):

https://dev.azure.com/dnceng/public/_build/results?buildId=1124050&view=ms.vss-test-web.build-test-results-tab&runId=34195084&resultId=100691&paneView=debug

I'm looking for the first occurrence of the Windows bug.

@trylek
Member Author

trylek commented May 14, 2021

OK, so I see the first occurrence of the Windows bug in @BruceForstall's run from May 6:

https://dev.azure.com/dnceng/public/_build/results?buildId=1127214&view=results

@davidwrighton
Member

@AndyAyersMS is the expert for issues with basic block weight.

@AndyAyersMS
Member

I'll take a look at the windows arm64 failure.

@AndyAyersMS
Member

We end up with a negative block weight (see BB79 in the dump below), and that leads to the assert. I suspect the issue is in fgConnectFallThrough.

BB56 [0106]  2       BB54,BB57             0.82  5123    [000..000)-> BB59 ( cond )                     i internal bwd bwd-target IBC 
BB57 [0107]  1       BB56                  0.82  5123    [000..000)-> BB56 ( cond )                     i internal bwd IBC 
BB59 [0109]  3       BB54,BB56,BB57        0.47  2915    [000..000)-> BB74 (always)                     i internal IBC 

Decided to reverse conditional branch at block BB56 branch to BB59 because of IBC profile data
fgFindInsertPoint(regionIndex=0, putInTryRegion=true, startBlk=BB01, endBlk=BB00, nearBlk=BB59, jumpBlk=BB00, runRarely=false)
Relocated uncommon block BB57 by reversing conditional jump at BB56
Relocated block [BB57..BB57] inserted after BB59
New Basic Block BB79 [0137] created.
Setting edge weights for BB57 -> BB79 to [-0.0002441406 .. 0]
Added an unconditional jump to BB59 after block BB57

BB56 [0106]  2       BB54,BB57             0.82  5123    [000..000)-> BB57 ( cond )                     i internal bwd bwd-target IBC 
BB59 [0109]  3       BB54,BB56,BB79        0.47  2915    [000..000)-> BB74 (always)                     i internal IBC 
BB57 [0107]  1       BB56                  0.82  5123    [000..000)-> BB56 ( cond )                     i internal bwd IBC 
BB79 [0137]  1       BB57                -2e-08     0    [???..???)-> BB59 (always)                     internal IBC
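
A back-of-the-envelope check of where the -2e-08 on BB79 could come from. This rests on two assumptions that are not stated in the thread: that a newly inserted jump block takes the midpoint of its incoming edge's [min, max] weights, and that the weight column in the dump is the raw profile count divided by the method's call count (estimated here from BB56 as 5123 / 0.82). The function names are made up for illustration.

```cpp
#include <cassert>

// Assumption: the new block's raw weight is the midpoint of the
// incoming edge's [min, max] weight range.
double midpointWeight(double edgeMin, double edgeMax)
{
    assert(edgeMin <= edgeMax);
    return (edgeMin + edgeMax) / 2.0;
}

// Assumption: the dump's weight column is the raw weight scaled by
// how many times the method was called.
double scaledWeight(double rawWeight, double calledCount)
{
    assert(calledCount > 0.0);
    return rawWeight / calledCount;
}
```

With the edge range [-0.0002441406 .. 0] from the dump, `midpointWeight` gives about -0.000122, and dividing by the estimated call count of roughly 6248 gives a value close to -2e-08. Under these assumptions, a slightly negative edge minimum is enough to push the new block's weight below zero.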

@AndyAyersMS
Member

The root cause seems to be upstream: when we compute edge weights, we're willing to set a min edge weight to something below zero. (The root cause of that, in turn, is inconsistent profile data, which we have to tolerate.)

AndyAyersMS added a commit that referenced this issue May 18, 2021
If the solver wants to set the edge weight below zero, set it to
zero if within slop, or disallow if not.

Addresses assert seen in #52785.
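
The rule in the commit message above can be sketched as follows. This is a standalone illustration, not the JIT's actual code: the function name, the out-parameter shape, and the slop value used in the usage note are all hypothetical.

```cpp
#include <cassert>

// Hypothetical helper mirroring the commit message: a proposed minimum
// edge weight below zero is clamped to zero when it is within `slop`
// of zero, and rejected otherwise. Returns true and writes *result
// when the weight is accepted; leaves *result untouched when rejected.
bool trySetMinEdgeWeight(double proposed, double slop, double* result)
{
    assert(slop >= 0.0);
    assert(result != nullptr);

    if (proposed >= 0.0)
    {
        *result = proposed; // already valid, keep as-is
        return true;
    }
    if (-proposed <= slop)
    {
        *result = 0.0; // slightly negative: treat as zero
        return true;
    }
    return false; // too far below zero: disallow the update
}
```

For example, with an illustrative slop of 0.001, the -0.0002441406 seen in the dump would be accepted and stored as 0, while a proposed weight of -10 would be rejected.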
@AndyAyersMS
Member

@trylek can you verify this is now fixed?

@trylek
Member Author

trylek commented May 21, 2021

Sorry for the late response. Thank you for fixing the failure in System.Text.Json; I can confirm we're no longer hitting it, e.g. in this recent CG2 run:

https://dev.azure.com/dnceng/public/_build/results?buildId=1147260&view=logs&j=438f2a33-0bac-577f-c1e5-b7956f9ac284&t=437a67a3-60d6-56bd-4640-6cece310571b

Instead of the original failure, all arm64 builds and Windows x64 builds are now failing in System.IO.Compression with another JIT assert:

58 / 257 (24%, 1 failed): failed in 4806 msecs, exit code -2147483645 = 0x80000003, expected 0: dotnet.exe D:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll @D:\workspace\_work\1\s\artifacts\tests\coreclr\obj\windows.x64.Checked\crossgen.out\System.IO.Compression.dll.rsp
67 / 257 (24%, 1 failed): launching: D:\workspace\_work\1\s\.dotnet\dotnet.exe D:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll @D:\workspace\_work\1\s\artifacts\tests\coreclr\obj\windows.x64.Checked\crossgen.out\System.Linq.Queryable.dll.rsp
  D:\workspace\_work\1\s\src\coreclr\jit\morph.cpp:12306
  Assertion failed '(effOp1->gtOper == GT_CNS_INT) && (effOp1->IsIntegralConst(0) || effOp1->IsIntegralConst(1))' in 'System.IO.Compression.ZipArchive:.ctor(System.IO.Stream,int,bool,System.Text.Encoding):this' during 'Optimize Valnum CSEs' (IL size 489)

This seems to be the tracking bug for the remaining issue: #33091

@AndyAyersMS
Member

Let me see what's up with that new failure... I have a hunch it may be related to #52524.

@AndyAyersMS
Member

> have a hunch it may be related to #52524.

Nope, the problem is caused by my change #52827. Seems like it should be easy to fix.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 21, 2021
In particular we need to set `GTF_DONT_CSE` so that CSE doesn't
introduce commas under `GT_JTRUE` nodes.

Fixes dotnet#52785.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 21, 2021
@trylek
Member Author

trylek commented May 21, 2021

Awesome, thanks for the quick investigation!

AndyAyersMS added a commit that referenced this issue May 21, 2021
In particular we need to set `GTF_DONT_CSE` so that CSE doesn't
introduce commas under `GT_JTRUE` nodes.

Fixes #52785.
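
As a rough illustration of the idea in the commit message above: a per-node flag can exclude a node from CSE, so the optimizer never rewrites the operand directly under a conditional jump. The types and names below are toy stand-ins invented for this sketch. Only the flag's purpose is taken from the commit message; nothing here reflects the JIT's real data structures.

```cpp
#include <cassert>
#include <cstdint>

// Toy node kinds and flag; hypothetical stand-ins, not the JIT's own.
enum ToyOper { TOY_CNS_INT, TOY_LCL_VAR, TOY_RELOP, TOY_COMMA, TOY_JTRUE };
constexpr std::uint32_t TOY_DONT_CSE = 0x1;

struct ToyNode
{
    ToyOper       oper;
    std::uint32_t flags;
    ToyNode*      op1; // single operand is enough for this sketch
};

// A node is eligible for CSE only if it is not marked "don't CSE".
bool isCseCandidate(const ToyNode& node)
{
    return (node.flags & TOY_DONT_CSE) == 0;
}

// Mark the condition under a jump so a later CSE pass skips it and
// therefore cannot wrap it in a comma node.
void protectJtrueOperand(ToyNode& jtrue)
{
    assert(jtrue.oper == TOY_JTRUE && jtrue.op1 != nullptr);
    jtrue.op1->flags |= TOY_DONT_CSE;
}
```

In this toy model, a relop under a jump starts out as a CSE candidate and stops being one after `protectJtrueOperand` runs on its parent.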
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 21, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 20, 2021