Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed 'unreached' in '<ToString_TestData>d__71:MoveNext():bool:this' during 'Do value numbering' (IL size 6679; hash 0x288e6df7; Tier1) #69047

Closed
jakobbotsch opened this issue May 9, 2022 · 14 comments · Fixed by #69051
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'

Comments

@jakobbotsch
Copy link
Member

Build: https://dev.azure.com/dnceng/public/_build/results?buildId=1759044&view=results
Leg: Libraries Test Run checked coreclr Linux_musl x64 Release
Log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-5e336d55e08940948e/System.Runtime.Tests/1/console.80904107.log?helixlogtype=result

Assert failure(PID 26 [0x0000001a], Thread: 100 [0x0064]): Assertion failed 'unreached' in '<ToString_TestData>d__71:MoveNext():bool:this' during 'Do value numbering' (IL size 6679; hash 0x288e6df7; Tier1)

    File: /__w/1/s/src/coreclr/jit/valuenum.cpp Line: 1905
    Image: /root/helix/work/correlation/dotnet

This only failed on Alpine, although that may tiering related.

cc @SingleAccretion

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 9, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 9, 2022
@ghost
Copy link

ghost commented May 9, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Build: https://dev.azure.com/dnceng/public/_build/results?buildId=1759044&view=results
Leg: Libraries Test Run checked coreclr Linux_musl x64 Release
Log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-5e336d55e08940948e/System.Runtime.Tests/1/console.80904107.log?helixlogtype=result

Assert failure(PID 26 [0x0000001a], Thread: 100 [0x0064]): Assertion failed 'unreached' in '<ToString_TestData>d__71:MoveNext():bool:this' during 'Do value numbering' (IL size 6679; hash 0x288e6df7; Tier1)

    File: /__w/1/s/src/coreclr/jit/valuenum.cpp Line: 1905
    Image: /root/helix/work/correlation/dotnet

This only failed on Alpine, although that may tiering related.

cc @SingleAccretion

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@jakobbotsch jakobbotsch added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' and removed untriaged New issue has not been triaged by the area owner labels May 9, 2022
@jakobbotsch
Copy link
Member Author

jakobbotsch commented May 9, 2022

There is an unfortunate mismatch between PR validation and rolling validation: on PRs we build libraries tests in debug while in rolling validation we build them in release. I'm not sure why we have this mismatch, two failures (including this one) made it into main last week because of it. cc @agocke, maybe something we need to reconsider? Also @dotnet/area-infrastructure-libraries

@jakobbotsch
Copy link
Member Author

I was able to repro this issue on ubuntu 20.04 and record a SPMI collection. I've attached the failing context, collected on 5ecaae9.

repro.zip

@jakobbotsch
Copy link
Member Author

This does not seem to repro with MSVC. It also does not repro under COMPlus_JitDump.

@SingleAccretion SingleAccretion self-assigned this May 9, 2022
@ViktorHofer
Copy link
Member

The mismatch in configurations is intentional as we don't have enough resources to test all configurations on PR runs. With the mixed matrix we make sure that all configurations are tested.

@jakobbotsch
Copy link
Member Author

The problem is in VNForBitCast here:

if ((srcVNFunc.m_func == VNF_ZeroObj) ||

srcVNFunc might not be valid here (GetVNFunc above could have failed). So the failure happens only if this happens to match VNF_ZeroObj.

@SingleAccretion do you want to submit the fix?

@SingleAccretion
Copy link
Contributor

SingleAccretion commented May 9, 2022

@jakobbotsch thank you for diagnosing this! Yes, will submit the fix shortly.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 9, 2022
@jakobbotsch
Copy link
Member Author

jakobbotsch commented May 9, 2022

The mismatch in configurations is intentional as we don't have enough resources to test all configurations on PR runs. With the mixed matrix we make sure that all configurations are tested.

I wonder if we should reconsider the matrix. Running the JIT over all libraries tests in debug vs in release is a very significant difference. Achieving our goal of 95% pass rate seems difficult if we intentionally have large discrepancies like this.

What do you think about getting the checked coreclr x debug tests coverage as part of weekly runs instead of in PRs? Then for PRs with coreclr changes we would instead run checked coreclr x release tests, the same as rolling builds.

@ViktorHofer
Copy link
Member

ViktorHofer commented May 9, 2022

@jkotas @danmoseley regarding changing the configuration matrix in PRs vs rolling builds.

@jkotas
Copy link
Member

jkotas commented May 9, 2022

For PRs, we want to maximize coverage variety while minimizing costs.

We have checked CoreCLR + Debug libraries tests for many platforms in the PR mix, but no checked CoreCLR + Release libraries tests. I think we should change checked CoreCLR + Debug libraries for subset of platforms to checked CoreCLR + Release libraries to increase the variety.

@bartonjs
Copy link
Member

bartonjs commented May 9, 2022

When a PR fails a test, the debug callstacks and the Debug.Asserts are useful in helping track it down... so I believe that debug libraries builds in PRs that change libraries is the priority.

Running the JIT over all libraries tests in debug vs in release is a very significant difference. Achieving our goal of 95% pass rate seems difficult if we intentionally have large discrepancies like this.

Is it? I feel like .NET developers expect that JIT and R2R make functionally equivalent outputs. Changes in the CLR might invalidate that, so checked CLR + runtime libraries is an interesting bucket for PRs that have changes to those areas (or, if that's hard, "for PRs that change either CoreCLR or the mono runtime")

@danmoseley
Copy link
Member

PR's today seem to include 3 variations for Linux coreclr x64:

Libraries Test Run release coreclr Linux_musl x64 Debug
Libraries Test Run release coreclr Linux x64 Debug
Libraries Test Run checked coreclr Linux x64 Debug

The last one could change to checked/release -- would we lose important coverage?

@jkotas
Copy link
Member

jkotas commented May 9, 2022

I always look at the whole matrix. We still have checked coreclr Linux_musl x64 Debug. Linux and Linux_musl are very similar. So I do not think we would lose any significant coverage by switching checked coreclr Linux x64 Debug to checked coreclr Linux x64 Release

@jakobbotsch
Copy link
Member Author

@jkotas

We have checked CoreCLR + Debug libraries tests for many platforms in the PR mix, but no checked CoreCLR + Release
libraries tests. I think we should change checked CoreCLR + Debug libraries for subset of platforms to checked CoreCLR + Release libraries to increase the variety.

Sounds good to me. I'll try to work on that this week.

@bartonjs

When a PR fails a test, the debug callstacks and the Debug.Asserts are useful in helping track it down... so I believe that debug libraries builds in PRs that change libraries is the priority.

Running the JIT over all libraries tests in debug vs in release is a very significant difference. Achieving our goal of 95% pass rate seems difficult if we intentionally have large discrepancies like this.

Is it? I feel like .NET developers expect that JIT and R2R make functionally equivalent outputs. Changes in the CLR might invalidate that, so checked CLR + runtime libraries is an interesting bucket for PRs that have changes to those areas (or, if that's hard, "for PRs that change either CoreCLR or the mono runtime")

Agreed, for libraries changes it makes sense to run libraries tests in debug. However in this case, when we have coreclr changes (e.g. changes in a JIT optimization pass), we run checked coreclr + debug libraries tests. This does not exercise the JIT changes to nearly the same degree as if the libraries tests are compiled in release.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 9, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants