Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dotnet test --blame-crash collects crash DMP even after successful test runs #14578

Closed
AArnott opened this issue Nov 17, 2020 · 24 comments · Fixed by microsoft/vstest#2645
Closed
Assignees

Comments

@AArnott
Copy link
Contributor

AArnott commented Nov 17, 2020

⚠ This is a regression in the .NET 5 SDK and blocks adoption of the .NET 5 SDK by this template repo and others.

When running dotnet test --blame-crash, a dmp file is produced at the conclusion of each test run even though no crash or timeout occurred.

Repro steps

git clone https://github.com/aarnott/library.template
cd library.template
git checkout c234c68d0b45e5cc621db26063c199b51b4a45fd
dotnet test --blame-crash

Observed output is below. Notice lines such as:

D:\git\lib.template\test\Library.Tests\TestResults\5d81aeb2-f632-461f-b151-7f96f9b8005f\testhost_22312_20201117T083038_crashdump.dmp

 dotnet test --blame-crash
  Determining projects to restore...
  All projects are up-to-date for restore.
  Library -> D:\git\lib.template\bin\Library\Debug\netstandard2.0\Library.dll
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
  Library.Tests -> D:\git\lib.template\bin\Library.Tests\Debug\net472\Library.Tests.dll
  Library.Tests -> D:\git\lib.template\bin\Library.Tests\Debug\netcoreapp3.1\Library.Tests.dll
  Library.Tests -> D:\git\lib.template\bin\Library.Tests\Debug\netcoreapp2.1\Library.Tests.dll
Microsoft (R) Test Execution Command Line Tool Version 16.8.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.

Attachments:
  D:\git\lib.template\test\Library.Tests\TestResults\98a4200c-be91-4880-917b-9633e603c62f\testhost_35340_20201117T083036_crashdump.dmp
Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: 3 ms - Library.Tests.dll (netcoreapp3.1)
Test run for D:\git\lib.template\bin\Library.Tests\Debug\netcoreapp2.1\Library.Tests.dll (.NETCoreApp,Version=v2.1)
Microsoft (R) Test Execution Command Line Tool Version 16.8.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.

Attachments:
  D:\git\lib.template\test\Library.Tests\TestResults\5d81aeb2-f632-461f-b151-7f96f9b8005f\testhost_22312_20201117T083038_crashdump.dmp
Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: 6 ms - Library.Tests.dll (netcoreapp2.1)
Test run for D:\git\lib.template\bin\Library.Tests\Debug\net472\Library.Tests.dll (.NETFramework,Version=v4.7.2)
Microsoft (R) Test Execution Command Line Tool Version 16.8.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.

Attachments:
  D:\git\lib.template\test\Library.Tests\TestResults\22ff3440-7aec-4a15-b81e-a90ef4f607ca\testhost.net472.x86_12208_20201117T083041_crashdump.dmp
Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: 8 ms - Library.Tests.dll (net472)

Build succeeded.

D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1512: Single-line comments should not be followed by blank line [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
D:\git\lib.template\test\Library.Tests\CalculatorTests.cs(14,5): warning SA1005: Single line comment should begin with a space. [D:\git\lib.template\test\Library.Tests\Library.Tests.csproj]
    6 Warning(s)
    0 Error(s)

Time Elapsed 00:00:09.54
@AArnott
Copy link
Contributor Author

AArnott commented Nov 17, 2020

@nohwnd noticed that this only happens when the tests run on runtimes before .NET 5.0. But since nearly everyone targets less than .NET 5.0 at this point, that seems like a crucial scenario to test and support.

@AArnott
Copy link
Contributor Author

AArnott commented Nov 17, 2020

Just to be clear about the impact of this bug, when a pipeline is carefully written to collect crash dumps so that any test failures can be investigated, having a dump unnecessarily created ends up getting collected as an artifact, which slows down the pipeline and greatly swells the storage requirements for that cloud build account. The below screenshot shows how Windows agents (the only OS that supports dumps) collect nearly 1GB of dmp's vs. a few KB of files from the non-Windows agents where no dmp file creation bug exists:

image

@marcpopMSFT marcpopMSFT added the untriaged Request triage from a team member label Nov 17, 2020
@marcpopMSFT marcpopMSFT added Area-DotNet Test and removed untriaged Request triage from a team member labels Nov 17, 2020
@nohwnd
Copy link
Member

nohwnd commented Dec 3, 2020

This should be fixed on the latest release/5.1.2xx and master build. Please give it a try. I did locally and it no longer collects dump when there is no crash. I used version 5.0.200-preview.20601.7

inserted in #14764
inserted in #14782

@AArnott
Copy link
Contributor Author

AArnott commented Dec 3, 2020

Thanks, @nohwnd. But it's still broken. Now instead of collecting dumps from every run (successful or not), it never collects dumps. .NET 5.0 works fine as was the case before. But <.NET 5.0 test runs now never collect crash dumps.
Check out this log from AzP.

AArnott added a commit to microsoft/CsWin32 that referenced this issue Dec 3, 2020
This because of dotnet/sdk#14578 collecting dozens of DMP files totalling 3GB of unnecessary DMP files as pipeline artifacts.
AArnott added a commit to microsoft/CsWin32 that referenced this issue Dec 3, 2020
This because of dotnet/sdk#14578 collecting dozens of DMP files totalling 3GB of unnecessary DMP files as pipeline artifacts.
AArnott added a commit to microsoft/CsWin32 that referenced this issue Dec 3, 2020
This because of dotnet/sdk#14578 collecting dozens of DMP files totalling 3GB of unnecessary DMP files as pipeline artifacts.
@nohwnd
Copy link
Member

nohwnd commented Jan 5, 2021

@AArnott I tried it locally and it collected dumps even on versions that are not net5.0. Could you run the pipeline again please so I can see where it does not work? The build you linked above does not exist anymore. Thanks.

@nohwnd
Copy link
Member

nohwnd commented Jan 5, 2021

I have a whole suite of tests, I will re-run then tomorrow on the newest version to see if I can repro. I am back after a month, so I only quickly tried it so far. Hopefully I will be able to fix this soon.

@AArnott
Copy link
Contributor Author

AArnott commented Jan 6, 2021

@nohwnd the problem is not that dumps are not collected. The problem is that they are collected even when no test failure occurred.

@nohwnd
Copy link
Member

nohwnd commented Jan 6, 2021

Thanks, @nohwnd Jakub Jares FTE. But it's still broken. Now instead of collecting dumps from every run (successful or not), it never collects dumps. .NET 5.0 works fine as was the case before. But <.NET 5.0 test runs now never collect crash dumps.
Check out this log from AzP.

Here you said that it no longer collects them, so that is what I wondered about. The build is unfortunately gone.

@AArnott
Copy link
Contributor Author

AArnott commented Jan 6, 2021

Well, I guess it's been too long and I don't remember the history exactly.
So looking for past evidence, I requeued the build where <net5.0 crash dumps are not collected.

@nohwnd
Copy link
Member

nohwnd commented Jan 7, 2021

I still see "Build not found" on the latest link you posted. Do I need special permissions?

I've updated my tests and re-run them on Windows, Mac and Ubuntu and they show that the dumps are collected and that no dump is collected when there is no crash. According to my tests it works on 5.0.101 and 5.0.200-preview, and the tests fail on 5.0.100 which does not have the fix. https://dev.azure.com/jajares/blame/_build/results?buildId=262&view=results

Looking forward to seeing your build, maybe there are more issues that my tests don't identify.

@AArnott
Copy link
Contributor Author

AArnott commented Jan 7, 2021

I still see "Build not found" on the latest link you posted. Do I need special permissions?

No. I can even view it in an InPrivate window with no cookies. Are you sure you're using the link from my message I posted yesterday? I did not update the original link

If you still cannot view it, you can perhaps try looking at the commit being built or even build it yourself. It's a YAML pipeline so it should be trivial to set up.
Otherwise, we can share screen today.

@nohwnd
Copy link
Member

nohwnd commented Jan 7, 2021

I don't know what I did wrong. I can see your build now. The problem is that procdump is configured to only catch stack overflow and access violation, but you are killing the process via Environment.FailFast. If you force stack overflow (e.g. by calling itself in the test method: public void Crash() => Crash();, you should start getting all dumps. At least I do locally when I make this change in your code and run dotnet test -c Debug --filter "TestCategory!=FailsInCloudTest" -v n /p:CollectCoverage=true --logger trx --settings ".github/workflows/Windows.runsettings".

@ShreyasRmsft do you remember why you did it this way please?

( As a side note even without the filter, FailFast won't trigger dump creation in ProcDump. )

@ShreyasRmsft
Copy link

@nohwnd that was before my time when blame was implemented for crash dumps only. I think Environment.FailFast was simply not a scenario at that time.

@AArnott
Copy link
Contributor Author

AArnott commented Feb 5, 2021

@nohwnd If you notice the artifacts collected in the pipeline, only one of the 4 test runs collected a DMP, even though they all used FailFast. Looking at the logs reveals that it was the net5.0 test run that collected the dump. Why the difference in behavior between runtimes?

I ran a series of tests with each SDK version. Here are the results (all on Windows agents), where ❌ means no dump collected and ✅ means a dump was collected.

SDK 3.1.404

Failure method net472 netcoreapp2.1 netcoreapp3.1
No crash
FailFast
Stack overflow

SDK 5.0.102

Failure method net472 netcoreapp2.1 netcoreapp3.1 net5.0
No crash
FailFast
Stack overflow x3 x3

Notice that new to the 5.0 SDK, the netcoreapp3.1 runtime (and net5.0 runtime) collect 3 dumps for one crash. Why is that?

SDK 5.0.200-preview.20601.7

Failure method net472 netcoreapp2.1 netcoreapp3.1 net5.0
No crash
FailFast
Stack overflow x3 x3

Analysis

Apparently the 5.0.200-preview named above did not change behavior relevant to this bug. The regressions in the .NET 5 SDK are:

  1. FailFast no longer leads to dump collection on netcoreapp2.1 and netcoreapp3.1. FailFast is often used so that a dump will be collected (by Watson) at that very moment. We would want the same functionality in test runs.
  2. Stack overflows now produce 3 dumps instead of 1 on the netcoreapp3.1 and net5.0 runtimes.

Can we get an update on when/whether these are bugs and when we can expect them to be fixed?

@AArnott
Copy link
Contributor Author

AArnott commented Aug 31, 2021

This is blocking our adequate testing of CsWin32 (we don't test on net472 any more) and led to shipping a major regression. Can we please get traction on this?

@nohwnd
Copy link
Member

nohwnd commented Sep 1, 2021

@AArnott 2) this should be fixed in net6.0 sdk is upgrading an option for you? #15258

Looking into 2.

@nohwnd
Copy link
Member

nohwnd commented Sep 1, 2021

  1. is caused by the change in approach to collecting dumps. Previously we would collect dump on exit all the time, and if collect dump always option was enabled, we would copy the dump to the output folder, otherwise we would ignore it. This is why FailFast will produce dump with 3.1 SDK. With the addition of .NET Core dumper, which we can't monitor and which can produce dump for any child process, the dump collection was changed to grab all dump files from a randomized temp folder, to ensure we get all of them. This made it produce dumps even for successful runs, which was undesirable.

Let's see if I can revert that behavior without returning dumps that were collected on successful runs.

@AArnott
Copy link
Contributor Author

AArnott commented Sep 1, 2021

I generally try to avoid preview SDKs for production repos, but I'm happy to at least try it to validate. And for this CsWin32 repo in particular I think we can take a dependency on the preview SDK.

I'm a little confused as to which fixes are already in the 6.0 SDK and which ones you're still working on. Should I wait for all fixes before trying it out?

@nohwnd
Copy link
Member

nohwnd commented Sep 2, 2021

  1. should be fixed in the preview sdk. 1) I am working on.

@AraHaan
Copy link
Member

AraHaan commented Dec 30, 2021

was this fixed yet?

@nohwnd
Copy link
Member

nohwnd commented Feb 1, 2022

@AArnott could you upgrade your pipeline to net6 SDK and run it again please?

@AArnott
Copy link
Contributor Author

AArnott commented Feb 1, 2022

The repo has been updated to 6.0. The dotnet test --blame-crash command no longer captures dumps after successful runs. Thank you.

@AArnott AArnott closed this as completed Feb 1, 2022
@AraHaan
Copy link
Member

AraHaan commented Feb 1, 2022

This is good to know, thanks for checking.

@nohwnd
Copy link
Member

nohwnd commented Feb 2, 2022

Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants