Visual Studio constantly crashes #16869

Closed
vsfeedback opened this issue Mar 13, 2024 · 26 comments
Labels
Area-Compiler-Checking Type checking, attributes and all aspects of logic checking Impact-High (Internal MS Team use only) Describes an issue with extreme impact on existing code.

@vsfeedback

This issue has been moved from a ticket on Developer Community.


[severity:It's more difficult to complete my work]
When I edit a somewhat large F# file, Visual Studio consistently crashes. I can start VS, go to the file, edit a bit, and then it crashes (the start-edit-crash "inner loop"). I have the same problem with some other files in our solution.

I hope you prioritize crashes and have someone looking at the telemetry you are hopefully getting, although this has been going on for a while now.

The activity recorder does not work for me, at least not in Firefox. :-( It just starts and then immediately stops and "collects" the traces, before I have even had a chance to perform any action in VS. Is Firefox not supported?

I now also use JetBrains Rider, because VS is so unstable. I don't care about new features if VS feels like a minefield where I have to learn which files I cannot edit because of crashes.


Original Comments

Feedback Bot on 3/7/2024, 04:07 AM:

(private comment, text removed)

Bent Rasmussen on 3/7/2024, 00:53 PM:

(private comment, text removed)

Bent Rasmussen on 3/7/2024, 00:55 PM:

(private comment, text removed)


Original Solutions

(no solutions)

@vzarytovskii
Member

@vzarytovskii vzarytovskii added the Impact-High (Internal MS Team use only) Describes an issue with extreme impact on existing code. label Mar 13, 2024
@vzarytovskii
Member

vzarytovskii commented Mar 13, 2024

I can't imagine what issue would drag the whole of VS down. I'm wondering whether it's the editor in general or FCS itself. The fact that it doesn't crash immediately makes me think it's a memory leak or something similar.

@ijklam
Contributor

ijklam commented Mar 13, 2024

This may be temporarily worked around by turning off "Remove unnecessary parentheses". The feature may cause a StackOverflowException. In my VS, this started with 17.9.
It can be reproduced by opening ServiceDeclarationLists.fs and moving the cursor up and down.

(image)

@vzarytovskii
Member

Oh, that's interesting, thanks for pointing this out.

@brianrourkeboll
Contributor

brianrourkeboll commented Mar 13, 2024

@Tangent-90 do you by chance have an example source file that triggers this?

Never mind, I see you mentioned ServiceDeclarationLists.fs.

@psfinaki
Member

So I was just going to write "can reproduce in main", but then, just in case, I updated to the very latest main, and no, I cannot reproduce this anymore. I was playing with ServiceDeclarationLists and it was crashing reliably within a minute or so, whereas now I cannot crash it whatever I do.

@brianrourkeboll thanks for coming to the discussion. Do you think this PR could make any difference in this scenario? Maybe it will instantly click in your head :)


Regardless, the parentheses thing may or may not be related to the original problem so I am trying to figure out more details internally.

@brianrourkeboll
Contributor

brianrourkeboll commented Mar 13, 2024

@psfinaki what if you find a line with unnecessary parens and try to apply "fix all in document" in ServiceDeclarationLists.fs?

E.g.:

member x.Equals(item1, item2) = (fullDisplayTextOfModRef item1 = fullDisplayTextOfModRef item2)

For me, applying the fix to that line works fine, but applying it to the document does not. When you invoke "fix all in document," all files before the one you're applying it to are also parsed and checked [1].

When trying to apply "fix all in document," I sometimes see a stack overflow inside the call to project here when the analyzer is applied to pars.fs. The actual exception is raised here, but there are thousands of stack frames from the inner loop function in pickAll in ServiceParseTreeWalk:

Screenshot 2024-03-13 114835

I also sometimes instead see a stack overflow in ConstraintSolver.fs here, ultimately I think from here, when doing the exact same thing, though:

image

In fact, I now keep getting the second exception and can no longer reproduce the first 🙃.

Either way, I wonder if the problem might have to do with tail-call optimization not being applied when it should be?

Footnotes

  1. I noticed that way back when I first added the analyzer... I'm not sure where that's coming from, but it's not from the analyzer itself.
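
To make the tail-call point concrete, here is a minimal sketch of the difference. It is not code from the compiler: `sumNaive` and `sumAcc` are hypothetical names made up for this illustration. The first function does work after the recursive call returns, so its stack depth grows with the input; the second passes an accumulator so the recursive call is the last operation and can be compiled to a loop.

```fsharp
// Hypothetical illustration only; not taken from ServiceParseTreeWalk or ConstraintSolver.

// Not tail-recursive: the addition runs after the recursive call returns,
// so each list element costs one stack frame.
let rec sumNaive (xs: int list) : int64 =
    match xs with
    | [] -> 0L
    | x :: rest -> int64 x + sumNaive rest

// Tail-recursive: the recursive call is the last operation in each branch,
// so stack usage stays constant regardless of the list length.
let sumAcc (xs: int list) : int64 =
    let rec loop (acc: int64) (remaining: int list) =
        match remaining with
        | [] -> acc
        | x :: rest -> loop (acc + int64 x) rest
    loop 0L xs

// A large input is only passed to the accumulator version here.
let big = List.init 1_000_000 id
printfn "%d" (sumAcc big)
printfn "%d" (sumNaive [ 1; 2; 3 ])
```

If a function that used to have the second shape ends up with the first (for example because extra work is wrapped around the recursive call), large inputs that used to be fine can start overflowing the stack.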

@bent-rasmussen

Hey @vzarytovskii I apparently can't comment on the now closed VS feedback issue, so quoting you here:

It seems that we have a bunch of telemetry, and I can see crashes, but not Watson "dumps" (thus, I can't see what caused them), which might mean that Windows Error Reporting is blocked. You mentioned that you see that behaviour in other files as well; are all of them F# files? Does it happen when the IDE is idle as well, or only when you're doing some work? It seems that in some of the sessions devenv consumes quite a lot of memory; how much RAM does your laptop have?

Yes, they were all F# files. I am not aware of issues with other kinds of files but we mostly work in F# in our small company - although we now also use a bit of C# on the frontend side. I have not noticed VS crashing without interaction but I won't rule it out.

Perhaps I can try to collect some detailed telemetry for you, if you tell me how and if VS keeps crashing. I've experienced at least 10-20 crashes so far. Can I somehow enable OpenTelemetry for F# in VS, or run an ETW trace with some relevant providers switched on?

@vzarytovskii
Member

I don't think OTEL will show anything (also, there's no easy way of turning it on to pipe it somewhere). Do you have Windows crash reporting off by any chance? Or are you cancelling it? It's just weird that there's no Watson data (the data Windows collects when something crashes). Or does it by any chance offer to launch it under a debugger? That would also work, since you could then look at it and (without sharing it with me) at least tell what sort of exception it is.

It's also weird that I don't see that many crashes in the past 90 days, only around 5-7 spread across 17.8 and 17.9.

@brianrourkeboll
Contributor

brianrourkeboll commented Mar 13, 2024

I also sometimes see a stack overflow in ConstraintSolver.fs here, ultimately I think from here, instead when doing the exact same thing, though:

image

In fact, I now keep getting the second exception and can no longer reproduce the first 🙃.

For what it's worth, I also get this stack overflow when applying "fix all in document" for "remove unused binding" in ServiceDeclarationLists.fs.

@psfinaki
Member

I think I can confirm the above: my VS crashed once when I tried to apply "Simplify Names" to that whole document as well. We might not have "load tested" our code fixes enough to catch that earlier, but we probably also don't often apply them at a large scale ourselves.

@bent-rasmussen do you use code fixes? Do you have especially large code files?

@bent-rasmussen

Do you have windows crash reporting off by any chance? Or are you cancelling it? It's just weird that there's no Watson data (data windows collects when something crashes). Or does it by any chance offer you to launch it under debugger?

I looked around in the Group Policy settings, and most things are "not configured". However, I found that a tool I had used to disable some services had also changed a setting that seems to limit diagnostics telemetry (I hadn't noticed it did that :-( ), so I reverted that setting. After that change, I tried crashing VS again:

Faulting application name: devenv.exe, version: 17.9.34622.214, time stamp: 0x65d7c444
Faulting module name: KERNELBASE.dll, version: 10.0.22621.3235, time stamp: 0x2b72307b
Exception code: 0xe053534f
Fault offset: 0x0000000000065b0c
Faulting process ID: 0x0x3224
Faulting application start time: 0x0x1DA756B3F7424AA
Faulting application path: C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\IDE\devenv.exe
Faulting module path: C:\WINDOWS\System32\KERNELBASE.dll
Report ID: 4a4921c4-0d2c-4892-894e-463f3b8c4666
Faulting package full name: 
Faulting package-relative application ID: 

@bent-rasmussen

I think I can confirm the above - my VS has crashed once I tried to apply "Simplify Names" on that whole document as well. We might have not "load tested" our code fixes to catch that earlier, but probably we also don't try to apply them on a big scale.

@bent-rasmussen do you use code fixes? Do you have especially large code files?

I don't use code fixes a lot and they were not involved in the current issue.

I'm not sure what qualifies as a large file but the one I am crashing with now is ~1300 LOC.

@bent-rasmussen

D'oh! - error reporting was turned off - as you rightly suspected Vlad - but now it is enabled:

image

New crash:

Faulting application name: devenv.exe, version: 17.9.34622.214, time stamp: 0x65d7c444
Faulting module name: KERNELBASE.dll, version: 10.0.22621.3235, time stamp: 0x2b72307b
Exception code: 0xe053534f
Fault offset: 0x0000000000065b0c
Faulting process ID: 0x0x1FB0
Faulting application start time: 0x0x1DA756F86CA9356
Faulting application path: C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\IDE\devenv.exe
Faulting module path: C:\WINDOWS\System32\KERNELBASE.dll
Report ID: 8d7505a7-f11b-4bbc-861b-9e40525fe87c
Faulting package full name: 
Faulting package-relative application ID: 

And windows error reporting:

Fault bucket 1204705653696841124, type 4
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: devenv.exe
P2: 17.9.34622.214
P3: 65d7c444
P4: KERNELBASE.dll
P5: 10.0.22621.3235
P6: 2b72307b
P7: e053534f
P8: 0000000000065b0c
P9: 
P10: 

Attached files:
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.63f9d25b-4d53-4f68-be2d-6624dddf8320.tmp.mdmp
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.e0768dce-1749-4d83-b700-64000e0d89aa.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.0e000c4c-f6e2-46c5-8c63-0ced30ac3b78.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.a4bf5617-a57e-4547-b5ad-da442b3d287b.tmp.txt
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.4a2b64ad-809d-49ea-8105-7c5aeb05f31b.tmp.xml

These files may be available here:
\\?\C:\ProgramData\Microsoft\Windows\WER\ReportArchive\AppCrash_devenv.exe_4879d5a51bb34538e74e7ffb55537091b216d8ae_4541e7ec_60c55b3f-aa21-4311-84c1-49f23cb169da

Analysis symbol: 
Rechecking for solution: 0
Report Id: 8d7505a7-f11b-4bbc-861b-9e40525fe87c
Report Status: 268435456
Hashed bucket: e05286d14de62117d0b7f968c38145a4
Cab Guid: 0

And

Fault bucket 125730739576, type 5
Event Name: PerfWatsonVS12Data
Response: Not available
Cab Id: 2198135043187788833

Problem signature:
P1: PerfWatsonTcdb
P2: 0
P3: 0
P4: 0
P5: 0
P6: 
P7: 
P8: 
P9: 
P10: 

Attached files:
\\?\C:\Users\Bent Rasmussen\AppData\Local\Temp\VSTelem.Out\202403131757_d17.9_17.9.34622.214_9352_1b004dd3-086b-4172-bfb2-4c317d262465.tcdb
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.e51bd3bd-e9b3-4ce1-97ad-5a5c7ff4fbef.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.f6237eeb-f576-4ece-8a3a-4358b6aeb404.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.2847d2e9-3cc8-4edc-b991-964e9ca5e3e8.tmp.txt
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.cd5ae3f8-c5fc-408b-aa7c-6d16d9301237.tmp.xml
\\?\C:\Users\Bent Rasmussen\AppData\Local\Temp\WER.a8463d16-bcbb-40c0-b7ad-dbc169ffc793.tmp.WERDataCollectionStatus.txt

These files may be available here:
\\?\C:\ProgramData\Microsoft\Windows\WER\ReportArchive\NonCritical_PerfWatsonTcdb_688f135dd185f2a8133a474a2518efc6ce6cc4a_00000000_cab_6d6305e4-50d8-48aa-adb5-a8583bb3353e

Analysis symbol: 
Rechecking for solution: 0
Report Id: 6d6305e4-50d8-48aa-adb5-a8583bb3353e
Report Status: 268435464
Hashed bucket: fb30d2b9a7acf0305d4da51668259751
Cab Guid: a19ecac5-d178-47f0-9e81-582cf18bcc21

@vzarytovskii
Member

So, what we figured out internally:

It looks like the user has windows telemetry level set to "Security", which means we can't get dumps from them via Watson.
 
I would ask them to enable dump collection for devenv.exe on the machine and have them manually upload the dump by following these instructions:

Run the following command in an administrator command prompt:

reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps\devenv.exe" /v DumpType /d 2 /t REG_DWORD

This command enables Windows to automatically collect a crash dump and store it under %LOCALAPPDATA%\CrashDumps

Use Visual Studio normally to reproduce the crash. Once a crash occurs, let all Windows crash dialogs finish processing.

Go to %LOCALAPPDATA%\CrashDumps and locate the latest dump file for devenv.exe process.

Please zip the DMP file, reply to this thread, and attach the dump using the attach / insert button (or, if that is not available, you can upload it and send it to me at vlza@microsoft.com)

To revert dump collection to its original state, run the following command:

reg delete "HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps" /f

More details about collecting user-mode dumps: Collecting User-Mode Dumps - Win32 apps | Microsoft Docs

@vzarytovskii
Member

I have one strong suspicion: when I rewrote a bunch of things to state machines, some tail-recursive things became non-tail-recursive and might blow up now. The fix (once they are identified) would be to use async in them until we support tail calls in state machines.

@brianrourkeboll
Contributor

@vzarytovskii

Fix would be (when identified) to use async in them until we support tail calls in state machines.

Hmm, yeah, the stack overflow I was seeing in TcModuleOrNamespaceElementNonMutRec does seem to go away for me if I switch it and the other functions with which it's mutually recursive to use async.

The parens analyzer one being triggered in FSharp.Compiler.Service seems likely to be unrelated to the async/cancellable thing, other than that it's also a stack overflow. I think this is what's happening:

All analyzers are run for all (preceding?) files in the project when "fix all in document" is selected for a code fix in a given document. It seems like we shouldn't be doing that in general...

In the FSharp.Compiler.Service project, that also means that if you run it on pretty much any other file, it is also run on pars.fs, which has a ~17,000-line array expression in it. My hunch is that something about the way pick and dive in SyntaxTraversal.traverse build up function chains causes the stack to overflow when a big enough call chain is kicked off. It may take a bit more work to address that properly, although I haven't thought it through yet...

@vzarytovskii
Member

Cancellable itself wasn't moved to state machines, so it's slightly surprising to me that changing it to async fixed things.

Another suspicion I have is that we moved a bunch of things onto the stack (option -> voption), which effectively leaves less stack available even when we use the guard.

image

image

This is from the dump provided. There are some deeply nested expressions there. As an immediate solution, we can decrease the threshold for the stack size, but it will take time to release. Another option is to try setting an environment variable that controls it (I need to look it up once I'm in the office).

I also wonder why it doesn't fail in Rider (or maybe it does and is restarted by the IDE).

@vzarytovskii
Member

A bit of explanation, plus something to try while we investigate what can be improved on our side:
For deeply recursive operations, such as type checking arbitrary expressions, we use something called StackGuard, since those operations are not necessarily tail-recursive (nor can they be in many situations, see more here) and may overflow the stack, which is what happens in this and a couple of other cases. Stack guards are configurable via environment variables. Could you please check whether your particular issue still happens if you set the FSHARP_TcStackGuardDepth environment variable to something between 50 and 70 and, if it does, try 40? If this doesn't help, then we're likely overflowing somewhere else, and it will need more investigation.
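
As a rough illustration of how a depth setting like this can be picked up from the environment, here is a minimal sketch. The helper name `getEnvInt` and the fallback value of 80 are assumptions made for this example; the thread does not state how the compiler reads the variable or what its default is.

```fsharp
open System

// Hypothetical helper: read an integer from an environment variable and
// fall back to the supplied default when it is unset or not a valid integer.
let getEnvInt (name: string) (fallback: int) : int =
    match Environment.GetEnvironmentVariable name with
    | null -> fallback
    | raw ->
        match Int32.TryParse(raw: string) with
        | true, value -> value
        | _ -> fallback

// The variable name comes from the comment above; 80 is only a placeholder
// default for this sketch, not a value confirmed anywhere in this thread.
let tcStackGuardDepth = getEnvInt "FSHARP_TcStackGuardDepth" 80
printfn "Type-checking stack guard depth: %d" tcStackGuardDepth
```

A smaller value means the guard moves work to a fresh stack more often, which trades some overhead for more headroom per thread. The variable has to be set before Visual Studio starts so that the process inherits it.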

@bent-rasmussen

Sounds good! I've set FSHARP_TcStackGuardDepth to 50 and restarted VS. I will report back later with results if there is a crash or after some time if there is no crash.

@bent-rasmussen

I've not seen any crashes since the environment variable was set, despite editing that file several times. It is looking promising. (As a side-note, surely not related to this issue, VS seems to also have started showing a lot of red squigglies - even when there aren't any errors. The squigglies then disappear once I build the whole solution.)

@vzarytovskii
Member

Thanks for testing it, Bent. Interesting, I might look at the squiggles issue once I'm done with the stack issues.
We have identified a couple more places. In the meantime, we will reduce the stack threshold for type checking and introduce a couple more stack guards while we figure out a sustainable long-term solution.

@brianrourkeboll
Contributor

Cancellable itself wasn't moved to state machines, so it's slightly surprising to me that changing it to async fixed things.

Hmm, it could be a coincidence, but I consistently get that stack overflow with cancellable, and it goes away if I change those functions to use async.

@majocha
Contributor

majocha commented Mar 15, 2024

@brianrourkeboll that's kind of what StackGuard does:

https://github.com/dotnet/fsharp/blob/main/docs/large-inputs-and-stack-overflows.md#stack-guards

    type StackGuard(maxDepth: int, name: string) =
        let mutable depth = 1

        [<DebuggerHidden; DebuggerStepThrough>]
        member _.Guard(f) =
            depth <- depth + 1
            try
                if depth % maxDepth = 0 then
                    let diagnosticsLogger = DiagnosticsThreadStatics.DiagnosticsLogger
                    let buildPhase = DiagnosticsThreadStatics.BuildPhase
                    let ct = Cancellable.Token
                    async {
                        do! Async.SwitchToNewThread()
                        Thread.CurrentThread.Name <- $"F# Extra Compilation Thread for {name} (depth {depth})"
                        use _scope = new CompilationGlobalsScope(diagnosticsLogger, buildPhase)
                        use _token = Cancellable.UsingToken ct
                        return f ()
                    }
                    |> Async.RunImmediate
                else
                    f ()
            finally
                depth <- depth - 1
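
For anyone who wants to try the idea outside the compiler, below is a simplified, self-contained sketch of the same technique. It is an assumption-laden stand-in rather than the real implementation: `SimpleStackGuard` and `depthOf` are made-up names, it uses a plain `Thread` instead of `Async.SwitchToNewThread`, and it does not carry over diagnostics loggers or cancellation tokens.

```fsharp
open System.Threading

// Simplified stand-in for StackGuard: every `maxDepth` nested calls, the
// continuation runs on a fresh thread, which starts with an empty stack.
type SimpleStackGuard(maxDepth: int) =
    let mutable depth = 1

    member _.Guard(f: unit -> 'T) : 'T =
        depth <- depth + 1
        try
            if depth % maxDepth = 0 then
                // A ref cell carries the result back from the worker thread.
                // Exceptions thrown by `f` on the worker are not marshalled
                // back to the caller in this sketch.
                let result = ref Unchecked.defaultof<'T>
                let worker = Thread(ThreadStart(fun () -> result.Value <- f ()))
                worker.Start()
                worker.Join()
                result.Value
            else
                f ()
        finally
            depth <- depth - 1

let guard = SimpleStackGuard(200)

// Deliberately non-tail-recursive: the `1 +` runs after the guarded call returns.
let rec depthOf (n: int) : int =
    if n = 0 then 0 else 1 + guard.Guard(fun () -> depthOf (n - 1))

printfn "%d" (depthOf 20_000)
```

Each thread only ever holds about `maxDepth` levels of the recursion, so the total depth is no longer limited by a single stack. The cost is one blocked thread per hop, which is part of why the depth threshold is a tuning knob.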

@vzarytovskii
Member

vzarytovskii commented Mar 15, 2024

Yeah, there are similarities, but async doesn't necessarily guarantee a thread switch (unless we have a bind for cancellable in an async extension that does exactly that).

@abonie abonie added Area-Compiler-Checking Type checking, attributes and all aspects of logic checking and removed Needs-Triage labels Mar 18, 2024
@vzarytovskii
Member

We merged a workaround for this specific crash (cancellable being non-tail-recursive when running on the framework CLR). This will need more investigation, however. We need to profile type checking and see where we recurse either outside a stack guard or in a non-tail-recursive way.
