ClosePseudoConsole API hanging #1810
Comments
This just needs somebody to sit down to debug. Thanks for the report. |
This is because the ConPTY is being started with It looks like if the PTY starts up fast enough so that the shutdown message comes after the cursor is inherited OR if the shutdown happens before the PTY starts asking for the cursor inherit, then everything is good. Given this is a race condition, I can't really blame it on the caller holding it wrong. I'll work to resolve it such that a shutdown is a valid way of halting the request for the cursor position while it is starting up. |
Oh, yeah, conhost is open source now. This is the VtIo startup code that is waiting to hear the cursor position from the calling terminal: Lines 238 to 255 in 8fa42e0
And this is the Signal thread that has noticed the shutdown and is attempting to acquire the lock so it can finish closing client process state: Lines 503 to 527 in 8fa42e0
Here is the stack of the conhost threads when it is stuck:
The resolution is likely to make the terminal/src/host/VtInputThread.cpp Line 111 in 8fa42e0
|
@Tyriar You might be able to work around this by not passing Disabling (We should definitely still fix the thing that @miniksa found) |
We added that because, for tasks, we write to the terminal before the process is launched. We could probably avoid almost all crashes by only using it when a terminal has already been written to, which happens pretty infrequently. I'll start the conversation for the workaround in the vscode issue, thanks. |
The workaround might not be fixing the issue microsoft/vscode#76548 (comment) |
@Tyriar, did this behavior change between Code Insiders July 2019 and Code Insiders August 2019? I was attempting to fix this issue and then VS Code updated itself and now I can no longer repro it the same way. |
Ugh, it looks like it did. Now to find where the heck I can get the July 2019 one again and have it not auto update. |
@miniksa installer/zip at the top: https://code.visualstudio.com/updates/v1_36, you can set |
Beautiful, thank you! @Tyriar |
It's sounding like it's still happening when we use that flag (since we use it on tasks), just a lot more often than I would have expected based on my experience reproducing microsoft/vscode#71966 (comment). We can't really disable it for tasks, as then the updates don't get printed to the terminal; that was another bug: #919 |
@Tyriar, OK. I'm working on a fix for the one specific stack/repro I had right now above. I have a solution for that particular race, but it revealed a deadlock behind that one. It looks like we're attempting to paint one last frame out to the output channel before everything is torn down. My suspicion here is that the write of the last frame is stuck because the read operation is happening on the same thread on your side of the fence as the call to close the pseudo console. So there's a deadlock: the output channel isn't being drained, so the paint frame doesn't think it is done and is holding the session open until it is drained. Again, not something that is specifically your fault. We should be more robust than that. A workaround for this case might be to close the handles you're not intending to listen to anymore before calling ClosePseudoConsole. Then we can't get stuck on sending you information during teardown because the pipes will be broken. There's plenty of code handling teardown when the PTY pipes are broken. It's taking me a while to come up with a complete solution here because it seems like there's a widespread problem in our code with reliability around synchronization and threading for the PTY. Thanks for your patience. |
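To make the suggested ordering concrete, here is a minimal, portable sketch of the workaround described above. It does not call the real ConPTY API: `BoundedPipe`, `host_teardown`, and `close_with_handles_closed_first` are hypothetical stand-ins for the output pipe, conhost's final paint, and the client-side close. It only illustrates why breaking the pipe before the blocking close lets the host's last write fail fast instead of waiting for a reader that never drains.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the PTY output pipe: a tiny bounded buffer.
class BoundedPipe {
public:
    explicit BoundedPipe(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the buffer is full, like a pipe write with nobody draining it.
    // Returns false once the read end has been closed ("broken pipe").
    bool write(const std::string& data) {
        std::unique_lock<std::mutex> lock(mutex_);
        for (char c : data) {
            cv_.wait(lock, [&] { return reader_closed_ || buffer_.size() < capacity_; });
            if (reader_closed_) {
                return false;
            }
            buffer_.push_back(c);
        }
        return true;
    }

    // Models the client closing the handle it no longer listens to.
    void close_reader() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            reader_closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::string buffer_;
    std::size_t capacity_;
    bool reader_closed_ = false;
};

// Models the host's teardown: it tries to flush one last frame before exiting.
// Returns true if the frame was fully written, false if the pipe was broken.
bool host_teardown(BoundedPipe& pipe) {
    return pipe.write(std::string(64, 'x'));  // larger than the pipe's capacity
}

// Models the client-side close in the suggested order:
// break the output pipe first, then wait for the host to finish.
bool close_with_handles_closed_first() {
    BoundedPipe pipe(16);
    bool frame_delivered = true;
    pipe.close_reader();  // step 1: stop listening, so host writes fail fast
    std::thread host([&] { frame_delivered = host_teardown(pipe); });
    host.join();          // step 2: stands in for the blocking close call
    return frame_delivered;
}
```

In this model the join returns promptly because the write sees the closed read end. If the read end stayed open and nothing drained it, the 64-byte write would block once the 16-byte buffer filled, which is the shape of the hang discussed in this thread.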
@miniksa so https://github.com/microsoft/node-pty/blob/3e645898f83370db2894cdde09ac180235e1cfb7/src/win/conpty.cc#L438 should move above the ClosePseudoConsole call? |
No, I mean the handles |
@zadjii-msft, to totally fix this I need to re-evaluate terminal/src/interactivity/base/ServiceLocator.cpp Lines 39 to 46 in ff87190
I don't have full context on the issue you fixed in the past, so I don't want to break that while I'm fixing this. Right now I'm deadlocked on the Pty Signal Input thread getting a shutdown message and the If I in any way attempt to make this final paint asynchronous, there's then no guarantee that it will actually write before our process is torn down because we're calling RundownAndExit here. I'm trying to determine if there's a difference in context between this type of shutdown and the one that you fixed in MSFT: 15506250. If not, then we probably need to make it so the Pty Signal receiving a shutdown doesn't force the process closed through RundownAndExit but rather politely notifies everyone else that it's time to go and just lets whatever happen happen (and fix up any further issues that leave behind conhosts unexpectedly from there). Edit: Locked stack:
Oh, also a disconnect request is coming in here at the same time from the client:
|
So, the situation MSFT:15506250 is referring to is back in the earliest WSL tests of the conpty API. When you're using conpty to host a commandline like @benhillis can keep me honest if I'm mis-remembering this. This feels to me specifically different than the ClosePseudoConsole route, and I think you'd be fine changing it as you suggest. If someone called ClosePseudoConsole, then we could presume that they don't care anymore about something we have buffered, while in the scenario where the client exits (triggering our exit), we'd still want to deliver all the buffered output. |
I just hit the hang when closing a regular terminal (not a task) which has |
Next time you hit the hang or any other hang like this, can you please take a process dump of the supposedly hung conhost so I can check the stacks? If you're hitting a similar situation without the flag set, then you're getting hung in a different situation which I'm not necessarily working on fixing right now in this issue. |
Note that I've made a PR that fixes this in node-pty microsoft/node-pty#415, however it does not yet work in Electron 8 (haven't tried 9) when node integration is enabled as |
VS Code update: Electron isn't going to support |
I am currently actively looking into finally working around this ConPTY behavior. Honestly, if your above quoted workaround is the only way to work around this, I'd rather drop Windows support (sadly) or just wait until ConPTY can, maybe, eventually handle read/write operations on one thread. The reasoning is not that I want to be stubborn but rather that ConPTY (Windows) is the only platform that requires this architectural change just to not freeze at shutdown. I'm trying it though, but it doesn't seem worth it for now. :) UPDATE: Okay, I had actually forgotten my own source code, and now I remember why I had already split up PTY reads from writes (because ConPTY does not support non-blocking I/O). My problem was that if the connected process behind ConPTY terminated, the already started ConPTY's Thanks :) |
@miniksa @Tyriar has merged microsoft/node-pty#415 (comment) into Node-pty. Can you provide an update when this might make it upstream to VSCode? |
Probably tomorrow's insiders, follow microsoft/vscode#116185 for updates |
Any updates? I am also having the same bug in my code. When I call |
microsoft/vscode#144016 is likely related |
Problem:
* Calling `RundownAndExit` tries to flush out the last frame from `VtEngine`
* `VtEngine` indirectly calls `RundownAndExit` if the pipe is gone via `VtIo`
* `RundownAndExit` is called by other parts of OpenConsole
* `RundownAndExit` must be `[[noreturn]]` for most parts of OpenConsole
* `VtIo` itself has a mutex ensuring orderly shutdown
* In short, doing a thread safe orderly shutdown requires us to hold both, a lock in `RundownAndExit` and `VtIo` at the same time, but while other parts need a blocking `RundownAndExit`, `VtIo` needs a non-blocking one
* Refactoring the code to use optionally non-blocking `RundownAndExit` requires refactoring and might prove to be just as buggy

Solution:
* Simply don't call `RundownAndExit` in `VtEngine` at all
* In case the write pipe breaks:
  * `VtEngine` will close the handle
  * The client should notice that their read pipe is broken and close their write pipe sooner or later
  * Once we notice that our read pipe is broken, we call `RundownAndExit`
  * `RundownAndExit` might call back into `VtEngine` but without a pipe it won't do anything
* In case the read pipe breaks or any other part calls `RundownAndExit`:
  * We call `RundownAndExit`
  * `RundownAndExit` might call back into `VtEngine` and depending on whether the write pipe is broken or not it will simply write into it or ignore it

Closes #14132
Pretty sure this also applies to #1810

## Validation Steps Performed
* Open 5 tabs and run MSYS2's `bash --login` in each of them
* `Enter-VsDevShell` in another tab
* Close window
* 5 tab processes are killed instantly, 1 after ~3s ✅
* Replace conhost with OpenConsole via sfpcopy
* Launch Dozens of Git Bash tabs in VS Code
* Close them randomly
* Remaining ones still work, processes are gone ✅
(cherry picked from commit 1774cfd) Service-Card-Id: 86174637 Service-Version: 1.16
(cherry picked from commit 1774cfd) Service-Card-Id: 86178271 Service-Version: 1.15
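The "Solution" part of the commit message above can be illustrated with a small model. This is only a sketch under stated assumptions: `EngineModel`, `write_frame`, `has_pipe`, and `rundown_model` are hypothetical names that do not match the real `VtEngine` or `RundownAndExit` code. It shows the one idea that a failed write just drops the pipe, and that a later rundown calling back into the engine becomes a no-op once the pipe is gone.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of the engine's output side; names are illustrative
// and do not correspond to actual VtEngine members.
class EngineModel {
public:
    // `sink` stands in for the write pipe; it returns false when the pipe is broken.
    explicit EngineModel(std::function<bool(const std::string&)> sink)
        : sink_(std::move(sink)) {}

    // Writes a frame. On failure it only drops the pipe ("closes the handle");
    // it never triggers process shutdown itself.
    void write_frame(const std::string& frame) {
        if (!sink_) {
            return;  // pipe already gone: ignore the frame
        }
        if (!sink_(frame)) {
            sink_ = nullptr;
        }
    }

    bool has_pipe() const { return static_cast<bool>(sink_); }

private:
    std::function<bool(const std::string&)> sink_;
};

// Stand-in for the rundown path: it attempts to flush a last frame through the
// engine. Returns whether a pipe was still attached when the flush was tried.
// Real code would go on to terminate the process; this model just returns.
bool rundown_model(EngineModel& engine) {
    const bool had_pipe = engine.has_pipe();
    engine.write_frame("last frame");
    return had_pipe;
}
```

Because the engine never calls back into the rundown path, the model has no point at which the two sides wait on each other, which is the property the commit message is after.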
We are pretty sure that #14160 fixed this? |
Nice! Sounds like it. |
There's still an easily reproducible bug in WSL however. It calls The bug reproduces if you execute any CLI application that writes more than 4kB of text before conhost has a chance to exit. This happens with applications similar to |
I'm gonna close this out, because we think this was fixed in 1.17. It'll obviously be a while before we can get a newer version ingested into VS Code (because we basically need to do... #6999? That doesn't look right but it's close), but the code on our end should be finished. If this still repros with the 1.17+ version of the API, then we'll probably need something that builds on that fix, rather than reverting it and trying something new. |
Environment
Steps to reproduce
terminal.integrated.windowsEnableConpty
Expected behavior
No hanging.
Actual behavior
The window hangs. Using windbg, I see the following:
Notice the many leftover conhosts in the task manager as well as the hang in ClosePseudoConsole.
Here is the link to the code in node-pty calling into the PseudoConsole API.
https://github.com/microsoft/node-pty/blob/04445ed76f90b4f56a190982ea2d4fcdd22a0ee7/src/win/conpty.cc#L429
/cc @daimms