WSL2: Programs, e.g. git, exiting unexpectedly #7506
Comments
Same issue noticed, same symptoms. Repeatedly opening multiple bash windows quickly causes some of them to fail with 'process exited with code 1'.
I also have the same issue. I need to try opening several bash windows to get one without 'process exited with code 1'.
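Opening several bash windows in quick succession can be approximated from a shell by starting several copies of one command at the same moment. The sketch below is a minimal illustration in POSIX `sh`; `burst_exit_codes` is a hypothetical helper (not part of WSL or any tool in this thread), and the command you pass it is an assumption: for example `wsl.exe -e true`, run from a shell that can reach `wsl.exe`, as a stand-in for opening a new session.

```sh
#!/bin/sh
# Hypothetical helper: start N copies of a command at the same moment and
# tally their exit codes.
# Usage:  burst_exit_codes N COMMAND [ARGS...]
# Prints: runs=<N> ok=<n> code1=<n> other=<n>
burst_exit_codes() (
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &        # launch without waiting
        pids="$pids $!"
        i=$((i + 1))
    done
    ok=0 code1=0 other=0
    for p in $pids; do
        rc=0
        wait "$p" || rc=$?            # exit status of that background job
        case $rc in
            0) ok=$((ok + 1)) ;;
            1) code1=$((code1 + 1)) ;;   # the "exited with code 1" failure
            *) other=$((other + 1)) ;;
        esac
    done
    echo "runs=$n ok=$ok code1=$code1 other=$other"
)
```

For example, `burst_exit_codes 8 wsl.exe -e true` starts eight sessions at once. Exit code 1 is counted separately because that is the specific failure ("process exited with code 1") reported here; any other non-zero code lands in `other`.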
This is utterly annoying. Some mentioned
I've found a workaround on JetBrains' forum that works for me. This should give a better clue about the problem to the WSL 2 devs as well:
Then everything works fine for a few hours.
Yeah, I also solved this problem for the moment with the workaround described in https://youtrack.jetbrains.com/issue/IDEA-276250. As it looks like a bug specific to Windows 11 (Windows 10 looks okay), when can we expect a fix for this from the developers?
Just wanted to add that I have been experiencing a very similar-looking issue with WSL on Windows 10 for almost a year (on the Insider channel, Dev if I recall correctly), so I don't think it is a Windows 11-specific issue, and it's probably not new at all. I was using RubyMine with WSL2 and git was randomly failing to respond (it usually responded on a retry).
If you are getting the same error output as the OP, then the init process failed at
I swear this has gotten worse since upgrading to v0.61.4/5, but that might just be a coincidence. @OneBlue @benhillis, would it be possible to get some eyes on this (or a comment saying that eyes have been on it)? It's a real barrier to comfortably using JetBrains products on WSLv2, and so far I've not seen much hope that this will be improved or fixed.
Everyone over on IDEA-276250 seems to think that #7883 is the problem behind this, and you said a fix for that was being rolled out, yet this problem remains. I also don't see a strong connection between this issue and that one: for one, that issue is about Windows programs hanging inside WSL, whereas I'd expect IntelliJ to be calling WSL programs from Windows; and a hang wouldn't cause an exit code of 1, since by definition there is no exit. So I'm not sure why people think this is the same issue. Everyone keeps suggesting removing the interop sockets in
I've also not seen any indication that MS and JetBrains are working together on this, which seems silly: either JetBrains are really abusing WSLv2 somehow, which probably means there's an opportunity to improve WSLv2 by implementing "something" that lets them do what they need to do in a stable way and that would benefit others too (this problem occurs across the board in IntelliJ: ESLint, TypeScript, Node, Git, Ruby,
I would love to be wrong about this: maybe you folks are in deep discussions, or maybe that issue is the fix and it's just not completely rolled out yet (in which case it'd be cool to get that confirmed!). But it's really frustrating not to see any engagement on something this disruptive. Up until now I've avoided commenting because I know this (i.e. Linux kernels, cross-VM communication, low-level OS APIs, etc.) is not my domain, but I feel I'm at the point where I have to say something to try and poke the bear.
FWIW, I'm able to reproduce this from PowerShell just with
It doesn't happen a lot, but it does happen. It actually feels like it happens more the less often you run the command: e.g. if I run that command a bunch in a short period I generally won't hit this bug, but if I alt-tab away for a few minutes and come back, it seems to happen the first time. This could just be more coincidence, though 🤷
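To check the hunch that idle time matters more than call volume, the same command can be repeated with a pause between attempts and the two patterns compared. A minimal POSIX `sh` sketch; `count_exit_codes` and its argument order are hypothetical, and because the command from the comment above isn't preserved, `wsl.exe -e true` in the usage line is only an assumed stand-in.

```sh
#!/bin/sh
# Hypothetical helper: run a command RUNS times, sleeping DELAY seconds
# between attempts, and tally the exit codes.
# Usage:  count_exit_codes RUNS DELAY COMMAND [ARGS...]
# Prints: runs=<RUNS> ok=<n> code1=<n> other=<n>
count_exit_codes() (
    runs=$1
    delay=$2
    shift 2
    ok=0 code1=0 other=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Pause before every attempt except the first.
        if [ "$i" -gt 0 ]; then
            sleep "$delay"
        fi
        rc=0
        "$@" >/dev/null 2>&1 || rc=$?
        case $rc in
            0) ok=$((ok + 1)) ;;
            1) code1=$((code1 + 1)) ;;   # the "exited with code 1" failure
            *) other=$((other + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "runs=$runs ok=$ok code1=$code1 other=$other"
)
```

`count_exit_codes 10 300 wsl.exe -e true` makes ten attempts five minutes apart; `count_exit_codes 10 0 wsl.exe -e true` makes them back to back. A clearly higher `code1` count in the first run would support the idle-time theory, though a handful of runs is weak evidence either way.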
Thanks for reporting this @G-Rath. Can you capture logs of a repro where wsl.exe fails and returns 1?
/logs
Hello! Could you please provide more logs to help us better diagnose your issue? To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:
The script will output the path of the log file once done. Once completed, please upload the output files to this GitHub issue. Click here for more info on logging. Thank you!
Here you go! WslLogs-2022-07-02_11-56-34.zip
It's from me doing
@OneBlue I'm assuming that those logs are not terminal/instance-specific (those were captured in an admin pwsh instance after I'd reproduced the issue in a second, non-admin pwsh instance), but let me know if that's not the case and I'll capture the logs from the same terminal I did the reproduction in.
@OneBlue these logs might be better: I think the way I was initially starting pwsh as an admin meant it didn't correctly handle the "press any key" part, so it just immediately collected logs, which I assume won't include the stuff you want. These logs were collected by:
Ok, looking at the logs I see multiple things:
1 - Errors while accessing the terminal HANDLE. It doesn't appear to be fatal, but it's definitely unexpected.
2 - A failed bind() call for the interop socket, which returns
After looking at this, though, I can confirm that this has nothing to do with #7883.
As far as I know they're not doing anything specifically with the sockets; they're just calling
Is there anything else I can do to help you with this?
One thing that would be interesting to see is what's running inside WSL once you get into this state. If you can repro this easily, can you share a screenshot of htop, and
I'll see if I can get that for you 👍
Ok, so I've been trying to reproduce it again but so far have had no luck. I restarted my laptop since I was working on a few things and wanted all the processes fresh and whatnot, and I haven't been able to reproduce it since (I haven't yet opened my IDE). I'm not that surprised since, as in the OP, a restart typically resolves the problem for a while, but this all points to there being a stateful component to this (and that it's probably not just the socket count, since I've got 2500+ of them now). This is the script I'm using to try and reproduce this in bulk:
I've got this running in three terminals to try and simulate multiple calls happening at once in case that's part of the trigger, but so far they've been fine: I've got more than 5000 sockets without trouble, yet this usually happens when there are fewer than 1000 sockets. To me this also indicates there's potentially a "cosmetic" bug, given that all those sockets should be for processes that have long since terminated:
I would have thought these sockets would get deleted/closed/cleaned up at some point, but there might not be any cost to them still existing, so maybe this is already known 🤷 I still don't think the IDE is doing something exotic with WSL that causes this (based on what I have and have not seen in its logs and from my digging), but I'm now going to open it to see if the issue becomes reproducible, since I expect it to interact with WSL in a more complex way than my little script.
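The pile of leftover sockets described here can be measured without deleting anything. A read-only sketch in POSIX `sh`, under two assumptions taken from the workaround script later in this thread: the sockets live in `/run/WSL`, and each is named `<pid>_interop` after the process that created it. An entry counts as stale when there is no `/proc/<pid>` directory; because PIDs get reused, a leftover entry can look live, so treat the numbers as approximate. `interop_socket_stats` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical read-only helper: count "<pid>_interop" entries in DIR
# (default /run/WSL) and how many name a PID with no live process.
# Assumption: entries are named <pid>_interop, as in this thread's workaround.
# Prints: total=<n> stale=<n>
interop_socket_stats() (
    dir=${1:-/run/WSL}
    total=0 stale=0
    for f in "$dir"/*_interop; do
        [ -e "$f" ] || continue           # glob matched nothing
        total=$((total + 1))
        pid=${f##*/}                      # strip the directory part
        pid=${pid%_interop}               # strip the "_interop" suffix
        [ -d "/proc/$pid" ] || stale=$((stale + 1))
    done
    echo "total=$total stale=$stale"
)
```

Running `interop_socket_stats` before and after opening the IDE would show whether the stale count jumps when the problem starts.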
Ok, I was able to reproduce this easily with my script as soon as I opened a Go project in my IDE. @OneBlue here's a "before" and "after" of
There doesn't seem to be anything interesting in (out of shot there are just a couple more
I'm going to restart my laptop again and this time try reproducing with the IDE open straight away to see if that reduces the diff of those
After restarting my laptop and then starting up the IDE, the issue is reproducible pretty much straight away, so yeah, I'm starting to think the IDE is doing something that's a factor. I also remembered something that I think might play a role in this: to get file events, the IDE runs a filesystem watcher in the background in the WSL instance. Since I'm not a JetBrains dev I can't be sure of much more than this, but my thinking is that it would be a long-running process that somehow has to be communicated with. I could be completely off the mark, though, as I'm not that knowledgeable about this sort of stuff. I did do a bit of digging in the JetBrains open source repository:
I might try and have a play around with the
From the JetBrains side, I'd like to see the issue resolved, but so far I can't see how we can help. To the best of my knowledge, IDEA doesn't do anything hacky to WSL or its VMs; it just launches
@trespasserw I think a good place to start could be to collect a list of all calls to WSL the IDE makes as part of its usual operations. The best way to try and do that is probably to implement a flag that has classes like
My investigations have shown that just opening a project in the IDE seems to be enough to cause this bug (I've tried Go, Ruby, Python, and JavaScript projects). If I knew the details of the actual commands the IDE is calling, I would be happy to continue my investigation to try and pin down a reproduction. Also, after having played around with
Something else that just occurred to me is that IDEA does interact with WSL in another way: via its filesystem on
This might be something for @OneBlue to weigh in on, as to whether these two areas could impact each other (I've got no idea how the fs side interacts with the kernel/process side), but I'm wondering whether this issue could be caused by potentially heavy interaction from IDEA over that. This is a shot in the dark, but that would be something IDEA always interacts with when it starts up, and keeps interacting with over time...
I'd like to know what exactly is needed. Blanket collection like you're suggesting is possible, but will produce too much raw data to comfortably dig through.
Thank you for all the info @G-Rath. Based on the error we're seeing, I suspect that someone has a file descriptor open on the unix socket that init uses. Unfortunately, the output of
Can you share that output again before and after a repro of the issue, along with an htop screenshot (if possible, can you also press 'F5' before taking the htop screenshot, so we get a tree view)? Once you have the repro, please also share the output of
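The theory here is that some process still holds a descriptor on init's Unix socket. `ss -xp` can show Unix sockets together with their owning processes; as an alternative that relies only on `/proc`, the sketch below maps a socket path to its inode via `/proc/net/unix` and then scans every process's fd table for `socket:[<inode>]`. `socket_holders` is a hypothetical helper, and which path to pass it is an assumption, for example one of the `/run/WSL/*_interop` entries targeted by the workaround script in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print the PIDs holding a file descriptor on the
# Unix-domain socket bound at PATH, using only /proc.
#   /proc/net/unix:   column 7 is the socket inode, the last column its path
#   /proc/<pid>/fd/*: symlinks that read "socket:[<inode>]" for sockets
# Run as root to see other users' processes; unreadable fd dirs are skipped.
socket_holders() (
    path=$1
    awk -v p="$path" 'NR > 1 && $NF == p { print $7 }' /proc/net/unix | sort -u |
    while read -r inode; do
        for fd in /proc/[0-9]*/fd/*; do
            if [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$inode]" ]; then
                pid=${fd#/proc/}      # "<pid>/fd/<n>"
                echo "${pid%%/*}"     # keep just "<pid>"
            fi
        done
    done | sort -un
)
```

Comparing `socket_holders /run/WSL/<pid>_interop` (run as root) before and after a repro would show whether any unexpected process keeps that socket open.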
@OneBlue here you go
htop before:
htop after:
I've also captured some output caused by having IntelliJ run
Interestingly (or maybe not?), nothing seems to happen in htop for
Maybe this information helps get closer to the cause. On my Win 11 machine I have had this problem for months; I think it already started in 2021. On my Win 10 Pro 21H2 machine I didn't have this problem until today, after installing KB5015807 (https://support.microsoft.com/en-au/topic/july-12-2022-kb5015807-os-builds-19042-1826-19043-1826-and-19044-1826-8c8ea8fe-ec83-467d-86fb-a2f48a85eb41)
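Removing all files in
#!/bin/sh
# Build a "<pid>_interop" name for every running process; those sockets are kept
skip_files=$(ps -Ao pid= | sed 's/.*/&_interop/')
# Delete every other *_interop entry under /run/WSL (PIDs that are no longer running)
find /run/WSL -name '*_interop' $(printf "! -name %s " $skip_files) | xargs rm -f
This is still a workaround, but I haven't encountered interop issues since.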
Thank you for all the info @G-Rath. I have a couple of theories about what could be causing this, but I need more info to confirm. When this reproes again, can you please share:
@OneBlue here you go.
before: before-ss-elx-out.txt
after: after-ss-elx-out.txt
logs:
Thanks for everything @G-Rath; with your help we found the root cause: there's a code path in which we leave behind the unix socket we use for interop. We checked in a fix to make sure that the unix sockets are properly removed when the session leader exits. The fix is included in 0.64.0.
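The fix described here follows a common pattern: whoever creates a Unix socket file also unlinks it when it exits, so a later bind() to the same path doesn't fail. The sketch below shows that pattern generically in POSIX `sh`; it is not WSL's code (the real fix is inside WSL's init), and `run_then_unlink` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper showing the generic "remove the socket file when its
# owner exits" pattern; this is NOT WSL's actual fix.
# Usage: run_then_unlink SOCKET_PATH COMMAND [ARGS...]
run_then_unlink() (
    path=$1
    shift
    # The EXIT trap runs when this subshell finishes, whether COMMAND
    # succeeded or failed, and the subshell keeps COMMAND's exit status.
    # Nothing can run if the process is killed with SIGKILL.
    trap 'rm -f "$path"' EXIT
    "$@"
)
```

A trap like this covers normal exits and failures, but not `SIGKILL`, so leftover files remain possible in that case.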
Thanks to everybody for the help sorting this out!
@OneBlue I've just updated to the latest
I'm still having this issue; it seems like a kernel version problem. When will the fix in 0.64.0 make it into LTS? As of today, I am not seeing anything after March 2022 on either https://www.catalog.update.microsoft.com/Search.aspx?q=wsl or https://github.com/microsoft/WSL2-Linux-Kernel, which I believe is what Microsoft Update pulls from if you're not using Windows Insider / Fast-Ring; nor am I seeing anything in the Microsoft Store.
I'm not a Windows Insider and cannot join to access 0.64.0 because someone named "organization" doesn't want me to. Releeease this to the masses now, it was a big bug! Financial impacts! Critical levels of highest criticality! Didn't this make the news? What if $MSFT falls precipitously!?! Our pants are literally on fire. It's Patch Tuesday TODAY!!!!
Actually no, the fix is in WSL itself, not a kernel release.
Those are only kernel releases. Since Windows 10 21H2 and Windows 11 were released, there hasn't been a new GA release of WSL that I'm aware of. In November, the WSL team transitioned to a "Preview" model via an app installed through the Microsoft Store. You can see the latest WSL release notes here in this repo, on the releases page. (Edit: I just noticed that there are binary packages here. I have not tried installing them directly yet.) Windows 11 users can install the latest, which is currently 0.64.0, through the Microsoft Store using the "Windows Subsystem for Linux Preview". It's not yet clear or announced how this will come to GA users. I personally have theorized several possibilities, but I have no insight into the decision-making process:
@NotTheDr01ds What a tortured release process. To this outsider, WSL2 looks like it is being run like a product, not an OS feature, and it should get patch releases on the team's release cadence just like any other Microsoft app. I could understand WSL1 being released with Windows because it was all pico processes with lots of low-level kernel/file system hackery.
Because I didn't have the same problems in Win 10 (until 10 Pro 21H2) as I did in Win 11, there must definitely be a difference between the two versions. Luckily it also didn't happen in Win 10 again, so it was just once, on the 13th of July (above). Until the problems are fixed on Win 11, I wouldn't suggest upgrading to Windows 11, if only because of the "worse" WSL experience :)
@surfaceowl Apologies: I was wrong about one thing in my comment yesterday (which I've now edited). I completely missed the fact that there are binary releases of the Preview here in this repo. On the Releases page, expand the Assets for a particular release, and you'll see the
Warning
I am not sure how "supported" the following is. I'm posting this on the basis that:
Second Warning
I do not know what effect these will have on a Windows 10 system (or even if they will install). They are clearly intended for Windows 11, since they include WSLg (which makes use of functionality only available in Windows 11). If they would cause issues on Windows 10, then I would hope that there is a mechanism that prevents their installation there. That said, I have not attempted this on either of my Windows 10 systems, as those are both installations that I don't want to potentially corrupt. At some point, I may try to convert my "test" system back to Windows 10 and give it a go, but this is time-intensive, of course.
If you've ignored the warnings and want to proceed ...
With that said, I was able to install the Preview without using the Microsoft Store by:
The command completes without any message, and then
@ProvokerDave It seems to me from the WSL team's actions that they definitely want to move to a "Product" release cycle, similar to what the Terminal team does. However, as I mentioned, there are Windows limitations that aren't yet fixed that don't seem to allow them to do this just yet. Hopefully, and if possible to do so in a secure manner, these limitations will be fixed in 22H2.
You don't need to be on a Windows Insider release in order to use 0.64.0, just Windows 11. If you have Windows 11, and your organization won't let you use the Store to install the "Windows Subsystem for Linux Preview" (which is currently 0.64.0), then you can try the process in my comment above to install the Preview without access to the Store.
@NotTheDr01ds Thanks for the PowerShell tips!
I am having this problem with PhpStorm on Windows 11, 2 years later. Exact same issue. I would imagine I should be running the version that includes the bugfix 2 years later?
Windows Build Number
11.0.22000.194
WSL Version
Kernel Version
5.10.60.1
Distro Version
Debian 11.0
Other Software
PhpStorm 2021.2.2
Build #PS-212.5284.49, built on September 16, 2021
git version 2.30.2
nginx/1.18.0
PHP 8.0.11 (cli) (built: Sep 23 2021 22:04:05) ( NTS )
Repro Steps
Expected Behavior
Actual Behavior
Diagnostic Logs
Excerpt of dmesg right after the issue happened:
[ 901.884823] init[20337]: segfault at 564e454c536f ip 0000000000262c2c sp 00007ffeee6a6fa0 error 6 in init[257000+ed000]
[ 901.884829] Code: 48 39 c8 75 1d 48 c7 c0 fe ff ff ff 44 89 e9 48 d3 c0 f0 48 21 05 fc 4a 0f 00 49 8b 4f 10 49 8b 47 18 48 89 48 10 49 8b 4f 10 <48> 89 41 18 49 8b 47 08 48 89 c1 48 83 c9 01 49 89 4f 08 48 83 e0
[ 901.884834] potentially unexpected fatal signal 11.
[ 901.884836] CPU: 15 PID: 20337 Comm: init Not tainted 5.10.60.1-microsoft-standard-WSL2 #1
[ 901.884843] RIP: 0033:0x262c2c
[ 901.884845] Code: 48 39 c8 75 1d 48 c7 c0 fe ff ff ff 44 89 e9 48 d3 c0 f0 48 21 05 fc 4a 0f 00 49 8b 4f 10 49 8b 47 18 48 89 48 10 49 8b 4f 10 <48> 89 41 18 49 8b 47 08 48 89 c1 48 83 c9 01 49 89 4f 08 48 83 e0
[ 901.884848] RSP: 002b:00007ffeee6a6fa0 EFLAGS: 00010297
[ 901.884850] RAX: 0000000002300000 RBX: 0000000000000020 RCX: 0000564e454c5357
[ 901.884851] RDX: 0000000000357718 RSI: 0000000000000003 RDI: 0000000000000001
[ 901.884852] RBP: 0000000000357720 R08: fefefefefefefeff R09: 8080808080808080
[ 901.884854] R10: fefefefefefefeff R11: 0000000000000206 R12: 0000000000000020
[ 901.884855] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000023028f0
[ 901.884857] FS: 0000000000356ce0 GS: 0000000000000000
[ 3029.277998] init: (13687) ERROR: Create:129: bind failed 98
[ 3128.007767] init: (14077) ERROR: Create:129: bind failed 98
[ 3131.144419] init: (14121) ERROR: Create:129: bind failed 98
[ 3135.575953] init: (14197) ERROR: Create:129: bind failed 98
[ 3149.998440] init: (14364) ERROR: Create:129: bind failed 98
[ 3178.126922] init: (14456) ERROR: Create:129: bind failed 98
[ 4025.063289] init: (15779) ERROR: Create:129: bind failed 98
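The repeated `bind failed 98` entries match the failed bind() found in the logs earlier in this thread. On Linux, errno 98 is `EADDRINUSE` ("Address already in use"); for a pathname Unix-domain socket, bind() returns it when a file already exists at that path, which fits the root cause described above of a leftover interop socket. A small sketch for pulling these entries out of `dmesg`, assuming the exact message format shown in this excerpt (the `bind_failures` name is hypothetical):

```sh
#!/bin/sh
# Hypothetical helper: read dmesg-style lines on stdin and print
# "<timestamp> <pid>" for each init interop bind failure with errno 98
# (EADDRINUSE on Linux). Assumes the message format seen in this excerpt.
bind_failures() {
    sed -n 's/^\[ *\([0-9.]*\)] init: (\([0-9]*\)) ERROR: Create:[0-9]*: bind failed 98$/\1 \2/p'
}
```

Usage: `dmesg | bind_failures` (reading dmesg may require root if `kernel.dmesg_restrict` is set).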