wsl.exe hangs from powershell until subsystem killed - arm64 aarch64 #10309
Comments
@hwine WSL version
What distro is |
I'll update when I get a chance, but there was nothing in the release notes that hinted at a change. This has been going on for months with wsl hangs (see also #9454). This is an example where the hang impacts win11 apps (
As noted, it's Ubuntu 22.04 - and termination-after-wsl2-hang works 99% of the time. Due to the length of time this has been occurring, I have tools that collect dumps of all relevant windows apps as it cleans up -- I'm hoping this will be easier for MS to troubleshoot than "it breaks in linux after some hours and random set of commands". What CPU are you using? This appears to be arm64 specific (aka aarch64). |
|
Thank you for reporting this @hwine. Can you confirm if you can still reproduce the issue with 1.3.14 ? |
I have updated to 1.3.14 and will update as soon as it does reproduce for I've stopped reporting the "hangs in linux" in #9454 as the dumps provided by myself and others didn't appear to provide useful information. Would you like the "hang in linux" dumps on this issue as well? |
Yes please ! /dumps |
Hello! Could you please provide logs and process dumps to help us better diagnose your issue? To collect WSL logs and dumps, download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:
The script will output the path of the log file once done. Once completed, please upload the output files to this GitHub issue. Click here for more info on logging Thank you! |
Okay, I got a dump during a hang on
error messages
Directory: C:\Users\hwine\Desktop\plumbum
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 2023-07-27 07:25 WslLogs-2023-07-27_07-25-50
Get-WindowsOptionalFeature: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:38
Line |
38 | Get-WindowsOptionalFeature -Online > $folder/optional-components.txt
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Class not registered
Log collection is running. Please reproduce the problem and press any key to save the logs.
Saving logs...
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:104
Line |
104 | $DumpMethod = $Assembly.GetNestedType('NativeMethods', 'NonPublic …
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Directory: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 2023-07-27 07:28 dumps
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.4892.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.4892.dmp
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.18048.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.18048.dmp
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.26104.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.26104.dmp
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.26652.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wsl.26652.dmp
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wslhost.14276.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wslhost.14276.dmp
Writing C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wslservice.7120.dmp
InvalidOperation: C:\Users\hwine\Desktop\plumbum\collect-wsl-logs.ps1:117
Line |
117 | $Result = $DumpMethod.Invoke($null, @($process.Handle,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Failed to write dump for: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50\dumps/wslservice.7120.dmp
Collecting additional network state...
d---- 2023-07-27 07:28 networking
Logs saved in: C:\Users\hwine\Desktop\plumbum\WslLogs-2023-07-27_07-25-50.zip. Please attach that file to the GitHub issue.
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +10 ~1 -0 !]> ls .\WslLogs-2023-07-27_07-25-50.zip
Directory: C:\Users\hwine\Desktop\plumbum
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 2023-07-27 07:28 30189 WslLogs-2023-07-27_07-25-50.zip |
Thank you @hwine. Unfortunately all the dumps in the .zip are empty. I think it's because you're running an ARM64 system, where the dump mechanism is different. Can you try to collect dumps manually via Task Manager for wslservice.exe and wsl.exe instead? (Details tab, right-click on the process -> Create memory dump file) |
@OneBlue -- I had also collected dumps for this event using a script I have, and just submitted the file via this procedure. I'd love feedback on my collection approach - I seem to be collecting "too much" data (I am not a windows developer). My script does a
apps_of_interest = (
'vmcompute.exe',
'vmwp.exe',
'vmmemWSL',
'wslservice.exe',
'wsl.exe',
'wslhost.exe',
) |
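As a rough illustration of how a list like `apps_of_interest` might be used to decide which processes to dump, the sketch below filters a process listing down to the names of interest. The `select_processes` helper and the `(pid, name)` input shape are hypothetical and are not taken from the script described above. The matching rule (case-insensitive, with or without a trailing `.exe`) is also an assumption, chosen so that `vmmemWSL` and `vmmemWSL.exe` are treated alike.

```python
from typing import Iterable, List, Tuple

# Names copied from the list in the comment above.
APPS_OF_INTEREST = (
    'vmcompute.exe',
    'vmwp.exe',
    'vmmemWSL',
    'wslservice.exe',
    'wsl.exe',
    'wslhost.exe',
)


def _normalize(name: str) -> str:
    """Lower-case a process name and drop a trailing '.exe' (assumed rule)."""
    name = name.lower()
    return name[:-4] if name.endswith('.exe') else name


def select_processes(
    processes: Iterable[Tuple[int, str]],
    apps_of_interest: Iterable[str] = APPS_OF_INTEREST,
) -> List[Tuple[int, str]]:
    """Return the (pid, name) pairs whose name matches an entry of interest."""
    wanted = {_normalize(app) for app in apps_of_interest}
    return [(pid, name) for pid, name in processes if _normalize(name) in wanted]


if __name__ == '__main__':
    # PIDs for wsl, wslhost and wslservice are the ones shown in the log
    # output earlier in this thread; the explorer.exe entry is made up.
    listing = [(4892, 'wsl.exe'), (14276, 'wslhost.exe'),
               (1234, 'explorer.exe'), (7120, 'wslservice.exe')]
    for pid, name in select_processes(listing):
        print(pid, name)
```

Keeping the selection step separate from the dump step makes it easy to check whether the list is "too much" before any dump is written.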
Had what I consider a "normal hang" where wsl became unresponsive within 3 hours of restart, with no activity. I was able to terminate the session normally with |
Updated WSL, still hangs on: start wsl, wait a few hours, find terminal unresponsive
wsl version
C:\Users\hwine> wsl -v -l
WSL version: 1.3.15.0
Kernel version: 5.15.90.4-1
WSLg version: 1.0.55
MSRDC version: 1.2.4419
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25880.1000-230602-1350.main
Windows version: 10.0.22631.2129 |
Hung in linux, attempted to start a second wsl into same distribution from admin console. That WSL hung.
Powershell output
C:\Users\hwine\Downloads> wsl -v -l
WSL version: 1.3.15.0
Kernel version: 5.15.90.4-1
WSLg version: 1.0.55
MSRDC version: 1.2.4419
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25880.1000-230602-1350.main
Windows version: 10.0.22631.2191
C:\Users\hwine\Downloads> wsl -l -v
NAME STATE VERSION
* work Stopped 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Stopped 2
C:\Users\hwine\Downloads> wsl -l -v
NAME STATE VERSION
* work Running 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Stopped 2
C:\Users\hwine\Downloads> wsl -d work --user root --system
The remote procedure call failed.
Error code: Wsl/Service/RPC_S_CALL_FAILED
C:\Users\hwine\Downloads> |
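For anyone scripting around listings like the one above, here is a minimal sketch of turning `wsl -l -v` output into structured rows, so that a script can tell which distributions are `Running`. The function names are hypothetical. The sketch assumes the text has already been decoded to a Python string (captured `wsl.exe` output is often UTF-16 and may need decoding first), and it assumes distribution names contain no spaces.

```python
from typing import Dict, List

# Sample copied from the second listing in the comment above.
SAMPLE = """\
  NAME                      STATE           VERSION
* work                      Running         2
  rancher-desktop           Stopped         2
  Ubuntu                    Stopped         2
  podman-machine-default    Stopped         2
"""


def parse_wsl_list(text: str) -> List[Dict[str, object]]:
    """Parse decoded `wsl -l -v` output into one dict per distribution."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        is_default = line.startswith('*')  # '*' marks the default distribution
        fields = line.lstrip('*').split()
        # Keep only NAME/STATE/VERSION rows; this also skips the header
        # and error text such as "Error code: Wsl/Service/RPC_S_CALL_FAILED".
        if len(fields) != 3 or not fields[2].isdigit():
            continue
        name, state, version = fields
        rows.append({'name': name, 'state': state,
                     'version': int(version), 'default': is_default})
    return rows


def running_distros(text: str) -> List[str]:
    """Return the names of distributions reported as Running."""
    return [row['name'] for row in parse_wsl_list(text) if row['state'] == 'Running']


if __name__ == '__main__':
    print(running_distros(SAMPLE))
```

An empty result can mean either that nothing is running or that the command printed an error instead of a table, so a caller would likely want to check the exit code as well.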
I think I see this problem as well intermittently. I'm running WSL 2 / Ubuntu with Windows On Arm. Occasionally my WSL terminal becomes non-responsive, and "wsl --shutdown" can take ~10 mins or so. My ARM device has 16 GB RAM, and a Snapdragon (TM) 8cx Gen 3 @ 3.0 GHz Processor. I also usually have Windows Subsystem for Android (Kindle App) running on the device at the same time in case that might matter. |
Hung in powershell, during
Powershell output
C:\Users\hwine> .\shutdown-wsl.ps1
> date
Monday, September 18, 2023 09:29:24
> wsl -l -v
The remote procedure call failed.
Error code: Wsl/Service/RPC_S_CALL_FAILED
> date
Monday, September 18, 2023 09:34:46
> wsl -t work
The operation completed successfully.
> date
Monday, September 18, 2023 09:34:51
> wsl --shutdown
The remote procedure call failed.
Error code: Wsl/RPC_S_CALL_FAILED
> date
Monday, September 18, 2023 09:38:36
C:\Users\hwine> |
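The `date` stamps in the transcript above bracket each command, so the time each one blocked can be read off mechanically. The sketch below does that arithmetic. It assumes the layout shown above (a `> ` prefix before each command, and a `date` line followed by its timestamp); `command_durations` is a hypothetical helper, not part of `shutdown-wsl.ps1`. Each figure also includes the small cost of running `date` itself.

```python
from datetime import datetime
from typing import List, Tuple

# Matches stamps such as "Monday, September 18, 2023 09:29:24".
DATE_FORMAT = '%A, %B %d, %Y %H:%M:%S'

# Transcript copied from the comment above.
TRANSCRIPT = """\
> date
Monday, September 18, 2023 09:29:24
> wsl -l -v
The remote procedure call failed.
Error code: Wsl/Service/RPC_S_CALL_FAILED
> date
Monday, September 18, 2023 09:34:46
> wsl -t work
The operation completed successfully.
> date
Monday, September 18, 2023 09:34:51
> wsl --shutdown
The remote procedure call failed.
Error code: Wsl/RPC_S_CALL_FAILED
> date
Monday, September 18, 2023 09:38:36
"""


def command_durations(transcript: str) -> List[Tuple[str, float]]:
    """Pair each command with the seconds between the stamps around it."""
    durations = []
    previous_stamp = None
    pending_command = None
    expect_stamp = False
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_stamp:
            # The line after "> date" is the timestamp itself.
            stamp = datetime.strptime(line, DATE_FORMAT)
            if pending_command is not None and previous_stamp is not None:
                elapsed = (stamp - previous_stamp).total_seconds()
                durations.append((pending_command, elapsed))
            previous_stamp, pending_command, expect_stamp = stamp, None, False
        elif line == '> date':
            expect_stamp = True
        elif line.startswith('> '):
            pending_command = line[2:]  # any other "> ..." line is a command
        # Everything else is command output and is ignored.
    return durations


if __name__ == '__main__':
    for command, seconds in command_durations(TRANSCRIPT):
        print(f'{command}: {seconds:.0f} s')
```

On this transcript the failed `wsl -l -v` accounts for 322 seconds and the failed `wsl --shutdown` for 225 seconds, while `wsl -t work` returned within 5 seconds.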
Thank you @hwine. Unfortunately, all of the .zip files you shared seem to be corrupted. Can you try to capture dumps again, maybe using another archive format? |
Ah, probably because I use |
Thank you @hwine. I can read those dumps. What I can see is that things appear to be stuck on Linux side. To investigate that, can you please:
|
@OneBlue - new data. Getting a reproduction my normal way (just doing my daily work) was too hit-or-miss, so I've been trying to find more direct steps to reproduce. Tonight, I got 2 The other difference is in which instances are in play:
Questions before I get to the data:
Hang 1
Running only a single instance, managed by
Powershell terminal output
C:\Users\hwine> wsl -l -v
NAME STATE VERSION
* work Stopped 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Stopped 2
C:\Users\hwine> podman machine start
Starting machine "podman-machine-default"
This machine is currently configured in rootless mode. If your containers
require root permissions (e.g. ports < 1024), or if you run into compatibility
issues with non-podman clients, you can switch using the following command:
podman machine set --rootful
API forwarding for Docker API clients is not available due to the following startup failures.
could not start api proxy since expected pipe is not available: podman-machine-default
Podman clients are still able to connect.
Machine "podman-machine-default" started successfully
C:\Users\hwine> wsl -l -v
NAME STATE VERSION
* work Stopped 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Running 2
C:\Users\hwine> podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/pythonsamplevscodeflasktutorial latest 251788c9b3fc 2 months ago 181 MB
docker.io/library/httpd latest 911d72fc5020 2 months ago 200 MB
registry.access.redhat.com/ubi8-micro latest 9c36f420d8f4 3 months ago 30.4 MB
docker.io/library/python 3.10-slim 9c3d6fd4ce06 3 months ago 176 MB
C:\Users\hwine> podman pull mozilla/autograph
Resolving "mozilla/autograph" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/mozilla/autograph:latest...
Getting image source signatures
Copying blob sha256:d5323bd1ce153a2e0df17f04d0bdfdeb29b48ca521c3c7734845b5e264fa78b6
Copying blob sha256:dd9d1af719764344a5a57ac06d5f23a36bdca0aeec69294d60ab5c898ca87b38
Copying blob sha256:077c54d048f1f1a1f28079caa54bf5d99803f937ffe5c1dc6e207698f70b4e74
Copying blob sha256:0368544993b2deeeffdd19463cdf92ec4e39f83073de5de316f9f5c725ab403f
Copying blob sha256:c4cc477c22ba7abce860198107408434dd7bd73ddbaf82f1e69ab941b9979405
Copying blob sha256:bdd06459b13e2a5668e26bbb718e23a634b96dbd660dbcb0fa35bf139aaa475c
Copying blob sha256:32fd3f0602b55d34acc57a3a52990faef9a4a56277f0584180bad0b119101741
Copying blob sha256:a8eda44d10225bfdec8a303e2c7c77390a89014da9f219b7a8f31c0e2a3ff1e3
Copying blob sha256:a71fdc9ffb6d41be07213577c9926cd0e4a5c2d6bff17885f6b4c81fe6bec9dd
Copying blob sha256:b4defd68c5d013a02ed025aee2884809d3d7d065ec2527bcb8101bf4f2caa0c2
Copying blob sha256:800ca419a9280bff8497037f848c65dc3bd74c52f2de347d6ff6d86d2cf932a2
Copying blob sha256:dbd11605289510cd43c5d2ef73c82afa354acaca94a211733ce90013dc39a6c9
Copying blob sha256:8854a35fc76e028106edd61fd7f4879cbc08bc8fe41c7ffd2a599886a2404905
Copying blob sha256:1d5ac5bee8cdeb111ba384a6ab5b3b46ffdacf4b60f7ff7b6cdf5f50c456de3a
Copying blob sha256:034f999a41bce103bd8671272a73895869d124c9329732f7e176b3660ab78907
Process appeared to hang here, I used
C:\Users\hwine> podman images
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:50765->127.0.0.1:53201: wsarecv: An existing connection was forcibly closed by the remote host.
C:\Users\hwine> wsl -l -v
NAME STATE VERSION
* work Stopped 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Running 2
C:\Users\hwine> podman images
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:50783->127.0.0.1:53201: wsarecv: An existing connection was forcibly closed by the remote host.
C:\Users\hwine> podman system connection list
Name URI Identity
Default
podman-machine-default ssh://user@127.0.0.1:53201/run/user/1000/podman/podman.sock C:\Users\hwine\.ssh\podman-machine-default true
podman-machine-default-root ssh://root@127.0.0.1:53201/run/podman/podman.sock C:\Users\hwine\.ssh\podman-machine-default false
C:\Users\hwine> podman machine start
Starting machine "podman-machine-default"
C:\Users\hwine> .\shutdown-wsl.ps1
> date
Thursday, October 5, 2023 20:13:18
> wsl -l -v
NAME STATE VERSION
* work Stopped 2
rancher-desktop Stopped 2
Ubuntu Stopped 2
podman-machine-default Running 2
> date
Thursday, October 5, 2023 20:13:18
> wsl -t work
The operation completed successfully.
> date
Thursday, October 5, 2023 20:13:19
> wsl --shutdown
That is the hang-in-wsl. Dump files from this first hang:
Hang 2
From an admin powershell prompt, attempting to run
Powershell terminal output
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +3 ~1 -0 !]> wsl.exe --debug-shell
That is the 2nd hang. Dump files from this second hang:
termination messages
Update: After posting the dump files, I ran a script I use to terminate the various wsl processes from an admin console. Each of the two hangs produced a different message before returning to the powershell prompt:
From hang 1:
> date
Thursday, October 5, 2023 20:13:19
> wsl --shutdown
The remote procedure call failed.
Error code: Wsl/RPC_S_CALL_FAILED
> date
Thursday, October 5, 2023 21:15:16
From hang 2:
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +3 ~1 -0 !]> wsl.exe --debug-shell
No process is on the other end of the pipe.
Error code: Wsl/DebugShell/ERROR_PIPE_NOT_CONNECTED
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +3 ~1 -0 !]> |
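Since a hung `wsl.exe` call blocks the calling shell (over an hour in the hang 1 transcript above), a wrapper with a time limit can make such hangs easier to spot from a script. The following is a minimal sketch, not something proposed in this thread: `run_with_timeout` is a hypothetical helper and the 30-second limit in the usage comment is an arbitrary assumption. It relies on `subprocess.run`, which kills the child process when the timeout expires; that ends the client process only and would not be expected to recover a stuck service or VM.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(command: List[str], timeout_s: float) -> Optional[int]:
    """Run `command`; return its exit code, or None if it exceeded `timeout_s`."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Treated as a hang; subprocess.run has already killed the child.
        return None
    return completed.returncode


if __name__ == '__main__':
    # On the affected machine this might be, with an assumed limit:
    #     run_with_timeout(['wsl.exe', '-l', '-v'], 30)
    # A portable stand-in is used here so the sketch runs anywhere.
    slow = [sys.executable, '-c', 'import time; time.sleep(5)']
    result = run_with_timeout(slow, 0.5)
    print('hang suspected' if result is None else f'exit code {result}')
```

A `None` result could be used as the trigger for collecting dumps while the hang is still in progress.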
@OneBlue - can you also re-open, please -- the policy bot thinks I have more chops than I do! 😉 |
Thank you @hwine. This error is very interesting: Because it probably means that the service crashed. Could you set up crash dump collection to catch a dump if this crashes again ? Also, can you have a process in a background running: |
Okay, I've set up crash dump collection (I hope that works for arm64), and will open the In the meantime, I had yet-another hang from |
@OneBlue Another "hang" (I didn't think to check if a new
|
See #11274 (comment) This issue is fixed in 24H2. |
Windows Version
Windows version: 10.0.22631.2048
WSL Version
WSL version: 1.3.11.0
Are you using WSL 1 or WSL 2?
Kernel Version
Kernel version: 5.15.90.2-3
Distro Version
Ubuntu 22.04
Other Software
PSVersion 7.3.6
Device name
Processor Microsoft SQ2 @ 3.15 GHz 3.15 GHz
Installed RAM 16.0 GB (15.6 GB usable)
Device ID
Product ID
System type 64-bit operating system, ARM-based processor
Pen and touch Pen and touch support with 10 touch points
Repro Steps
Tried to terminate running-but-unresponsive wsl2 session from non-admin powershell. (See #9454 for linux side hang info.)
Expected Behavior
wsl.exe should terminate the wsl2 system.
Actual Behavior
wsl.exe -t work hung, and only continued when the wsl subsystem was terminated from an admin console.
Powershell session:
Diagnostic Logs
Logs emailed per instructions