WSL hang, arm64 cpu, not solved by other solutions #9454
Seconded, I experience the same behavior on my SP X SQ2. Sometimes the running threads stay responsive for a while, but generally the entire system / WSL instance just freezes. The only way for me to kill WSL in this situation is to forcefully kill the WSL instance. Since the entire WSL instance (i.e. the VM) does not respond at the time of these crashes, it's not possible to run any command in the system distribution, even if it was opened before the crash happened. It feels like something is not entirely right with the process forking / VM on this chip, but it's difficult to do any troubleshooting (at the kernel level) as everything is frozen. I've learned to live with this for now, and regularly have to clean up corrupted git repositories or files if the crash happens during a write (or git commit). I have noticed that if I just walk away from the machine and let it run for fifteen minutes or so, it often recovers.
I'm seeing this too - if anything, more frequently than ever. I was on W10 22H2, and am now on W11 22H2 22621.1105, and it hasn't made any noticeable difference. Surface Pro X SQ2. I previously raised #7913 for what could be the same issue - but I no longer see RCU stalls reported in dmesg. I still see the "locked up" processes consuming 100% CPU.
/logs
Hello! Could you please provide more logs to help us better diagnose your issue? To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:
The script will output the path of the log file once done. Once completed, please upload the output files to this GitHub issue. Click here for more info on logging. Thank you!
@hwine, do you have the debug shell running? You could try the same steps from there regarding getting the dumps of init.
You can enable the debug shell as described here: #7930
@pmartincic - thanks for the pointer -- I'm attempting to enable that, but I'm having issues (or don't understand the full sequence). To be clear, I'm attempting to follow the steps from this prior issue.
Also, I now have 2 sets of instructions for gathering logs:
I'm assuming only the former logs are being requested. Please correct me if I've misunderstood the process. |
For this to work you have to leave the detached "WSL Debug Shell" window open.
I misread your comment. We'll definitely want the logs as specified in /logs. It is not expected that no prompt appears in the debug shell, or that the system shell window closes itself. To clarify, both #9454 (comment) and #9114 (comment) will give us information, but different information. It may be simpler for you to use the system shell, assuming you remember to launch it and leave it open.
Hello! Could you please provide more logs to help us better diagnose your issue? To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:
The script will output the path of the log file once done. Once completed, please upload the output files to this GitHub issue. Click here for more info on logging. Thank you!
Same here: if I run the system shell as specified and have the debug shell configured, it just disappears after a couple of seconds. Of course the system shell stays open, but it freezes up when my main / working instance hangs. I think this is also the behavior that the OP is talking about; see video. P.S. the same behavior occurs on regular WSL instances, not just the system distro.
Exactly! Thanks @maxboone! (I should learn how to make those videos.)
I do not get a debug shell popping up on my normal instances. 🤷 |
Okay, I can reproduce the issue with the debug shell not working on ARM. |
Okay, I think I have a good set of logs from both collection techniques. Note for the logs collected from an elevated powershell, the process seemed to hang for a bit, and the screen text didn't give me confidence I did it correctly. :/

Screen text:

```
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +8 ~2 -0 !]> Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +8 ~2 -0 !]> Set-ExecutionPolicy Bypass -Scope Process -Force
(.venv) C:\Users\hwine\Desktop\plumbum [master ≡ +8 ~2 -0 !]> .\collect-wsl-logs.ps1
```
Thanks! I'm still working on the debug shell issue at the moment. |
Serendipity -- shortly after reading your response, I noticed an anomaly that might shed some light. I had one of the "WSL Debug Shell" windows display on my screen. What's odd about that is that I had closed out of all WSL instances last night. However:
I've not had luck with WSLg -- issues from a number of months ago suggested the SQ2's GPU wasn't supported, so poor performance was expected. I haven't tried WSLg since then. @maxboone, are you using WSLg or an X server?
WSLg natively, but I must say that I don't use it often. |
As I understand it, there's nothing I can do about providing additional logs until @pmartincic resolves the debug shell bug. For whatever reason, the hanging is now occurring more frequently for me. Anecdotally, a typical report looks like this:

```
[ +0.006416] sd 0:0:0:1: [sdb] Attached SCSI disk
[ +0.005168] sd 0:0:0:2: [sdc] Attached SCSI disk
[ +0.088459] Adding 2097152k swap on /dev/sdb. Priority:-2 extents:1 across:2097152k
[ +0.011115] EXT4-fs (sdc): recovery complete
[ +0.002790] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered
[ +0.077833] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.263242] rsyslog: unrecognized service
[Jan25 07:58] hv_balloon: Max. dynamic memory size: 7972 MB
[Jan25 08:00] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.
[Jan25 11:32] rcu: INFO: rcu_sched self-detected stall on CPU
[ +0.001879] rcu: 5-....: (14999 ticks this GP) idle=2b7/1/0x4000000000000002 softirq=54910/54910 fqs=7336
[ +0.000764] (t=15000 jiffies g=192973 q=27380)
[ +0.000512] Task dump for CPU 5:
[ +0.000316] task:bash state:R running task stack: 0 pid:20006 ppid: 645 flags:0x0000000a
[ +0.000885] Call trace:
[ +0.000222] dump_backtrace+0x0/0x1c8
[ +0.001334] show_stack+0x1c/0x28
[ +0.000345] sched_show_task+0x14c/0x180
[ +0.000376] dump_cpu_task+0x48/0x54
[ +0.000546] rcu_dump_cpu_stacks+0xf4/0x138
[ +0.000276] rcu_sched_clock_irq+0x938/0xaa0
[ +0.000479] update_process_times+0x9c/0x2e0
[ +0.000373] tick_sched_handle.isra.0+0x38/0x50
[ +0.000467] tick_sched_timer+0x50/0xa0
[ +0.000274] __hrtimer_run_queues+0x11c/0x328
[ +0.000345] hrtimer_interrupt+0x118/0x300
[ +0.000291] hv_stimer0_isr+0x28/0x30
[ +0.000516] hv_stimer0_percpu_isr+0x14/0x20
[ +0.000431] handle_percpu_devid_irq+0x8c/0x1c0
[ +0.000461] handle_domain_irq+0x64/0x90
[ +0.000345] gic_handle_irq+0xb8/0x128
[ +0.000324] call_on_irq_stack+0x28/0x3c
[ +0.000330] do_interrupt_handler+0x54/0x5c
[ +0.000397] el1_interrupt+0x2c/0x40
[ +0.000452] el1h_64_irq_handler+0x14/0x20
[ +0.000353] el1h_64_irq+0x74/0x78
[ +0.000319] clear_rseq_cs.isra.0+0x4c/0x60
[ +0.000483] do_notify_resume+0xfc/0x3c0
[ +0.000317] el0_svc+0x3c/0x48
[ +0.000315] el0t_64_sync_handler+0xa8/0xb0
[ +0.000324] el0t_64_sync+0x158/0x15c
```

From scanning the RCU docs and configuration options, I'm hoping that some tuning may reduce the impact of these hangs until a fix is available. Before I start kernel-tuning-by-coincidence (which feels risky), does anyone have any suggestions? Absent any other guidance, my first experiment will be to change the RCU stall-timeout settings.
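For reference, a minimal sketch of that kind of RCU stall-timeout experiment, assuming the stock rcupdate module parameters are exposed under /sys; the 60-second value is purely illustrative, and these knobs only change when the stall warning fires, not the underlying hang:

```sh
# Raise the RCU CPU stall timeout so the warning fires later (value in seconds).
echo 60 | sudo tee /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Optionally suppress the stall warnings entirely while experimenting.
echo 1 | sudo tee /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress

# Check the values currently in effect.
grep . /sys/module/rcupdate/parameters/rcu_cpu_stall_*
```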
@hwine , My apologies, I got pre-empted by 9508. After that I can come back to this. |
I experience the same behavior since switching from an Intel-based laptop to my SP 9 SQ3. It happens very frequently and makes it quite hard to use WSL for development work. When unresponsive, the VmmemWSL process has high CPU load.
Good stuff @hwine! I'll try the same for my instances. Where do you plan on tweaking the kernel settings, considering it's a system-wide setting (all instances)? For the non-system distros, I'll set these tweaks (or create a script that does them, if more tweaks are added) using a boot command in my distro configuration, as sketched below:
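A minimal sketch of that per-distro approach, assuming the distro's WSL version supports the [boot] section of /etc/wsl.conf and reusing the illustrative stall-timeout value from above:

```sh
# Apply the tweak at distro start-up via /etc/wsl.conf's boot command.
sudo tee /etc/wsl.conf >/dev/null <<'EOF'
[boot]
command = sh -c "echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout"
EOF

# Restart the distro so the boot command runs:
#   wsl.exe --terminate <distro-name>   # then open a new shell
```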
It's not. |
Noted that after switching to the Visual Studio Code Insiders edition the problem seems to be much less prevalent - now only system sleep seems to make WSL hang.
Until the debug feature is fixed, I'm assuming that running the
@maxboone -- you are far gutsier than I! An error there could prevent booting, and I don't want the recovery hassle. :) I'm manually running a script after I boot. Based on the docs, the new values "will [affect] the timeout for the next stall". So, as long as the system survives for a minute, I'm good. I'm just activating the changes -- this week had too many fire drills to try "on the clock". |
@craigloewen-msft, any chance to get some focus on this issue? Having purchased a perfectly capable (and expensive) PC on a platform which was very much being positioned as a developer platform at the time (I guess everything is AI nowadays, instead...) just to be forced to use Github Codespaces in order to actually get some work done doesn't feel great... What do you guys need from me to be able to successfully debug this? |
@david-nordvall can you open a new issue for us that describes your specific issue? And can you tag this issue number in it? Lastly, can you get WSL to the point where it's hanging for a long time and then can you collect crash dumps and attach them to the issue? Updated instructions on how to do that can be found here. Thank you! |
@craigloewen-msft No problem, have created issue 10667 and made crash dumps available there. |
I am having periodic freezes in WSL2 on a ThinkPad x13s (Qualcomm 8cx Gen 3) ARM64 Windows. I am using the collect-wsl-logs.ps1 script to try to trace what is happening, and I get events in logs.etl like "The description for Event ID 0 from source Microsoft-Windows-Host-Network-Service cannot be found." I have tried editing the /etc/hosts file and turning off regeneration of it, but this issue still persists.

PS C:\Users\alexg\wsl> wsl --version

It appears there is some sort of event that is not getting properly logged. I hope that the upcoming fix will address this issue.

EDIT: 6:46 PM PST: I see on the WSL2 reddit that one may need to re-enable the "Virtual Machine Platform" in Windows Features after the November 2023 Windows 11 updates.
EDIT: 5:17 PM PST Nov 19 2023: I am still getting the freezes, but less often.
EDIT: 5:09 PM PST Nov 22 2023: I should mention I am using VS Code WSL while observing all of these problems.
Which patch would that be? I'm still experiencing this multiple times per day and have to force-kill wslservice.exe to get it restarted |
@david-nordvall Shame; I'm going to get the 23H2 update now and see if that solves anything. Will report back in probably an hour or two. Force-killing wslservice every so often (sometimes once per hour) is getting very tedious when you want to work on the go with your Surface.
@f0o I'm already on 23H2 but no luck, for me at least. But please report back. Once per hour sounds great, actually. If I start a moderately complex development environment in Docker (in WSL2) I can work for about 2-10 minutes before WSL hangs. |
@david-nordvall yikes. I'm also on VS Code with WSL2 -> Docker (DevContainers). It feels more or less random when it crashes. Sometimes it just dies instantly; other times it survives an hour or two. I'm not really running much intensive work, usually just writing code and pushing to git for the CI to do all the heavy lifting of compiling/testing. Hell, it even crashes when I just use WSL to ssh into some jumpbox for IRC'ing. I somehow don't think it's workload dependent, but simply some interrupt hangup somewhere that ends up stalling the whole thing.
23H2 definitely changed something, because WSL won't start at all anymore 😂 Had to wipe all VMs and set everything up again for some reason. Issue still persists.
Update: I completely wiped my Surface Pro 9 and re-installed Windows (using the restore functionality built in to Windows). The issue still persists :-( |
Same issue. ThinkPad x13s gen1, Win11 23H2 on arm64. WSL2 hangs after 20 minutes of text-editing work using VS Code. wsl --shutdown takes about 20 min to complete.
ThinkPad x13s gen1 -- looks like I found the solution, at least it works for me -- no hangs, no excessive CPU load, and it survives sleep/resume. I upgraded WSL to the pre-release (wsl --update --pre-release).
Surface Pro X (SQ1, 23H2, Debian):
WSL version: 2.0.9.0 |
@pmartincic any updates? |
Still hangs after
I'm feeling like these RCU stalls mainly happen for me after I've had my screen locked for a while, or after the device has suspended. Does that match the experience others have?
My initial experiences seemed to be linked to "fancy terminal usage" (e.g. tmux, nvim). See also #7913, #10667, and #9454. However, I have seen recent hangs with simply normal bash terminal idling, but have not checked if they are RCU related. (Passing that test is now my first "gate" for any wsl/win11 updates that might improve things enough to warrant more in-depth testing.) fwiw, there are some RCU tuning notes in that thread. IIRC, at the time it helped delay freezes, but not prevent them. Maybe they're worth another try now. |
Good call! I do indeed experience them often while using tmux, neovim or vscode (which probably does fancy terminal stuff with its integrated terminal too). I'm hesitant to draw a relation there, though, as I'm using (at least) one of those most of the time I'm working in Linux. I did try the RCU tuning at the time, but it only gave (short-term) placebo results for me. I have to note that I've been running some Hyper-V VMs for a couple of days now and haven't seen any issues there; however, I haven't done a lot of work there either, so I will try working from a VM for a while and see if similar issues pop up. I'm currently trying to build a kernel from that VM to see if there might be any cross-compilation shenanigans causing this issue (not that I really expect that). Feel free to try it out too. If that doesn't work, I'll rebuild it with debugging symbols enabled (and see if that adds more information to the RCU call stacks). I'll report back here in time.
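For anyone who wants to try the same experiment, a rough sketch of a native arm64 build, assuming the microsoft/WSL2-Linux-Kernel tree and its in-tree arm64 config (file names may differ between releases):

```sh
# Build a native arm64 WSL2 kernel from Microsoft's kernel tree.
git clone --depth 1 https://github.com/microsoft/WSL2-Linux-Kernel.git
cd WSL2-Linux-Kernel
make KCONFIG_CONFIG=Microsoft/config-wsl-arm64 -j"$(nproc)"

# Point WSL at the resulting image from %UserProfile%\.wslconfig:
#   [wsl2]
#   kernel=C:\\path\\to\\arch\\arm64\\boot\\Image
# then run `wsl.exe --shutdown` and start a new session.
```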
Well it's not much better, but I do feel that it recovers better with a newer kernel (whether local or cross compiled). Moreover, it seems that I can force it to happen by letting my machine go into standby while it's running something that either polls, or does some kind of watching. dmesg[ 3488.823969] rcu: INFO: rcu_sched self-detected stall on CPU [ 3488.825404] rcu: 6-....: (14979 ticks this GP) idle=125c/1/0x4000000000000000 softirq=34649/34649 fqs=7434 [ 3488.826547] rcu: (t=15000 jiffies g=204277 q=1309 ncpus=8) [ 3488.827311] CPU: 6 PID: 2530 Comm: node Not tainted 6.7.7-WSL2-STABLE+ #1 [ 3488.827430] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3488.827486] pc : clear_rseq_cs.isra.0+0x20/0x38 [ 3488.828123] lr : __rseq_handle_notify_resume+0x174/0x498 [ 3488.828126] sp : ffff80008338bc80 [ 3488.828127] x29: ffff80008338bcd0 x28: 000000000000080c x27: 0000000000000000 [ 3488.828130] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 3488.828132] x23: 0000ff03322cb6d8 x22: 0000000000000000 x21: 00000000ffffffff [ 3488.828135] x20: ffff80008338beb0 x19: ffff000194cde740 x18: 0000000000000000 [ 3488.828137] x17: 0000000000000000 x16: 0000000000000000 x15: 0000fffa2c60c800 [ 3488.828139] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 3488.828141] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080258d8c [ 3488.828143] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 3488.828145] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000fffffffffff8 [ 3488.828147] x2 : 0000000000000000 x1 : 0000ff0330e0f7e8 x0 : 0000000000000000 [ 3488.828174] Call trace: [ 3488.828200] clear_rseq_cs.isra.0+0x20/0x38 [ 3488.828203] do_notify_resume+0x1d0/0xe70 [ 3488.828286] el0_svc+0x90/0xb0 [ 3488.828411] el0t_64_sync_handler+0x138/0x148 [ 3488.828413] el0t_64_sync+0x14c/0x150 [ 3557.107643] br-1951cbf8ee7a: port 4(vetha101994) entered disabled state [ 3557.108802] vethf736b24: renamed from eth0 [ 3557.172825] br-1951cbf8ee7a: port 4(vetha101994) entered disabled state [ 3557.175231] vetha101994 (unregistering): left allmulticast mode [ 3557.175262] vetha101994 (unregistering): left promiscuous mode [ 3557.175268] br-1951cbf8ee7a: port 4(vetha101994) entered disabled state [ 3829.870764] br-1951cbf8ee7a: port 4(veth2d06f9f) entered blocking state [ 3829.870849] br-1951cbf8ee7a: port 4(veth2d06f9f) entered disabled state [ 3829.870944] veth2d06f9f: entered allmulticast mode [ 3829.871296] veth2d06f9f: entered promiscuous mode [ 3830.501386] eth0: renamed from veth268a22a [ 3830.534499] br-1951cbf8ee7a: port 4(veth2d06f9f) entered blocking state [ 3830.534511] br-1951cbf8ee7a: port 4(veth2d06f9f) entered forwarding state [ 4110.229251] rcu: INFO: rcu_sched self-detected stall on CPU [ 4110.230914] rcu: 3-....: (14990 ticks this GP) idle=9894/1/0x4000000000000000 softirq=73639/73639 fqs=6232 [ 4110.232326] rcu: (t=15001 jiffies g=219869 q=1843 ncpus=8) [ 4110.233456] CPU: 3 PID: 353 Comm: cron Not tainted 6.7.7-WSL2-STABLE+ #1 [ 4110.233462] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4110.233464] pc : do_notify_resume+0x348/0xe70 [ 4110.233742] lr : do_notify_resume+0x2c8/0xe70 [ 4110.233744] sp : ffff8000834abd30 [ 4110.233745] x29: ffff8000834abe20 x28: 000000000000000d x27: 0000000000000000 [ 4110.233748] x26: 0000ffffcc8272f0 x25: 0000000000000000 x24: ffff000000c0d130 [ 4110.233751] x23: ffff8000834abdc0 x22: 0000000000000041 x21: 0000ffffcc827270 [ 4110.233753] x20: ffff000000c0c9c0 x19: 
ffff8000834abeb0 x18: 0000000000000000 [ 4110.233756] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000834abcd8 [ 4110.233758] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 4110.233760] x11: 0000000000000000 x10: 0000000000000ac0 x9 : ffff80008001b4f8 [ 4110.233762] x8 : ffff8000834abdb8 x7 : 0000000000000000 x6 : 0000000000000000 [ 4110.233765] x5 : 0001000000000000 x4 : 0000000000001250 x3 : 0000ffffcc829720 [ 4110.233767] x2 : 0000ffffcc8284c0 x1 : 0000ffffcc8284d0 x0 : 0000ffffcc8272f0 [ 4110.233770] Call trace: [ 4110.233797] do_notify_resume+0x348/0xe70 [ 4110.233801] el0_svc+0x90/0xb0 [ 4110.233921] el0t_64_sync_handler+0x138/0x148 [ 4110.233923] el0t_64_sync+0x14c/0x150 [ 5001.903196] rcu: INFO: rcu_sched self-detected stall on CPU [ 5001.905317] rcu: 6-....: (14975 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=6495 [ 5001.906636] rcu: (t=15001 jiffies g=237549 q=18838 ncpus=8) [ 5001.907838] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5001.907843] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5001.907845] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5001.908261] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5001.908262] sp : ffff80008470bca0 [ 5001.908291] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5001.908294] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5001.908297] x23: 0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5001.908299] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5001.908302] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5001.908304] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5001.908333] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 5001.908335] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5001.908337] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5001.908339] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5001.908342] Call trace: [ 5001.908370] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5001.908373] do_epoll_pwait.part.0+0x38/0x120 [ 5001.908375] __arm64_sys_epoll_pwait+0x78/0x138 [ 5001.908376] invoke_syscall.constprop.0+0x54/0x128 [ 5001.908473] do_el0_svc+0x44/0xf0 [ 5001.908474] el0_svc+0x24/0xb0 [ 5001.908595] el0t_64_sync_handler+0x138/0x148 [ 5001.908597] el0t_64_sync+0x14c/0x150 [ 5168.075433] rcu: INFO: rcu_sched self-detected stall on CPU [ 5168.088496] rcu: 6-....: (59979 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=25910 [ 5168.091110] rcu: (t=60010 jiffies g=237549 q=65516 ncpus=8) [ 5168.092512] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5168.092518] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5168.092521] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5168.092529] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5168.092530] sp : ffff80008470bca0 [ 5168.092531] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5168.092535] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5168.092537] x23: 0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5168.092540] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5168.092542] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5168.092544] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5168.092547] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 
5168.092549] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5168.092552] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5168.092554] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5168.092556] Call trace: [ 5168.092558] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5168.092560] do_epoll_pwait.part.0+0x38/0x120 [ 5168.092562] __arm64_sys_epoll_pwait+0x78/0x138 [ 5168.092564] invoke_syscall.constprop.0+0x54/0x128 [ 5168.092570] do_el0_svc+0x44/0xf0 [ 5168.092572] el0_svc+0x24/0xb0 [ 5168.092576] el0t_64_sync_handler+0x138/0x148 [ 5168.092578] el0t_64_sync+0x14c/0x150 [ 5334.258791] rcu: INFO: rcu_sched self-detected stall on CPU [ 5334.260000] rcu: 6-....: (104955 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=44963 [ 5334.261527] rcu: (t=105014 jiffies g=237549 q=112023 ncpus=8) [ 5334.262403] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5334.262408] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5334.262410] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5334.262464] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5334.262465] sp : ffff80008470bca0 [ 5334.262466] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5334.262469] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5334.262472] x23: 0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5334.262474] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5334.262477] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5334.262479] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5334.262481] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 5334.262484] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5334.262486] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5334.262519] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5334.262522] Call trace: [ 5334.262524] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5334.262526] do_epoll_pwait.part.0+0x38/0x120 [ 5334.262528] __arm64_sys_epoll_pwait+0x78/0x138 [ 5334.262529] invoke_syscall.constprop.0+0x54/0x128 [ 5334.262533] do_el0_svc+0x44/0xf0 [ 5334.262535] el0_svc+0x24/0xb0 [ 5334.262538] el0t_64_sync_handler+0x138/0x148 [ 5334.262540] el0t_64_sync+0x14c/0x150 [ 5438.715315] systemd-journald: systemd-journal: potentially unexpected fatal signal 6. 
[ 5438.715470] CPU: 2 PID: 195 Comm: systemd-journal Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5438.715482] pstate: 80001000 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) [ 5438.715489] pc : 0000fff48dec9df8 [ 5438.715492] lr : 0000fff48dec9dd8 [ 5438.715557] sp : 0000fffff35eee70 [ 5438.715560] x29: 0000fffff35eee70 x28: 0000fffff35ef4d0 x27: 0000fffff35ef110 [ 5438.715571] x26: 0000fffff35eef08 x25: 000000000000004b x24: 0000000000000000 [ 5438.715579] x23: 0000000000000000 x22: 0000000000000000 x21: 0000fff48c20eff0 [ 5438.715588] x20: 0000000000001a99 x19: 0000000000000109 x18: 0000000000000000 [ 5438.715600] x17: 0000fff48deceda0 x16: 0000000000000001 x15: ab4d18e2501f5b66 [ 5438.715610] x14: 000000000038fe48 x13: 084c9107e1e25704 x12: 000000000038fda0 [ 5438.715618] x11: b8e208097994f7d5 x10: 000000000047fa48 x9 : 0000000000000000 [ 5438.715627] x8 : 0000000000000062 x7 : 0000000000000000 x6 : 0000000000000000 [ 5438.715635] x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000000 [ 5438.715643] x2 : 0000000000001a99 x1 : 0000000000000109 x0 : 0000fff48c20eff0 [ 5500.427396] rcu: INFO: rcu_sched self-detected stall on CPU [ 5500.428731] rcu: 6-....: (149957 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=64339 [ 5500.430529] rcu: (t=150018 jiffies g=237549 q=158161 ncpus=8) [ 5500.431664] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5500.431675] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5500.431682] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5500.431696] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5500.431701] sp : ffff80008470bca0 [ 5500.431704] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5500.431714] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5500.431819] x23: 0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5500.431828] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5500.431837] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5500.431846] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5500.431854] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 5500.431862] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5500.431870] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5500.431879] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5500.431888] Call trace: [ 5500.431891] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5500.431899] do_epoll_pwait.part.0+0x38/0x120 [ 5500.431904] __arm64_sys_epoll_pwait+0x78/0x138 [ 5500.431909] invoke_syscall.constprop.0+0x54/0x128 [ 5500.431917] do_el0_svc+0x44/0xf0 [ 5500.431923] el0_svc+0x24/0xb0 [ 5500.431929] el0t_64_sync_handler+0x138/0x148 [ 5500.431935] el0t_64_sync+0x14c/0x150 [ 5666.599687] rcu: INFO: rcu_sched self-detected stall on CPU [ 5666.601114] rcu: 6-....: (194955 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=83882 [ 5666.602695] rcu: (t=195023 jiffies g=237549 q=203648 ncpus=8) [ 5666.603842] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5666.603853] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5666.603860] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5666.603874] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5666.603879] sp : ffff80008470bca0 [ 5666.603882] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5666.603893] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5666.603903] x23: 
0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5666.603911] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5666.604048] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5666.604057] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5666.604066] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 5666.604074] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5666.604082] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5666.604091] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5666.604100] Call trace: [ 5666.604104] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5666.604110] do_epoll_pwait.part.0+0x38/0x120 [ 5666.604115] __arm64_sys_epoll_pwait+0x78/0x138 [ 5666.604122] invoke_syscall.constprop.0+0x54/0x128 [ 5666.604131] do_el0_svc+0x44/0xf0 [ 5666.604138] el0_svc+0x24/0xb0 [ 5666.604145] el0t_64_sync_handler+0x138/0x148 [ 5666.604151] el0t_64_sync+0x14c/0x150 [ 5832.768305] rcu: INFO: rcu_sched self-detected stall on CPU [ 5832.769532] rcu: 6-....: (239703 ticks this GP) idle=595c/1/0x4000000000000000 softirq=44771/44771 fqs=85733 [ 5832.771313] rcu: (t=240027 jiffies g=237549 q=208202 ncpus=8) [ 5832.773553] rcu: rcu_sched kthread starved for 40626 jiffies! g237549 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3 [ 5832.775413] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 5832.776782] rcu: RCU grace-period kthread stack dump: [ 5832.777585] task:rcu_sched state:R running task stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008 [ 5832.777697] Call trace: [ 5832.777700] __switch_to+0xa4/0xe0 [ 5832.777717] __schedule+0x360/0xd78 [ 5832.777723] schedule+0x2c/0x110 [ 5832.777729] schedule_timeout+0xa0/0x1a8 [ 5832.777797] rcu_gp_fqs_loop+0xf0/0x4c0 [ 5832.777911] rcu_gp_kthread+0x11c/0x158 [ 5832.777917] kthread+0xe0/0xf0 [ 5832.777980] ret_from_fork+0x10/0x20 [ 5832.777988] rcu: Stack dump where RCU GP kthread last ran: [ 5832.778872] Sending NMI from CPU 6 to CPUs 3: [ 5832.778926] NMI backtrace for cpu 3 [ 5832.778939] CPU: 3 PID: 430 Comm: containerd Not tainted 6.7.7-WSL2-STABLE+ #1 [ 5832.778947] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5832.778953] pc : clear_rseq_cs.isra.0+0x20/0x38 [ 5832.779136] lr : __rseq_handle_notify_resume+0x174/0x498 [ 5832.779141] sp : ffff80008344bc80 [ 5832.779144] x29: ffff80008344bcd0 x28: 000000000000000c x27: 0000000000000000 [ 5832.779154] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 5832.779163] x23: 0000ab1793fcfe20 x22: 0000000000000000 x21: 00000000ffffffff [ 5832.779171] x20: ffff80008344beb0 x19: ffff000004610000 x18: 0000000000000000 [ 5832.779180] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5832.779189] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5832.779197] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080258d8c [ 5832.779205] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5832.779213] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000fffffffffff8 [ 5832.779221] x2 : 0000000000000000 x1 : 0000ffff88c0f8a8 x0 : 0000000000000000 [ 5832.779230] Call trace: [ 5832.779233] clear_rseq_cs.isra.0+0x20/0x38 [ 5832.779238] do_notify_resume+0x1d0/0xe70 [ 5832.779246] el0_svc+0x90/0xb0 [ 5832.779251] el0t_64_sync_handler+0x138/0x148 [ 5832.779256] el0t_64_sync+0x14c/0x150 [ 5832.779918] CPU: 6 PID: 1321 Comm: nvim Not tainted 6.7.7-WSL2-STABLE+ #1 [ 
5832.779922] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5832.779924] pc : do_epoll_wait.part.0+0x2ec/0x6f8 [ 5832.779929] lr : do_epoll_wait.part.0+0x238/0x6f8 [ 5832.779930] sp : ffff80008470bca0 [ 5832.779931] x29: ffff80008470bd40 x28: ffff000055332600 x27: ffff000055332618 [ 5832.779934] x26: 0000fffffffffffc x25: ffff0000193b3dd0 x24: 0000000000000400 [ 5832.779937] x23: 0000ffffc0ebd4a8 x22: ffff80008470bcf0 x21: 0000000000000000 [ 5832.779939] x20: ffff0000193b3d80 x19: ffff80008470bcd8 x18: 0000000000000000 [ 5832.779974] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 5832.779976] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 5832.780009] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b5ce1c [ 5832.780012] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 5832.780014] x5 : 0000ffffc0ebd4a8 x4 : 0000000000000000 x3 : 0000000000000000 [ 5832.780015] x2 : 000000000000001b x1 : 0000000000000000 x0 : 0000000000000004 [ 5832.780018] Call trace: [ 5832.780019] do_epoll_wait.part.0+0x2ec/0x6f8 [ 5832.780021] do_epoll_pwait.part.0+0x38/0x120 [ 5832.780022] __arm64_sys_epoll_pwait+0x78/0x138 [ 5832.780024] invoke_syscall.constprop.0+0x54/0x128 [ 5832.780027] do_el0_svc+0x44/0xf0 [ 5832.780029] el0_svc+0x24/0xb0 [ 5832.780031] el0t_64_sync_handler+0x138/0x148 [ 5832.780032] el0t_64_sync+0x14c/0x150 [ 5906.221529] systemd-journald[6821]: File /var/log/journal/4f2f70649e9e4776a9f1716c9737e494/system.journal corrupted or uncleanly shut down, renaming and replacing. [ 6307.581091] rcu: INFO: rcu_sched self-detected stall on CPU [ 6307.582514] rcu: 4-....: (15001 ticks this GP) idle=a414/1/0x4000000000000000 softirq=46243/46243 fqs=6470 [ 6307.583963] rcu: (t=15001 jiffies g=249165 q=3624 ncpus=8) [ 6307.584810] CPU: 4 PID: 6088 Comm: zsh Not tainted 6.7.7-WSL2-STABLE+ #1 [ 6307.584816] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6307.584818] pc : __arch_copy_to_user+0xbc/0x240 [ 6307.585195] lr : tty_ioctl+0x200/0xa20 [ 6307.585422] sp : ffff800084c4bc90 [ 6307.585423] x29: ffff800084c4bd10 x28: ffff0000006be740 x27: 0000000000000000 [ 6307.585426] x26: 0000000000000000 x25: ffff00001904f000 x24: 0000ab36ba6c1a74 [ 6307.585429] x23: 0000000000005413 x22: ffff800080b54ac0 x21: ffff000020dffc00 [ 6307.585432] x20: ffff000020dffc00 x19: ffff00001904f0e8 x18: 0000000000000000 [ 6307.585434] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00001904f1b0 [ 6307.585436] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6307.585438] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b54fd0 [ 6307.585441] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000ab36ba6c1a74 [ 6307.585443] x5 : 0000ab36ba6c1a7c x4 : 0000000000000000 x3 : 000000000094002e [ 6307.585446] x2 : 0000000000000008 x1 : ffff00001904f1b8 x0 : 0000ab36ba6c1a74 [ 6307.585448] Call trace: [ 6307.585476] __arch_copy_to_user+0xbc/0x240 [ 6307.585480] __arm64_sys_ioctl+0x38c/0xa70 [ 6307.585696] invoke_syscall.constprop.0+0x54/0x128 [ 6307.585793] do_el0_svc+0x44/0xf0 [ 6307.585795] el0_svc+0x24/0xb0 [ 6307.585821] el0t_64_sync_handler+0x138/0x148 [ 6307.585823] el0t_64_sync+0x14c/0x150 [ 6473.753316] rcu: INFO: rcu_sched self-detected stall on CPU [ 6473.754448] rcu: 4-....: (59932 ticks this GP) idle=a414/1/0x4000000000000000 softirq=46243/46243 fqs=25367 [ 6473.756284] rcu: (t=60006 jiffies g=249165 q=9163 ncpus=8) [ 6473.757194] CPU: 4 PID: 6088 Comm: zsh Not tainted 6.7.7-WSL2-STABLE+ #1 [ 
6473.757247] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6473.757249] pc : __arch_copy_to_user+0xbc/0x240 [ 6473.757260] lr : tty_ioctl+0x200/0xa20 [ 6473.757266] sp : ffff800084c4bc90 [ 6473.757267] x29: ffff800084c4bd10 x28: ffff0000006be740 x27: 0000000000000000 [ 6473.757271] x26: 0000000000000000 x25: ffff00001904f000 x24: 0000ab36ba6c1a74 [ 6473.757273] x23: 0000000000005413 x22: ffff800080b54ac0 x21: ffff000020dffc00 [ 6473.757276] x20: ffff000020dffc00 x19: ffff00001904f0e8 x18: 0000000000000000 [ 6473.757278] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00001904f1b0 [ 6473.757280] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6473.757283] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080b54fd0 [ 6473.757313] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000ab36ba6c1a74 [ 6473.757315] x5 : 0000ab36ba6c1a7c x4 : 0000000000000000 x3 : 000000000094002e [ 6473.757318] x2 : 0000000000000008 x1 : ffff00001904f1b8 x0 : 0000ab36ba6c1a74 [ 6473.757320] Call trace: [ 6473.757322] __arch_copy_to_user+0xbc/0x240 [ 6473.757325] __arm64_sys_ioctl+0x38c/0xa70 [ 6473.757329] invoke_syscall.constprop.0+0x54/0x128 [ 6473.757333] do_el0_svc+0x44/0xf0 [ 6473.757335] el0_svc+0x24/0xb0 [ 6473.757336] el0t_64_sync_handler+0x138/0x148 [ 6473.757338] el0t_64_sync+0x14c/0x150 [ 6530.005623] rcu: INFO: rcu_sched self-detected stall on CPU [ 6530.006820] rcu: 2-....: (15000 ticks this GP) idle=4094/1/0x4000000000000000 softirq=61611/61611 fqs=6377 [ 6530.008333] rcu: (t=15001 jiffies g=249169 q=11009 ncpus=8) [ 6530.009444] CPU: 2 PID: 361 Comm: systemd-logind Not tainted 6.7.7-WSL2-STABLE+ #1 [ 6530.009448] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6530.009450] pc : __arch_copy_to_user+0x60/0x240 [ 6530.009461] lr : inotify_read+0x260/0x3d8 [ 6530.009598] sp : ffff800082c4bc70 [ 6530.009599] x29: ffff800082c4bce0 x28: ffff0000049fe740 x27: 0000000000000010 [ 6530.009603] x26: ffff000004cbd220 x25: ffff00001a318e60 x24: 0000000000000110 [ 6530.009605] x23: 0000000000000001 x22: 0000ab287a102ad0 x21: ffff000004cbd200 [ 6530.009608] x20: ffff000004cbd20c x19: 0000ab287a102ad0 x18: 0000000000000000 [ 6530.009610] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800082c4bcc8 [ 6530.009612] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6530.009615] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000803a05b4 [ 6530.009617] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000ab287a102ad0 [ 6530.009619] x5 : 0000ab287a102ae0 x4 : 0000000000000008 x3 : 0000000200000001 [ 6530.009622] x2 : 0000000000000008 x1 : ffff800082c4bcd0 x0 : 0000ab287a102ad0 [ 6530.009625] Call trace: [ 6530.009626] __arch_copy_to_user+0x60/0x240 [ 6530.009629] vfs_read+0xb0/0x280 [ 6530.009663] ksys_read+0x74/0x128 [ 6530.009664] __arm64_sys_read+0x20/0x40 [ 6530.009666] invoke_syscall.constprop.0+0x54/0x128 [ 6530.009669] do_el0_svc+0xcc/0xf0 [ 6530.009701] el0_svc+0x24/0xb0 [ 6530.009703] el0t_64_sync_handler+0x138/0x148 [ 6530.009705] el0t_64_sync+0x14c/0x150 [ 6530.009708] Sending NMI from CPU 2 to CPUs 5: [ 6530.009743] NMI backtrace for cpu 5 [ 6530.009747] CPU: 5 PID: 5120 Comm: node Not tainted 6.7.7-WSL2-STABLE+ #1 [ 6530.009749] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6530.009752] pc : clear_rseq_cs.isra.0+0x20/0x38 [ 6530.009815] lr : __rseq_handle_notify_resume+0x174/0x498 [ 6530.009817] sp : ffff800089cebc80 [ 6530.009818] x29: ffff800089cebcd0 x28: 000000000000000c x27: 
0000000000000000 [ 6530.009821] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 6530.009823] x23: 0000ff75f82c9dfc x22: 0000000000000000 x21: 00000000ffffffff [ 6530.009826] x20: ffff800089cebeb0 x19: ffff000011cfe740 x18: 0000000000000000 [ 6530.009828] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ff6665a0c780 [ 6530.009831] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6530.009833] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080258d8c [ 6530.009835] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 6530.009837] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000fffffffffff8 [ 6530.009840] x2 : 0000000000000000 x1 : 0000ff75f6e0f7e8 x0 : 0000000000000000 [ 6530.009842] Call trace: [ 6530.009843] clear_rseq_cs.isra.0+0x20/0x38 [ 6530.009845] do_notify_resume+0x1d0/0xe70 [ 6530.009850] el0_svc+0x90/0xb0 [ 6530.009852] el0t_64_sync_handler+0x138/0x148 [ 6530.009853] el0t_64_sync+0x14c/0x150 [ 6696.177931] rcu: INFO: rcu_sched self-detected stall on CPU [ 6696.179032] rcu: 5-....: (60005 ticks this GP) idle=3734/1/0x4000000000000000 softirq=60589/60589 fqs=25771 [ 6696.180732] rcu: (t=60006 jiffies g=249169 q=15239 ncpus=8) [ 6696.181856] Sending NMI from CPU 5 to CPUs 2: [ 6696.181867] NMI backtrace for cpu 2 [ 6696.181871] CPU: 2 PID: 361 Comm: systemd-logind Not tainted 6.7.7-WSL2-STABLE+ #1 [ 6696.181875] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6696.181877] pc : __arch_copy_to_user+0x60/0x240 [ 6696.181888] lr : inotify_read+0x260/0x3d8 [ 6696.181896] sp : ffff800082c4bc70 [ 6696.181897] x29: ffff800082c4bce0 x28: ffff0000049fe740 x27: 0000000000000010 [ 6696.181900] x26: ffff000004cbd220 x25: ffff00001a318e60 x24: 0000000000000110 [ 6696.181903] x23: 0000000000000001 x22: 0000ab287a102ad0 x21: ffff000004cbd200 [ 6696.181905] x20: ffff000004cbd20c x19: 0000ab287a102ad0 x18: 0000000000000000 [ 6696.181907] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800082c4bcc8 [ 6696.181960] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6696.181962] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000803a05b4 [ 6696.181964] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000ab287a102ad0 [ 6696.181966] x5 : 0000ab287a102ae0 x4 : 0000000000000008 x3 : 0000000200000001 [ 6696.181968] x2 : 0000000000000008 x1 : ffff800082c4bcd0 x0 : 0000ab287a102ad0 [ 6696.181971] Call trace: [ 6696.182001] __arch_copy_to_user+0x60/0x240 [ 6696.182005] vfs_read+0xb0/0x280 [ 6696.182008] ksys_read+0x74/0x128 [ 6696.182009] __arm64_sys_read+0x20/0x40 [ 6696.182011] invoke_syscall.constprop.0+0x54/0x128 [ 6696.182015] do_el0_svc+0xcc/0xf0 [ 6696.182017] el0_svc+0x24/0xb0 [ 6696.182019] el0t_64_sync_handler+0x138/0x148 [ 6696.182021] el0t_64_sync+0x14c/0x150 [ 6696.182865] CPU: 5 PID: 5120 Comm: node Not tainted 6.7.7-WSL2-STABLE+ #1 [ 6696.182868] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 6696.182870] pc : clear_rseq_cs.isra.0+0x20/0x38 [ 6696.182874] lr : __rseq_handle_notify_resume+0x174/0x498 [ 6696.182876] sp : ffff800089cebc80 [ 6696.182877] x29: ffff800089cebcd0 x28: 000000000000000c x27: 0000000000000000 [ 6696.182880] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 6696.182883] x23: 0000ff75f82c9dfc x22: 0000000000000000 x21: 00000000ffffffff [ 6696.182885] x20: ffff800089cebeb0 x19: ffff000011cfe740 x18: 0000000000000000 [ 6696.182888] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ff6665a0c780 [ 6696.182890] x14: 
0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 6696.182892] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080258d8c [ 6696.182895] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 6696.182897] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000fffffffffff8 [ 6696.182900] x2 : 0000000000000000 x1 : 0000ff75f6e0f7e8 x0 : 0000000000000000 [ 6696.182902] Call trace: [ 6696.182903] clear_rseq_cs.isra.0+0x20/0x38 [ 6696.182905] do_notify_resume+0x1d0/0xe70 [ 6696.182909] el0_svc+0x90/0xb0 [ 6696.182910] el0t_64_sync_handler+0x138/0x148 [ 6696.182912] el0t_64_sync+0x14c/0x150 |
Yes, if my computer is suspended, I encounter this issue 100% of the time. It is, however, far from the only situation in which I encounter it. My use case is that I run dev containers in Docker with VS Code (typically C#/.NET + Node frontend stuff), and my experience is that if I work on really simple things (one project, a couple of source files) I can work maybe an hour before issues (stalls, slowdowns) start to be a real problem. But if I work on more complex stuff (5-10 projects that compile in parallel, tens of thousands of lines of code) I can barely get VS Code to open and activate all extensions before WSL (and the dev container) is completely unusable.
I can not seem to reproduce the problem with Hyper-V running stock Ubuntu 23.10, and I'm not sure what the real difference there is other than that WSL uses a different kernel. It looks like there are some differences between the CPU that the virtual machine uses, the caches and listed extensions differ. CPU info of WSL2 VM➜ ~ dmesg | grep CPU [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x51df804e] [ 0.000000] Detected VIPT I-cache on CPU0 [ 0.000000] CPU features: detected: GIC system register CPU interface [ 0.000000] CPU features: kernel page table isolation disabled by kernel configuration [ 0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923 [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000effee000 [ 0.294328] smp: Bringing up secondary CPUs ... [ 0.307988] CPU features: detected: Spectre-v2 [ 0.307995] CPU features: detected: Spectre-v4 [ 0.308039] Detected VIPT I-cache on CPU1 [ 0.308101] GICv3: CPU1: found redistributor 1 region 1:0x00000000f000e000 [ 0.308201] CPU1: Booted secondary processor 0x0000000001 [0x51df804e] [ 0.308540] Detected VIPT I-cache on CPU2 [ 0.308572] GICv3: CPU2: found redistributor 2 region 2:0x00000000f002e000 [ 0.308661] CPU2: Booted secondary processor 0x0000000002 [0x51df804e] [ 0.308961] Detected VIPT I-cache on CPU3 [ 0.308999] GICv3: CPU3: found redistributor 3 region 3:0x00000000f004e000 [ 0.309087] CPU3: Booted secondary processor 0x0000000003 [0x51df804e] [ 0.309389] Detected VIPT I-cache on CPU4 [ 0.309431] GICv3: CPU4: found redistributor 4 region 4:0x00000000f006e000 [ 0.309518] CPU4: Booted secondary processor 0x0000000004 [0x51df804e] [ 0.309825] Detected VIPT I-cache on CPU5 [ 0.309874] GICv3: CPU5: found redistributor 5 region 5:0x00000000f008e000 [ 0.309960] CPU5: Booted secondary processor 0x0000000005 [0x51df804e] [ 0.310248] Detected VIPT I-cache on CPU6 [ 0.310302] GICv3: CPU6: found redistributor 6 region 6:0x00000000f00ae000 [ 0.310388] CPU6: Booted secondary processor 0x0000000006 [0x51df804e] [ 0.311743] Detected VIPT I-cache on CPU7 [ 0.311804] GICv3: CPU7: found redistributor 7 region 7:0x00000000f00ce000 [ 0.311890] CPU7: Booted secondary processor 0x0000000007 [0x51df804e] [ 0.313021] smp: Brought up 1 node, 8 CPUs [ 0.644306] CPU features: detected: 32-bit EL0 Support [ 0.657151] CPU features: detected: CRC32 instructions [ 0.670127] CPU features: detected: RCpc load-acquire (LDAPR) [ 0.684748] CPU features: detected: LSE atomic instructions [ 0.697564] CPU features: detected: Privileged Access Never [ 0.733989] CPU features: detected: Hardware dirty bit management on CPU1-2,4 [ 0.751305] CPU: All CPU(s) started at EL1 [ 2.963616] No ACPI PMU IRQ for CPU0 [ 2.964194] No ACPI PMU IRQ for CPU1 [ 2.964841] No ACPI PMU IRQ for CPU2 [ 2.965283] No ACPI PMU IRQ for CPU3 [ 2.965691] No ACPI PMU IRQ for CPU4 [ 2.966302] No ACPI PMU IRQ for CPU5 [ 2.966895] No ACPI PMU IRQ for CPU6 [ 2.967408] No ACPI PMU IRQ for CPU7 ➜ ~ lscpu Architecture: aarch64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: Qualcomm Model: 14 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 1 Stepping: 0xd BogoMIPS: 38.40 Flags: fp asimd aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp Model: 14 Thread(s) per core: 1 Core(s) per socket: 7 Socket(s): 1 Stepping: 0xd BogoMIPS: 38.40 
Flags: fp asimd aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp Caches (sum of all): L1d: 256 KiB (8 instances) L1i: 256 KiB (8 instances) L2: 1 MiB (8 instances) L3: 4 MiB (1 instance) Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Vulnerable Mmio stale data: Not affected Retbleed: Not affected Spec rstack overflow: Not affected Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Spectre v1: Mitigation; __user pointer sanitization Spectre v2: Mitigation; Branch predictor hardening Srbds: Not affected Tsx async abort: Not affected CPU info of Hyper-V VMroot@ubuntu0:~# dmesg | grep CPU [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x51df804e] [ 0.000000] Detected VIPT I-cache on CPU0 [ 0.000000] CPU features: detected: GIC system register CPU interface [ 0.000000] CPU features: detected: Hardware dirty bit management [ 0.000000] CPU features: detected: Spectre-v2 [ 0.000000] CPU features: detected: Spectre-v4 [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI) [ 0.000000] CPU features: detected: ARM erratum 1418040 [ 0.000000] CPU features: detected: Qualcomm erratum 1009, or ARM erratum 1286807, 2441009 [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000effee000 [ 0.001753] smp: Bringing up secondary CPUs ... [ 0.002120] Detected VIPT I-cache on CPU1 [ 0.002190] GICv3: CPU1: found redistributor 1 region 1:0x00000000f000e000 [ 0.002285] CPU1: Booted secondary processor 0x0000000001 [0x51df804e] [ 0.002729] Detected VIPT I-cache on CPU2 [ 0.002782] GICv3: CPU2: found redistributor 2 region 2:0x00000000f002e000 [ 0.003386] CPU2: Booted secondary processor 0x0000000002 [0x51df805e] [ 0.003823] Detected VIPT I-cache on CPU3 [ 0.003880] GICv3: CPU3: found redistributor 3 region 3:0x00000000f004e000 [ 0.003961] CPU3: Booted secondary processor 0x0000000003 [0x51df804e] [ 0.004327] Detected VIPT I-cache on CPU4 [ 0.004387] GICv3: CPU4: found redistributor 4 region 4:0x00000000f006e000 [ 0.004479] CPU4: Booted secondary processor 0x0000000004 [0x51df804e] [ 0.004788] Detected VIPT I-cache on CPU5 [ 0.004915] GICv3: CPU5: found redistributor 5 region 5:0x00000000f008e000 [ 0.005144] CPU5: Booted secondary processor 0x0000000005 [0x51df805e] [ 0.005658] Detected VIPT I-cache on CPU6 [ 0.005789] GICv3: CPU6: found redistributor 6 region 6:0x00000000f00ae000 [ 0.005866] CPU6: Booted secondary processor 0x0000000006 [0x51df805e] [ 0.006169] Detected VIPT I-cache on CPU7 [ 0.006255] GICv3: CPU7: found redistributor 7 region 7:0x00000000f00ce000 [ 0.006322] CPU7: Booted secondary processor 0x0000000007 [0x51df804e] [ 0.012949] smp: Brought up 1 node, 8 CPUs [ 0.012983] CPU features: detected: 32-bit EL0 Support [ 0.012987] CPU features: detected: CRC32 instructions [ 0.012989] CPU features: detected: Data cache clean to Point of Persistence [ 0.012993] CPU features: detected: RCpc load-acquire (LDAPR) [ 0.012996] CPU features: detected: LSE atomic instructions [ 0.012998] CPU features: detected: Privileged Access Never [ 0.024078] CPU: All CPU(s) started at EL1 [ 0.570000] ledtrig-cpu: registered to indicate activity on CPUs [ 0.570310] No ACPI PMU IRQ for CPU0 [ 0.570315] No ACPI PMU IRQ for CPU1 [ 0.570317] No 
ACPI PMU IRQ for CPU2 [ 0.570318] No ACPI PMU IRQ for CPU3 [ 0.570320] No ACPI PMU IRQ for CPU4 [ 0.570322] No ACPI PMU IRQ for CPU5 [ 0.570323] No ACPI PMU IRQ for CPU6 [ 0.570325] No ACPI PMU IRQ for CPU7 root@ubuntu0:~# lscpu Architecture: aarch64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: Qualcomm BIOS Vendor ID: Qualcomm Technologies Inc Model name: Kryo-4XX-Gold BIOS Model name: Microsoft SQ2 @ 3.15 GHz None CPU @ 1.5GHz BIOS CPU family: 280 Model: 14 Thread(s) per core: 1 Core(s) per socket: 8 Socket(s): 1 Stepping: 0xd BogoMIPS: 38.40 Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp Caches (sum of all): L1d: 512 KiB (8 instances) L1i: 512 KiB (8 instances) L2: 4 MiB (8 instances) L3: 4 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-7 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Mitigation; PTI Mmio stale data: Not affected Retbleed: Not affected Spec rstack overflow: Not affected Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Spectre v1: Mitigation; __user pointer sanitization Spectre v2: Mitigation; Branch predictor hardening Srbds: Not affected Tsx async abort: Not affected |
Edited: didn't work, cheered too soon. On the advice of a thread on the kernel mailing list about RCU, I compiled the kernel with RSEQ disabled. Feel free to test it: with RSEQ disabled the crashes still occur, however it feels like it takes considerably longer before they occur, so it might still be worth trying the linked kernel.
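A sketch of how the RSEQ-disabled build described above can be reproduced, assuming the same kernel tree and config file as in the earlier build sketch (the RSEQ option name is the upstream one; file names are assumptions):

```sh
# Disable restartable sequences in the kernel config, then rebuild.
cd WSL2-Linux-Kernel
./scripts/config --file Microsoft/config-wsl-arm64 --disable RSEQ
make KCONFIG_CONFIG=Microsoft/config-wsl-arm64 olddefconfig
make KCONFIG_CONFIG=Microsoft/config-wsl-arm64 -j"$(nproc)"
```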
And now with a later occurring trace!
I did give it a try. I started up a dev container with a rather large code base, which I usually won't be able to compile, much less start a debugging session, before WSL hangs. With this kernel, however, I was able to compile, debug and actually run. As you note, I still got the crashes but it took much longer for them to occur and it seems WSL was able to recover more quickly. A big step forward but not quite usable :-). It also didn't solve the problem where WSL always completely hangs and uses 99% CPU when closing VS Code and the dev container. I will try to collect some logs and post them here. |
@pmartincic further digging shows that the problems are alleviated quite a bit by disabling RSEQ on a recent kernel and the problems are possibly related to copy_to_user fault handling. See also this thread / reply: https://lore.kernel.org/rcu/[email protected]/ I do see some patches submitted upstream for hv timer support recently (among which a commit of yours, congrats!), do these fix anything related to this issue? |
See #11274 (comment) This issue is fixed in 24H2. |
Version
Windows version: 10.0.22621.963
WSL Version
Kernel Version
Kernel version: 5.15.79.1
Distro Version
Ubuntu 20.04.5 LTS
Other Software
I'm running on a Microsoft Surface Pro X laptop with a Microsoft SQ2 processor and 16 GB of RAM.
Repro Steps
Intermittent, but occurs regularly (1-4 times a day).
Some factors that may make it worse (subjective & anecdotal):
- fancy terminal usage (e.g. tmux)
Some factors that do not appear to change behavior:
I have followed suggestions in other "hang issues" such as #9114 and #8824, but they have not resolved the issue.
Expected Behavior
Terminals and apps work without hangs. (I have simple needs.)
(I use Windows for all high GUI/network apps, such as Zoom, Web Browsing, etc.)
Actual Behavior
WSL becomes unresponsive:
- the VmmemWSL Windows background process shows high CPU load

Trying to regain control in PowerShell with wsl.exe:
- wsl.exe -l -v executes and output is correct
- wsl.exe -t work executes but may delay noticeably before returning
- wsl.exe -l -v still runs afterwards, but I cannot start a fresh session
- I've tried wsl.exe --shutdown with some success
- wsl.exe --shutdown hangs about 50% of the time (does not return to the powershell prompt)
- killing wslservice.exe will reliably allow me to start a new WSL session, but very occasionally the new session will be overly sluggish, and I find rebooting will clear the problem
- if I kill wslservice.exe, the wsl.exe --shutdown will terminate, with the message:

Diagnostic Logs
I've tried the procedure described here by OneBlue to obtain dumps & other logs. However, I cannot successfully perform the procedure, as the system distro shell is also hung. (It is a separate tab in the same Windows Terminal instance.)
I'm hoping you can provide some additional techniques to allow me to gather logs which will be of use in finding and fixing the issue.