WSL2 vim hang - RCU stall #7913

Closed
1 of 2 tasks
usedbytes opened this issue Jan 10, 2022 · 4 comments
Comments

@usedbytes

Version

Microsoft Windows [Version 10.0.19042.1415]

WSL Version

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

Linux version 5.10.60.1-microsoft-standard-WSL2 (oe-user@oe-host) (aarch64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) #1 SMP Wed Aug 25 23:20:18 UTC 2021

Distro Version

Ubuntu-20.04

Other Software

tmux, vim

Repro Steps

I'm not exactly sure how to reproduce this - I had a vim process (specifically vim.gtk3, but running in a terminal) inside a tmux session. When I came back to it after some hours, it was unresponsive.

I have seen this same issue in the past with the same set-up, but I don't know what triggers it.

Expected Behavior

vim/terminal remains responsive. The kernel doesn't report RCU stalls.

If it becomes unresponsive, I expect to be able to kill the process via kill/killall etc.

Actual Behavior

The process is completely non-responsive.

top and /proc show the process as running at ~100% CPU:

top - 17:46:34 up  7:06,  0 users,  load average: 3.00, 2.95, 2.15
Tasks:  14 total,   2 running,  12 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 12.4 sy,  0.0 ni, 86.8 id,  0.0 wa,  0.4 hi,  0.3 si,  0.0 st
MiB Mem :  12417.9 total,   8655.4 free,    401.6 used,   3360.8 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  11900.4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  6146 XXXXXX  20   0   56080  19136  13436 R  98.7   0.2  23:31.67 vim
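As a sanity check, the R state that top reports can be read straight from /proc. A minimal sketch, using the current shell's own PID as a stand-in for 6146 (which only exists on the affected machine):

```shell
#!/bin/sh
# Read a task's scheduler state from /proc/<pid>/stat; field 3 is the
# state letter (R = running, S = sleeping, D = uninterruptible sleep).
# The shell's own PID stands in for the hung vim (6146 above).
pid=$$
state=$(awk '{print $3}' "/proc/$pid/stat")
echo "state=$state"
```

On the affected machine, the same read against PID 6146 should agree with the R state shown by top.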

dmesg reports RCU stalls:

[25546.968589] rcu: INFO: rcu_sched self-detected stall on CPU
[25546.968599] rcu:     5-....: (327445 ticks this GP) idle=166/1/0x4000000000000002 softirq=92025/92025 fqs=157547
[25546.968602]  (t=330021 jiffies g=430477 q=135508)
[25546.968611] Task dump for CPU 5:
[25546.968612] task:vim             state:R  running task     stack:    0 pid: 6295 ppid:   153 flags:0x00000003
[25546.968689] Call trace:
[25546.968701]  dump_backtrace+0x0/0x1c8
[25546.968704]  show_stack+0x1c/0x28
[25546.968711]  sched_show_task+0x140/0x160
[25546.968718]  dump_cpu_task+0x48/0x54
[25546.968720]  rcu_dump_cpu_stacks+0xc0/0x100
[25546.968723]  rcu_sched_clock_irq+0x7bc/0x9b8
[25546.968726]  update_process_times+0x64/0x2a0
[25546.968729]  tick_sched_handle.isra.0+0x38/0x50
[25546.968731]  tick_sched_timer+0x50/0xa0
[25546.968733]  __hrtimer_run_queues+0x11c/0x3a8
[25546.968735]  hrtimer_interrupt+0x118/0x300
[25546.968739]  hv_stimer0_isr+0x28/0x30
[25546.968741]  hv_stimer0_percpu_isr+0x14/0x20
[25546.968744]  handle_percpu_devid_irq+0x8c/0x210
[25546.968745]  generic_handle_irq+0x34/0x48
[25546.968747]  __handle_domain_irq+0x68/0xc0
[25546.968749]  gic_handle_irq+0x5c/0x120
[25546.968750]  el1_irq+0xc8/0x180
[25546.968752]  setup_rt_frame+0x8d0/0xbf0
[25546.968754]  do_notify_resume+0x294/0x650
[25546.968755]  work_pending+0xc/0x208

I am unable to terminate the process with kill 6146, kill -9 6146, or killall vim.
I am unable to attach to the process with strace -p 6146 or gdb -p 6146 - it just hangs at "Attaching to process 6146".
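For what it's worth, when SIGKILL is ignored and ptrace attach hangs, the task is typically spinning or blocked in kernel mode, and /proc can sometimes show where without a debugger. A hedged sketch (again substituting the current shell's PID for 6146; /proc/<pid>/stack normally needs root):

```shell
#!/bin/sh
# wchan names the kernel function a sleeping task is blocked in ("0" if
# the task is runnable); /proc/<pid>/stack (root only) gives the full
# kernel backtrace. The shell's PID stands in for the hung vim (6146).
pid=$$
echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
cat "/proc/$pid/stack" 2>/dev/null || echo "(need root to read /proc/$pid/stack)"
```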

Windows Task Manager shows ~7% CPU usage in Vmmem, on an otherwise idle system.

This is a Surface Pro X with the SQ2 chip (Windows on Arm).

Diagnostic Logs

No response

@benhillis
Member

@tyhicks - we've seen some RCU stalls in our test automation in the past. Is there something that we can do to root cause these?

@usedbytes
Author

More data: I didn't see a hang this time, but I checked dmesg this morning and it's full of RCU stalls - I guess they freed up, because the WSL2 instance is still responsive.

There seems to be a pretty wide range of affected tasks (kworker, rcu_sched, khugepaged, swapper, neomutt, tmux, localhost(?), ksoftirqd...), so I suspect it's not tied to any particular application.
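To see at a glance which tasks show up in the stall reports, the task: fields can be tallied from the kernel log. A small sketch (reading dmesg may need root depending on kernel.dmesg_restrict; a saved log can be piped in via stdin instead):

```shell
#!/bin/sh
# Count how often each task name appears in the "Task dump" lines of the
# kernel log, most frequent first.
dmesg | grep -o 'task:[^ ]*' | sort | uniq -c | sort -rn
```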

[41445.560635] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41445.560647] rcu:     3-...!: (0 ticks this GP) idle=9d2/1/0x4000000000000002 softirq=89675/89675 fqs=2
[41445.560651]  (detected by 2, t=205367 jiffies, g=334085, q=323)
[41445.560658] Task dump for CPU 3:
[41445.560660] task:khugepaged      state:R  running task     stack:    0 pid:   57 ppid:     2 flags:0x00000028
[41445.560664] Call trace:
[41445.560672]  __switch_to+0x84/0xc0
[41445.560675]  0xfffffdffffe00000
[41445.560677] rcu: rcu_sched kthread starved for 205361 jiffies! g334085 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
[41445.560678] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[41445.560679] rcu: RCU grace-period kthread stack dump:
[41445.560680] task:rcu_sched       state:I stack:    0 pid:   12 ppid:     2 flags:0x00000028
[41445.560682] Call trace:
[41445.560683]  __switch_to+0x84/0xc0
[41445.560687]  __schedule+0x348/0x730
[41445.560689]  schedule+0x4c/0xd0
[41445.560691]  schedule_timeout+0x1a0/0x328
[41445.560695]  rcu_gp_kthread+0x41c/0xbc8
[41445.560701]  kthread+0x100/0x130
[41445.560704]  ret_from_fork+0x10/0x18
[41445.560752] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41445.560756] rcu:     3-...!: (1 ticks this GP) idle=9d2/1/0x4000000000000000 softirq=89675/89675 fqs=2
[41445.560758]  (detected by 7, t=1854634 jiffies, g=334085, q=323)
[41445.560762] Task dump for CPU 3:
[41445.560763] task:khugepaged      state:R  running task     stack:    0 pid:   57 ppid:     2 flags:0x0000002a
[41445.560766] Call trace:
[41445.560769]  __switch_to+0x84/0xc0
[41445.560771]  0xfffffdffffe00000
[41445.560773] rcu: rcu_sched kthread starved for 1854628 jiffies! g334085 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[41445.560773] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[41445.560774] rcu: RCU grace-period kthread stack dump:
[41445.560775] task:rcu_sched       state:R  running task     stack:    0 pid:   12 ppid:     2 flags:0x00000028
[41445.560781] Call trace:
[41445.560783]  __switch_to+0x84/0xc0
[41445.560809]  __schedule+0x348/0x730
[41445.560811]  schedule+0x4c/0xd0
[41445.560812]  schedule_timeout+0x1a0/0x328
[41445.560814]  rcu_gp_kthread+0x41c/0xbc8
[41445.560816]  kthread+0x100/0x130
[41445.560821]  ret_from_fork+0x10/0x18
[41445.569823] WSL2: Performing memory compaction.
[49548.284171] rcu: INFO: rcu_sched self-detected stall on CPU
[49548.284177] rcu:     6-...!: (1 GPs behind) idle=92e/1/0x4000000000000002 softirq=87466/87471 fqs=0
[49548.284180]  (t=177580 jiffies g=334341 q=264)
[49548.284187] rcu: rcu_sched kthread starved for 177580 jiffies! g334341 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
[49548.284188] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[49548.284188] rcu: RCU grace-period kthread stack dump:
[49548.284190] task:rcu_sched       state:I stack:    0 pid:   12 ppid:     2 flags:0x00000028
[49548.284193] Call trace:
[49548.284201]  __switch_to+0x84/0xc0
[49548.284203] rcu: INFO: rcu_sched self-detected stall on CPU
[49548.284206]  __schedule+0x348/0x730
[49548.284208]  schedule+0x4c/0xd0
[49548.284211]  schedule_timeout+0x1a0/0x328
[49548.284214] rcu:     5-...!: (2 ticks this GP) idle=606/0/0x3 softirq=95374/95374 fqs=0
[49548.284217]  rcu_gp_kthread+0x41c/0xbc8
[49548.284218]  (t=727336 jiffies g=334341 q=264)
[49548.284220] rcu: rcu_sched kthread starved for 727336 jiffies! g334341 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
...
...
[66128.883585] rcu: rcu_sched kthread starved for 1439749 jiffies! g338001 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[66128.883586] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[66128.883587] rcu: RCU grace-period kthread stack dump:
[66128.883587] task:rcu_sched       state:R  running task     stack:    0 pid:   12 ppid:     2 flags:0x00000028
[66128.883588] Call trace:
[66128.883589]  __switch_to+0x84/0xc0
[66128.883590]  __schedule+0x348/0x730
[66128.883591]  schedule+0x4c/0xd0
[66128.883592]  schedule_timeout+0x1a0/0x328
[66128.883593]  rcu_gp_kthread+0x41c/0xbc8
[66128.883594]  kthread+0x100/0x130
[66128.883595]  ret_from_fork+0x10/0x18
[70256.496667] WSL2: Performing memory compaction.
[71018.375306] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[71018.375312] rcu:     4-...!: (0 ticks this GP) idle=338/0/0x0 softirq=98633/98633 fqs=1  (false positive?)
[71018.375317]  (detected by 0, t=175088 jiffies, g=338801, q=5)
[71018.375322] Task dump for CPU 4:
[71018.375325] task:swapper/4       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x00000028
[71018.375329] Call trace:
[71018.375336]  __switch_to+0x84/0xc0
[71018.375343]  tick_nohz_idle_stop_tick+0x1f4/0x2a8
[71018.375347]  do_idle+0xc0/0x128
[71018.375349] rcu: rcu_sched kthread starved for 175086 jiffies! g338801 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=7
[71018.375350] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[71018.375350] rcu: RCU grace-period kthread stack dump:
[71018.375352] task:rcu_sched       state:I stack:    0 pid:   12 ppid:     2 flags:0x00000028
[71018.375353] Call trace:
[71018.375368]  __switch_to+0x84/0xc0
[71018.375371]  __schedule+0x348/0x730
[71018.375372]  schedule+0x4c/0xd0
[71018.375374]  schedule_timeout+0x1a0/0x328
[71018.375375]  rcu_gp_kthread+0x41c/0xbc8
[71018.375377]  kthread+0x100/0x130
[71018.375379]  ret_from_fork+0x10/0x18

@hwine

hwine commented Jun 14, 2022

I'm also seeing this on aarch64 on a regular basis. A few more general observations, and I'd be happy to provide logs if you point me to instructions.

  • this happens "for sure" with tmux
  • often, only the active tmux window is impacted -- I may be able to do ^B^W and find other working windows.
  • it will also happen with bash
  • I've experienced this in "Windows Terminal", "Windows Terminal Preview", and a terminal window in VSCode
  • Often, after the hang, wsl shutdown will not complete, and I have to terminate all wsl*.exe tasks, then vmmemWSL before I get a successful restart of WSL.
    • after a number of "force quits", I need to reboot (I haven't tried logout).
  • sometimes, wsl shutdown will complete, but VmmemWSL will still be running at 20-30% CPU. I have to wait for that process to terminate (or kill it) before I can get a new working session.

I think it is less likely to hang if the terminal window is not the active window in the app. E.g. if I switch to a power shell tab in windows terminal, or close the terminal pane in vscode. But I haven't been keeping timings, so ....

kernel: Linux TABLET-935H4GMO 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux (Microsoft Surface X)
OS: Ubuntu 20.04.4 LTS

Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!


3 participants