Frequent RCU Stalls on ARM64 #11210
Comments
That's interesting. Can you try to have an
I haven't found a way to reproduce the issue reliably. It is too random. I just encountered it again while using VS Code. Some VS Code processes stopped responding and took 100% CPU. I tried running
After rebooting and entering WSL again, I couldn't find the file that the output was redirected to. I will see what other info I can collect the next time I encounter it.
I have been encountering something similar when running bitbake: if it runs long enough, WSL eventually grinds to a halt, consuming all the CPUs given to it, and nothing actually runs. Once it has begun, wsl --shutdown is the only way I've found to recover, as there's no way to run any diagnostics inside WSL at that point.
After some investigation, I found that any process that attaches to the unresponsive process gets stuck as well, and can't be quit using. I'm looking for methods that can collect diagnostic logs without attaching to the process.
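One possible way to look at a stuck process without attaching to it is to read its procfs entries directly, since that does not go through ptrace the way strace does. The following is only a minimal sketch: the helper names `parse_stat` and `snapshot` are made up for illustration, `/proc/<pid>/stack` usually needs root and may not be available on every kernel build, and I can't promise these reads never block on a badly wedged system.

```python
from pathlib import Path


def parse_stat(stat_text: str) -> dict:
    """Extract pid, comm and state from the contents of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the LAST ')' instead of on whitespace.
    """
    left, _, right = stat_text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    fields = right.split()
    # fields[0] is the one-letter process state (R, S, D, Z, T, ...).
    return {"pid": int(pid_str), "comm": comm, "state": fields[0]}


def snapshot(pid: int) -> dict:
    """Collect basic state for one PID by reading procfs only (no ptrace)."""
    base = Path("/proc") / str(pid)
    info = parse_stat((base / "stat").read_text())
    # wchan names the kernel function the task is sleeping in, if any;
    # stack is the kernel stack trace (hypothetically useful here, needs root).
    for name in ("wchan", "stack"):
        try:
            info[name] = (base / name).read_text().strip()
        except OSError as exc:
            info[name] = f"<unavailable: {exc.strerror}>"
    return info
```

Calling `snapshot(pid)` with the PID shown in htop returns a small dict that can be pasted into the issue. A state of `R` with the process pinned at 100% CPU would at least confirm it is spinning rather than sleeping in the kernel.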
@PeronGH do you see RCU stalls in your kernel logs / dmesg when this happens?
Yes, I do. This happened when I woke the system from sleep:
And this happened while compiling an Android app:
The kernel is https://github.com/Nevuly/WSL2-Linux-Kernel-Rolling/releases/tag/linux-wsl-stable-6.7.9 but the issue happened on the official WSL kernel too. If required, I can switch back to the official kernel and wait for the issue to happen again in order to collect logs.
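For anyone else trying to answer the same question from a long kernel log, a small filter can pull out just the RCU stall report headers. This is a rough sketch, not an official tool: the function name `rcu_stall_lines` is made up, and the pattern assumes the stall reports contain either "self-detected stall" or "detected stalls" on a line that also mentions rcu, which may not hold for every kernel version.

```python
import re

# Assumed wording of the stall report headers; adjust if your kernel
# prints something different.
RCU_STALL = re.compile(r"rcu.*(self-detected stall|detected stalls)")


def rcu_stall_lines(log_text: str) -> list:
    """Return only the lines of a kernel log that look like RCU stall reports."""
    return [line for line in log_text.splitlines() if RCU_STALL.search(line)]
```

One way to use it is to save the output of dmesg to a file, read that file into a string, and pass the string to `rcu_stall_lines`. An empty result suggests the hang is not being reported as an RCU stall, at least not with the wording assumed above.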
Closed in favour of #11274
Alright, well unfortunately, welcome to the club. You can try compiling the kernel with CONFIG_RSEQ disabled, which seriously alleviates the problem, and on a Hyper-V VM this doesn't happen (however you'll miss all the WSL2 magic). I see (while typing this reply) that you found my latest issue - hopefully the Microsoft team will respond there soon!
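Before and after rebuilding, it may help to confirm whether the running kernel actually has CONFIG_RSEQ enabled. Here is a minimal sketch, assuming the kernel exposes its configuration at `/proc/config.gz` (that file only exists when the kernel was built with CONFIG_IKCONFIG_PROC); the helper names `config_value` and `running_kernel_config` are made up for illustration.

```python
import gzip
import re


def config_value(config_text: str, option: str):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or appears only as the
    commented-out "# CONFIG_FOO is not set" form.
    """
    pattern = re.compile(rf"{re.escape(option)}=(.*)")
    for line in config_text.splitlines():
        match = pattern.fullmatch(line.strip())
        if match:
            return match.group(1)
    return None


def running_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (assumes the file is present)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

With these, `config_value(running_kernel_config(), "CONFIG_RSEQ")` should give `'y'` on a kernel with rseq support built in and `None` on one compiled with it disabled.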
Windows Version
Microsoft Windows [Version 10.0.22635.3209]
WSL Version
2.1.3.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.146.1-2
Distro Version
Debian GNU/Linux trixie trixie/sid aarch64
Other Software
htop 3.3.0
GNU bash, version 5.2.21(1)-release (aarch64-unknown-linux-gnu)
vscode 1.86.2
(Actually not specific to any of them)
Repro Steps
The trigger is not clear, but it seems related to a process using too many resources or to system sleep. It is triggered multiple times a day, sometimes when compiling Rust projects, sometimes when closing VS Code. It does not seem related to any specific setup, as I encountered it on fresh installs of different distros, kernels and configs.
Expected Behavior
This should not happen.
Actual Behavior
The process suddenly becomes unresponsive and uses 100% CPU. It is also impossible to kill the process; the only way to stop it is to shut down WSL. I've already tried different distros, but the bug persists on both Ubuntu and Debian. I also tried different branches of Debian, including the testing branch I'm using now. I used different shells too, and zsh does not avoid the bug either. I even tried different kernels, including Nevuly/WSL2-Linux-Kernel-Rolling-LTS, but that did not resolve the issue either. I also disabled WSLg, swap, systemd and everything else suspicious, and still failed to solve the issue.
Here is the screenshot from this time:
Here are previous screenshots:
(Caused by a process in WSLg container)
Diagnostic Logs
I tried attaching to it using strace, but strace became unresponsive too. I also tried lsof, which also became unresponsive and could not be killed.