Switching to info screen sometimes hangs #120
Yes, more information is needed.
@wsakernel - how can this be reproduced? Which system are you using: Linux with glibc or musl, x64 or ARM? Any other information? I am not able to reproduce this locally, and related concurrency bugs were fixed quite a while ago.
I see it on two different off-the-shelf Fujitsu laptops with Intel x86-64 CPUs, one running Debian 11 (bullseye) and the other Debian 12 (bookworm). Nothing fancy, standard glibc. I can reproduce it by repeatedly hitting keys '1' and '3' (note that I don't run as root, so the scan screen only shows the text about the missing CAP_SYS_ADMIN capability), but '1' and '2' may fail as well. It is always the info screen that hangs. How quickly I switch screens does not matter; it happens "randomly". I just tried again: it failed after 3 seconds, and the next failure took 20 seconds of switching screens. I guess I need to fire up GDB for really useful info, but sadly I have no time to dig into this right now :(
Okay, I may have no time, but I have interest ;)
The sampling thread is stuck here:
with this backtrace:
It seems this syscall blocks. Could my Wi-Fi driver be failing to respond?
Other programs have had a similar problem: sonic-net/sonic-swss-common#114. Setting the socket to non-blocking alone is not enough. The
Thanks for looking into this. There seems to be no quick fix, and I have only limited spare time at the moment.
I'll try adding a timeout on top of the non-blocking approach. Discarding the broken command and starting a new one seems to work. We will see...
The info screen waits for the first data to arrive. This sometimes stalls on my machine because the netlink command does not complete for some reason. So set the socket to non-blocking and try again in the next cycle if no data is available. This requires a small libnl version bump from 3.2 to 3.2.22, because only since that version does nl_recvmsgs() return -NLE_AGAIN. Fixes uoaerg#120.
As the title says, pressing '1' for the info screen sometimes locks up the program. The bottom line highlights 'info', but nothing else is printed on the screen. I can reproduce it with the current top of tree (58fb5a4) by constantly switching between the info screen and the help screen. It may take a few seconds, but on both of my machines the issue eventually shows up. I haven't had time to debug it further; my gut feeling is a race condition between the pthreads. The lockup does not happen when switching between the other screens I tried. Let me know if more information is needed.