wnck-applet crashes randomly since update to 1.26.2 #1385
Comments
Please give me the stack trace command that you want me to execute so I can update the log in the top post.
I'm experiencing this on postmarketOS on the device samsung-serranove (armv7, SoC msm8916, display 540x960). Installed package versions: mate-desktop 1.26.1, mate-panel 1.27.2 (but the issue also happened with 1.26.3), libwnck3 43.0. It does not happen on the device bq-paella (aarch64, SoC msm8916, display 720x1280) or in a qemu virtual machine (amd64, display 640x480). On samsung-serranove I also tried rotating the display by 90 degrees to 960x540, which didn't solve the issue.
When starting e.g. the application "Pluma" from the classic menu, the panel items "Show Desktop", "Window List" and "Workspace Switcher" crash, showing the message "... has quit unexpectedly". All three are part of wnck-applet, which is part of mate-panel (https://github.com/mate-desktop/mate-panel/tree/v1.26.3/applets/wncklet).
Edit: When starting e.g. "Pluma" from the command line in a terminal, the crash doesn't happen. When removing the "Window List" (right-click its "handle", e.g. the three dots, uncheck "Lock To Panel", right-click the three dots again, "Remove from Panel") and then starting e.g. "Pluma" from the classic menu, no crash occurs anymore. Right-click the panel and choose "Reset All Panels" to continue debugging.
In terminal checking:
Open a terminal (if needed, click "Reload" 3x), type
Summarizing so far: wnck-applet crashes with the message "Illegal instruction" when starting an application from the classic menu (only with some applications, not all). "Window List" seems to cause the issue. I don't know how to debug this further. Any advice?
The issue was also reported for the device asus-tf201 (armv7, SoC Nvidia Tegra 3, display 1280x800). Could it be a 32-bit vs. 64-bit issue somewhere in the code? That's just a wild guess.
On my device (asus-grouper, Tegra 3 SoC) I attached strace to the PID of wnck-applet (
To me it looks like the mmap fails.
I see a very similar issue in Xfce4 that also affects only my 32-bit device samsung-serranove, but not the 64-bit devices bq-paella and qemu. I'm not yet fully sure whether this is directly related (will investigate further). Still, it might be a hint that something underlying causes a 32-bit issue.
With the very little sample you're giving, it could be an issue on armv7. Not sure if @davidhedlund has such a system though, and he's seeing an X Window System error, which is probably a different issue. Anyway, you could try and run the applet under Anyway, not that I have any clue here nor armv7 hardware on which I could test, but maybe that could help track it down.
Intel 64-bit here.
Tried the proposed gdb session; it seems to point to a syscall issue:
Oh... then it's either not a 32-bit vs. 64-bit issue, or we have two different issues. @LongnoseRob In your debug output there is a "^C" and at the end of the next line it says "Interrupt". Did you cancel the process? Edit: Start wnck-applet via gdb in a terminal, then make it crash by starting Firefox or Pluma from the menu. My debug output isn't very helpful either. The crash is due to "Illegal instruction". However, the backtrace says the stack may be corrupted. Searching the web suggests this is not unusual on ARM... I didn't find a proper way to get a usable backtrace.
Yes, that was intentional, to get the BT of the first process that exited. I tried some more digging and installed
I am guessing this is ARM only? I do not have any ARM machines bigger than a phone, so it's no surprise I have never seen this. This shows the value of testing on all supported architectures, as this was found only after release. Do we have anyone on the team with an ARM laptop? If not, someone developing for ARM would be a valuable addition to the team, given that such laptops are likely to become more common in the future.
@LongnoseRob and I are on architecture armv7 or @davidhedlund: Could you try the "gdb" debugging steps and report what you get?
The steps I took to get the debug output:
If you know how to build from source, you could try
Note that
I chose a different approach to track it down. In postmarketOS v22.12 (based on Alpine Linux 3.17) it works without issues; in v23.06 (based on 3.18) the issues appear. So I set up a pmOS v22.12 (3.17) installation, changed the repository URLs to v23.06 (3.18), systematically upgraded the packages and tested which one caused the issue. It turned out that the package "startup-notification" introduced the issue when it was upgraded in Alpine Linux from version 0.12-r4 to 0.12-r5. The difference between the two build releases "r4" and "r5" is a patch to fix 32-bit time. The patches (the one patch in Alpine actually consists of two patches) are also in upstream "startup-notification" but are not yet part of an official release; they were committed after the latest release, 0.12.
At least within Alpine Linux, that patch seems to do the opposite of what it's supposed to do. I created an issue report at Alpine to discuss how to proceed: Now the next question is whether the issue of @davidhedlund is somehow related to this. It's unlikely, as this seems to be a 32-bit issue and he's on 64-bit. Trisquel is based on Ubuntu (which is based on Debian). In both Ubuntu and Debian, the version of "startup-notification" hasn't changed for a long time. Also, as far as I can see, they don't have this 32-bit time patch in their build repository.
So it's quite clear to me that the issue of @davidhedlund is a different one. Sorry for having chimed in on your issue. You should try to debug your issue with preferably
I clicked on "Don't Reload"
Oh, my bad. The path in Ubuntu is different: /usr/lib/mate-panel/wnck-applet. The order of the first steps is:
In the terminal, gdb should now be running without Try to reproduce the crash that you reported, or keep on working until the crash happens. Once wnck-applet has crashed, the terminal should show some text saying If it doesn't work or you have questions, feel free to ask.
@Jakko3 well done! So now we know that what you see is actually an issue in Alpine's libstartup-notification, not anything else (yet), as this is more than an ABI break that a rebuild would solve. In practice, it's (more than likely) crashing libwnck in its The issue here is that the Alpine patch not only breaks ABI (somewhat) but, in practice, breaks API as well: no rebuild is going to make the passed-in pointers point to wider memory areas. Note that the rationale in the linked email is confusing, because it quotes a piece of libstartup-notification code that is not vanilla and ends up referencing itself, so I don't exactly know why they patched this in OpenBSD in the first place… before the patch, callers passing Anyway, if they are going with this change, they also need to patch libwnck to use that new API/ABI:

```diff
diff --git a/libwnck/tasklist.c b/libwnck/tasklist.c
index ccc7427..2e79a36 100644
--- a/libwnck/tasklist.c
+++ b/libwnck/tasklist.c
@@ -4973,7 +4973,8 @@ sequence_timeout_callback (void *user_data)
   WnckTasklist *tasklist = user_data;
   GList *tmp;
   gint64 now;
-  long tv_sec, tv_usec;
+  time_t tv_sec;
+  suseconds_t tv_usec;
   double elapsed;
 
   now = g_get_real_time ();
```

Note that this is not something upstream Anyway, note that xfwm4 also still uses the libsn 0.12 API (e.g. passes pointers to Anyway, that's not mate-panel's bug; and I'd say not libwnck's either, as there is no released version of the library causing the problem, and checking for this is terribly tricky. Also, it affects multiple (if not all) users of the API, so a reasonable new release of libstartup-notification would either need to be incompatible or, maybe better, add some API to use the 2038-safe values.
BTW, it's not the first time that libstartup-notification has unexpectedly broken ABI :)
I had to click reload twice, as you suggested. Also, there's no gdb prompt:
@davidhedlund looks good, now you need the applet to crash 🙂, and then output a backtrace (
I'm lost here. Should the gdb prompt be triggered or manually opened? @cwendling
Your picture is correct. This is what I described as The next step is what I described as After the crash has happened, return to the terminal window and do the next steps as I described them (slightly extended by @cwendling). Let us know if you're not sure or something isn't clear. That's no problem.
When wnck-applet crashes, the gdb prompt will be triggered by the crash.
If you have virt-manager, you could install Trisquel and try to reproduce it if you want, since you know better how to solve this issue. Firefox is not available in the Trisquel repositories due to trademark issues, but Pluma is installed by default.
Nothing crashes (with or without
The crash when starting "Pluma" (or other applications) from the classic menu was the issue I ran into. We found its cause and solved it. Now we need to find the cause of the issue you reported. It's a different issue, unrelated to mine. I suggest that you set up a terminal like in your last picture and then use your PC as you usually do. At some point the issue you reported should happen. If your issue doesn't happen at all anymore, you might close this issue report.
I downloaded Trisquel 11 and booted it from a live USB on my amd64 PC. I tried different things but have not experienced any crashes of wnck-applet so far.
I experienced this issue two times during the same time (perhaps a month) when I reported it. I have never experienced it ever since, but I have not closed the issue because someone else might be able to reproduce it.
Thank you very much for your efforts and support!
By coincidence, wnck-applet crashed again on Trisquel while I was working with it as normal, for the first time in the 6 months since I opened this issue:
So I don't think this issue should be closed. Same package versions as when I submitted this issue:
You are using an old mate-panel release.
Trisquel 11 is based on Ubuntu 22.04 LTS (Jammy Jellyfish). This one is on mate-panel 1.26.2. Nonetheless a good hint that mate-panel 1.26.3 might solve the issue of @davidhedlund.
I just noticed that the version is 1.26.2-1 +11.0trisquel10, so it seems to be a Trisquel build. I don't know how or where they build their packages; I couldn't find it on a quick look at https://gitlab.trisquel.org/trisquel.
Trisquel takes the latest upstream (Ubuntu) source and rebuilds it using small bash scripts called "helpers"; for mate-panel, here is the one used for the 22.04 (Jammy) based release. Where,
I wonder what could break if we add the 1.26.3 patches to the 1.26.2-1 release of this single package we take from upstream. Update: ...or better yet, only apply the commit(s) fixing the issue.
Expected behaviour
Actual behaviour
"Window List" has quit unexpectedly
and "Show Desktop" has quit unexpectedly
occur at the same time (just no screenshot borders below)
journalctl
Steps to reproduce the behaviour
MATE general version
1.26.0-1
Package version
Linux Distribution
Trisquel 11
Link to bugreport of your Distribution (requirement)
Closely related: https://gitlab.trisquel.org/trisquel/package-helpers/-/issues/76