wnck-applet crashes randomly since update to 1.26.2 #1385

Open
davidhedlund opened this issue Jul 17, 2023 · 34 comments

Comments

@davidhedlund

davidhedlund commented Jul 17, 2023

Expected behaviour

Actual behaviour

  • The messages "Window List" has quit unexpectedly and "Show Desktop" has quit unexpectedly appear at the same time (the screenshot below just lacks window borders)

Screenshot at 2023-07-14 08-31-20

journalctl

jul 17 18:30:06 blues-System-Product-Name wnck-applet[1942]: The program 'wnck-applet' received an X Window System error.
jul 17 18:30:06 blues-System-Product-Name kernel: traps: wnck-applet[1942] trap int3 ip:7ff2ec159167 sp:7ffd00992c60 error:0 in libglib-2.0.so.0.7200.4[7ff2ec115000+8f000]
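
As a small aid for reading the kernel line above: the bracketed part gives the start address and size of the memory mapping that contains the faulting instruction pointer, so subtracting the two yields an offset inside that mapping. The sketch below only does this arithmetic; the helper name `trap_offset` is made up for illustration, and turning the offset into a function name would additionally need the mapping's file offset and GLib debug symbols, which are not available in this report.

```python
import re

# Kernel "traps:" line copied from the journal output above.
TRAP_LINE = (
    "traps: wnck-applet[1942] trap int3 ip:7ff2ec159167 sp:7ffd00992c60 "
    "error:0 in libglib-2.0.so.0.7200.4[7ff2ec115000+8f000]"
)


def trap_offset(line):
    """Return (library, offset of ip inside the reported mapping)."""
    m = re.search(
        r"ip:([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]", line
    )
    if m is None:
        raise ValueError("not a recognised kernel traps line")
    ip = int(m.group(1), 16)
    lib = m.group(2)
    base = int(m.group(3), 16)
    size = int(m.group(4), 16)
    offset = ip - base
    # Sanity check: the instruction pointer must lie inside the mapping.
    if not 0 <= offset < size:
        raise ValueError("ip is outside the reported mapping")
    return lib, offset


lib, offset = trap_offset(TRAP_LINE)
print(f"{lib} + {offset:#x}")  # libglib-2.0.so.0.7200.4 + 0x44167
```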

Steps to reproduce the behaviour

MATE general version

1.26.0-1

Package version

$ apt show libwnck-3-0
Package: libwnck-3-0
Version: 40.1-1
$ apt show mate-panel
Package: mate-panel
Version: 1.26.2-1+11.0trisquel10

Linux Distribution

Trisquel 11

Link to bugreport of your Distribution (requirement)

Closely related: https://gitlab.trisquel.org/trisquel/package-helpers/-/issues/76

@davidhedlund
Author

Please give me the stack trace command that you want me to execute so I can update the log in the top post.

@Jakko3

Jakko3 commented Nov 2, 2023

I'm experiencing this on distribution postmarketOS on device samsung-serranove (armv7, SoC msm8916, display 540x960). Installed package versions: mate-desktop 1.26.1, mate-panel 1.27.2 (but issue also happened at 1.26.3), libwnck3 43.0.

It does not happen on devices bq-paella (aarch64, SoC msm8916, display 720x1280) and virtual machine qemu (amd64, virtual, display 640x480).

On device samsung-serranove I also tried rotating the display by 90 degrees to 960x540. It didn't solve the issue.

When starting e.g. application "Pluma" from the classic menu, the panel items "Show Desktop", "Window List" and "Workspace Switcher" are crashing, showing message "... has quit unexpectedly". All the three are part of the wnck-applet, which is part of mate-panel (https://github.com/mate-desktop/mate-panel/tree/v1.26.3/applets/wncklet).

Edit: When starting e.g. "Pluma" by command line in a terminal, the crashing doesn't happen.

When removing the "Window List" (right-click its "handle" that is e.g. three dots, uncheck "Lock To Panel", again right-click the three dots, "Remove from Panel") and starting e.g. "Pluma" from classic menu, no crash occurs anymore.

Right-click panel, "Reset All Panels" to continue debugging.

In terminal checking ps -A | grep -i wnck. When starting "Pluma" and the crashes occur, the "/usr/libexec/wnck-applet" doesn't show up anymore, obviously crashed. After clicking "Reload" on the first of the messages, it shows up again in ps -A | grep -i wnck, thus it restarted.

Open a terminal (if needed click 3x "Reload"), type killall wnck-applet, don't touch the 3 error windows, focus the terminal again, type /usr/libexec/wnck-applet (on ssh prepend "DISPLAY=:0"), now click 3x "Reload" on the display (you may need to minimize the terminal to reach all error windows). Start "Pluma" from the classic menu. The terminal shows the message "Illegal instruction".

Summarizing so far: wnck-applet crashes with message "Illegal instruction" when starting an application from classic menu (does not happen with all applications, only with some). "Window List" seems to cause the issue.

I don't know how to debug this further. Any advice?

@Jakko3

Jakko3 commented Nov 2, 2023

The issue was also reported for device asus-tf201 (armv7, SoC Nvidia Tegra 3, display 1280x800).

Could it be a 32-bit vs. 64-bit issue somewhere in the code? That's just a wild guess.

@LongnoseRob

On my device (asus-grouper, Tegra 3 SoC) I attached strace to the PID of wnck-applet (strace -p <PID>). The output was as follows when starting Firefox (119.0, 32-bit), which caused the "Show Desktop" and "Window List" crash messages:

poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="\24\0\6\0\24\5\0\0\27\1\0\0\6\0\0\0\0\0\0\0\377\377\377\177", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1 H]\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="\24\0\6\0\24\5\0\0\320\1\0\0\0\0\0\0\0\0\0\0\377\377\377\177", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1 I]\4\0\0\0\6\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0010\0\0\0\202\1\0\0w\0\0\0\1\1o\0\34\0\0\0/org/a11"..., iov_len=136}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6(iiii)\0\0\0\0\0\0\0\0\0"..., iov_len=48}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 184
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=99341216}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=99641222}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb59d4000
munmap(0xb59d4000, 8192)                = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=109737410}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=110110417}) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="5 \4\0\274\23\200\4\7\0\200\4M\1\30\0\213\4\6\0\275\23\200\4\274\23\200\4%\0\0\0"..., iov_len=1140}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 1140
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="A\0T]\277\23\200\4\3\0\202\0\16\0\200\4\0\20\7\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=115666520}) = 0
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 5, 1503) = 0 (Timeout)
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=620595524}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1698891777, tv_nsec=580876537}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=621849548}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=622561561}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=623294574}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=624337594}) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0xb6f6c962} ---
+++ killed by SIGILL +++

To me this looks like the mmap fails..
I think we will need further instructions for debugging, probably including a debug build..

@Jakko3

Jakko3 commented Nov 2, 2023

I see a very similar issue in Xfce4 that also affects my 32-bit device samsung-serranove only but not the 64-bit devices bq-paella and qemu. Not yet fully sure if this is directly related (will investigate further). Still it might be a hint that something underlying causes a 32-bit issue.

@cwendling
Member

With the very little sample you're giving, could seem like an issue on armv7. Not sure if @davidhedlund has such a system though, and he's seeing X Window System error, which probably is a different issue.

Anyway, you could try and run the applet under GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet and otherwise following @Jakko3's instructions. (as I unfortunately don't see anything relevant in the @LongnoseRob's strace)

Anyway, not that I have any clue here nor armv7 hardware on which I could test, but maybe that could help track it down.

@davidhedlund
Author

With the very little sample you're giving, could seem like an issue on armv7. Not sure if @davidhedlund has such a system though, and he's seeing X Window System error, which probably is a different issue.

Intel 64-bit here.

@LongnoseRob

Tried the proposed gdb session; it seems to point to a syscall issue:

$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.

 // GDB introduction cut for brevity

Reading symbols from /usr/libexec/wnck-applet...
(No debugging symbols found in /usr/libexec/wnck-applet)
Starting program: /usr/libexec/wnck-applet
[New LWP 29400]
[New LWP 29401]
[New LWP 29402]
[New LWP 29403]
//trigger the problem
[LWP 29402 exited]
^C
Thread 1 "wnck-applet" received signal SIGINT, Interrupt.
__cp_begin () at src/thread/arm/syscall_cp.s:23
23      src/thread/arm/syscall_cp.s: No such file or directory.
(gdb) bt
#0  __cp_begin () at src/thread/arm/syscall_cp.s:23
#1  0xb6fc72c2 in __syscall_cp_c (nr=-1233186104, u=<optimized out>, v=<optimized out>, w=0, x=-1249751712, y=-1237160064, z=-1233863275)
    at src/thread/pthread_cancel.c:33
#2  0x00000000 in  ()
(gdb)                

@Jakko3

Jakko3 commented Nov 2, 2023

Intel 64-bit here.

Oh... then it's either not a 32-bit vs. 64-bit issue – or we have two different issues.

@LongnoseRob In your debug output there is a "^C" and at the end of the next line it says "Interrupt". You may have canceled the process? Edit: Start wnck-applet via gdb in terminal, then make it crash by starting Firefox or Pluma from menu.

My debug output isn't very helpful either. The crash is due to "Illegal instruction". However, when asking for the backtrace, gdb says the stack may be corrupted. A web search suggests this is not unusual on ARM... I didn't find a proper way to get the backtrace.

$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7-alpine-linux-musleabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/wnck-applet...
(No debugging symbols found in /usr/libexec/wnck-applet)
Starting program: /usr/libexec/wnck-applet 
[New LWP 3972]
[New LWP 3973]
[New LWP 3974]
[New LWP 3975]
[New LWP 3977]
[LWP 3977 exited]
[New LWP 3978]
[New LWP 3979]
[LWP 3978 exited]
[LWP 3979 exited]
[New LWP 3980]
[New LWP 3981]
[LWP 3980 exited]
[LWP 3981 exited]
[New LWP 3983]
[LWP 3974 exited]

Thread 1 "wnck-applet" received signal SIGILL, Illegal instruction.
__stack_chk_fail () at src/env/__stack_chk_fail.c:26
26	src/env/__stack_chk_fail.c: No such file or directory.
(gdb) bt
#0  __stack_chk_fail () at src/env/__stack_chk_fail.c:26
#1  0xb6f5927a in ?? () from /usr/lib/libwnck-3.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

@LongnoseRob

@LongnoseRob In your debug output there is a "^C" and at the end of the next line it says "Interrupt". You may have canceled the process? Edit: Start wnck-applet via gdb in terminal, then make it crash by starting Firefox or Pluma from menu.

Yes, that was intentional to get the BT of the first process that exited.

I did some more digging and installed the musl-dbg package, as these calls to src/env/__stack_chk_fail.c:26 point to musl,
but this did not provide any more information..

@lukefromdc
Member

I am guessing this is ARM only? I do not have any ARM machines bigger than a phone, so no surprise I have never seen this. This shows the value of testing on all supported architectures, as this was found only after release.

Do we have anyone on the team with an ARM laptop? If not, someone developing for ARM would be a valuable addition to the team given such laptops are likely to become more common in the future.

@Jakko3

Jakko3 commented Nov 3, 2023

I am guessing this is ARM only?

@LongnoseRob and me are on architecture armv7 or armhf edit: or likely that's armel, not fully sure (32-bit). @davidhedlund however is on architecture x86-64 or amd64 (64-bit), thus not ARM. Although possibly we have two different issues.

@davidhedlund: Could you try the "gdb" debugging steps and report what you get?

Yes, that was intentional to get the BT of the first process that exited.

The steps I did to get the debug output:

  • installed package gdb
  • open a terminal
  • killall wnck-applet -> some error windows show up, ignore them
  • GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet
  • on all error windows of before click "Reload", may have to minimize the terminal to reach the windows in the background
  • open application "Pluma" from classic menu (or "Firefox" or "MATE System Monitor") -> in the terminal the thread "wnck-applet" received signal SIGILL, Illegal instruction. @davidhedlund: If this doesn't make your "wnck-applet" crash aka. "Thread 1 wnck-applet received signal" in gdb, you may need to make it crash your way. In the issue title you just wrote "crashes randomly". In your case it will likely not be "SIGILL, Illegal instruction" but something else.
  • type bt to get the backtrace
  • to exit gdb type exit and confirm with "y"

@muktupavels
Contributor

If you know how to build from source, you could try libwnck/test-tasklist to see if problem is in libwnck or mate-panel.

@lukefromdc
Member

Note that killall wnck-applet is only possible with the applet built out of process, so that's what you have if this works. Has anyone seen this crash with the applet in-process? You would see the entire panel crashing, any reference in the backtrace to libwnck would imply the same or a similar crash.

@Jakko3

Jakko3 commented Nov 5, 2023

I chose a different approach to track it down. In postmarketOS v22.12 (based on Alpine Linux 3.17) it works without issues, in v23.06 (based on 3.18) there are issues. So I set up a pmOS v22.12 (3.17) installation, changed the repository URLs to v23.06 (3.18), systematically upgraded the packages and tried to test which one causes the issue.

It turned out that the package "startup-notification" introduced the issue when Alpine Linux upgraded it from package version 0.12-r4 to 0.12-r5. The difference between the two build releases "r4" and "r5" is a patch to fix 32-bit time.

The patches (actually the one patch in Alpine consists of two patches) are also implemented in upstream "startup-notification" but are not yet part of an official release, they were implemented after the latest release 0.12.

At least within Alpine Linux that patch seems to do the contrary of what it's supposed to do. I created an issue report at Alpine to discuss the further procedure:

https://gitlab.alpinelinux.org/alpine/aports/-/issues/15441
Now the next question is whether the issue of @davidhedlund is somehow related to this. It's unlikely, as it seems to be a 32-bit issue and he's on 64-bit.

Trisquel is based on Ubuntu (based on Debian). Both in Ubuntu and Debian the version of "startup-notification" wasn't changed for a long time. Also they don't have this 32-bit time patch implemented in their build repository, as far as I can see.

So it's quite clear to me that the issue of @davidhedlund is another one. Sorry for having chimed into your issue. You should try to debug your issue with preferably killall wnck-applet and GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet or alternatively maybe with ps -A | grep wnck-applet and strace -p <PID> where PID is the number from the command before.

@davidhedlund
Author

davidhedlund commented Nov 6, 2023

I chose a different approach to track it down. In postmarketOS v22.12 (based on Alpine Linux 3.17) it works without issues, in v23.06 (based on 3.18) there are issues. So I set up a pmOS v22.12 (3.17) installation, changed the repository URLs to v23.06 (3.18), systematically upgraded the packages and tried to test which one causes the issue.

It turned out that the package "startup-notification" introduced the issue when Alpine Linux upgraded it from package version 0.12-r4 to 0.12-r5. The difference between the two build releases "r4" and "r5" is a patch to fix 32-bit time.

* https://gitlab.alpinelinux.org/alpine/aports/-/commit/e99c0e1ab1248b67d25763b493b9a9b8413dc74b

The patches (actually the one patch in Alpine consists of two patches) are also implemented in upstream "startup-notification" but are not yet part of an official release, they were implemented after the latest release 0.12.

* https://gitlab.freedesktop.org/xdg/startup-notification/-/commit/a7e49fefde18ea8d5bada8096d32f23bcfb5a6dc

* https://gitlab.freedesktop.org/xdg/startup-notification/-/commit/ea9f7e4cc6fd8c08d175ed7774ed2c5bd11c8ef0

At least within Alpine Linux that patch seems to do the contrary of what it's supposed to do. I created an issue report at Alpine to discuss the further procedure:

* https://gitlab.alpinelinux.org/alpine/aports/-/issues/15441

Now the next question is whether the issue of @davidhedlund is somehow related to this. It's unlikely, as it seems to be a 32-bit issue and he's on 64-bit.

Trisquel is based on Ubuntu (based on Debian). Both in Ubuntu and Debian the version of "startup-notification" wasn't changed for a long time. Also they don't have this 32-bit time patch implemented in their build repository, as far as I can see.

* Ubuntu "startup-notification" version: https://packages.ubuntu.com/search?keywords=startup-notification&searchon=names&suite=all&section=all

* Ubuntu "startup-notification" build: https://git.launchpad.net/ubuntu/+source/startup-notification/tree/?h=import/0.12-6build2

* Debian "startup-notification" version: https://tracker.debian.org/pkg/startup-notification

* Debian "startup-notification" build: https://salsa.debian.org/gnome-team/startup-notification/-/tree/debian/0.12-6/debian

So it's quite clear to me that the issue of @davidhedlund is another one. Sorry for having chimed into your issue. You should try to debug your issue with preferably killall wnck-applet and GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet or alternatively maybe with ps -A | grep wnck-applet and strace -p <PID> where PID is the number from the command before.

I clicked on "Don't Reload"

Screenshot_trisquel_11 0_amd64 iso_2023-11-06_04:52:16

Screenshot_trisquel_11 0_amd64 iso_2023-11-06_04:53:05

@Jakko3

Jakko3 commented Nov 7, 2023

Oh, my bad. The path in Ubuntu is different: /usr/lib/mate-panel/wnck-applet

The order of the first steps is:

  • open terminal
  • killall wnck-applet
  • don't touch the error windows yet, instead go back to the terminal
  • GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet
  • now click "Reload" on the error windows (if you have more than one, click "Reload" on all of them)

In the terminal gdb should now be running without (gdb) prompt.

Try to reproduce the crash that you reported. Or keep on working until the crash happens.

Once wnck-applet has crashed, the terminal should show some text saying Thread 1 "wnck-applet" received signal [...], followed by a (gdb) prompt. Type bt for a backtrace. Exit gdb by typing quit. Copy the text incl. the Thread 1 [...] line and paste it here.

If it doesn't work or you have questions, feel free to ask.

@cwendling
Member

@Jakko3 well done! So now we know what you see is actually an issue in Alpine's libstartup-notification, not anything else (yet), as this is more than an ABI break a rebuild would solve. In practice, it's (more than likely) crashing libwnck in its sn_startup_sequence_get_last_active_time() call, because it passes long pointers -- which used to be right before Alpine's patch.

The issue here is that the Alpine patch not only breaks ABI (somewhat), but breaks API as well in practice: no rebuild is gonna make the passed in pointers point to wider memory areas. Note that the rationale on the linked email is confusing because it quotes a piece of libstartup-notification code that is not vanilla, and ends up referencing itself so I don't exactly know why they patched this in OpenBSD in the first place… before the patch, callers passing long should have been fine, as the truncation would have happened in sn_startup_sequence_get_last_active_time() itself when e.g. setting *tv_sec, which is not perfect but safe (I think?).
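
To make the "no rebuild helps" point concrete, here is a toy simulation (not the real library code). It models a caller that reserved two 4-byte `long` slots, as on 32-bit ARM, followed by a stack-protector guard word, and a callee that stores either 4-byte or 8-byte values through the same pointers. The frame layout, the `CANARY` value and all function names are assumptions made up for illustration, as is the 8-byte width of `time_t` and `suseconds_t` on the patched 32-bit build.

```python
import struct

CANARY = 0xDEADBEEF  # stand-in for the stack-protector guard value


def make_frame():
    """Hypothetical caller frame: tv_sec at 0, tv_usec at 4, canary at 8."""
    frame = bytearray(12)
    struct.pack_into("<I", frame, 8, CANARY)
    return frame


def store_times(frame, sec, usec, width):
    """Mimic the callee writing *tv_sec and *tv_usec with `width` bytes each.

    width=4 models the 0.12 API (long on a 32-bit target); width=8 models
    the patched API writing through the very same caller-provided addresses.
    """
    fmt = "<i" if width == 4 else "<q"
    struct.pack_into(fmt, frame, 0, sec)   # *tv_sec = sec
    struct.pack_into(fmt, frame, 4, usec)  # *tv_usec = usec


def canary_intact(frame):
    return struct.unpack_from("<I", frame, 8)[0] == CANARY


old = make_frame()
store_times(old, 1699400000, 123456, 4)
new = make_frame()
store_times(new, 1699400000, 123456, 8)
print("0.12 API keeps the guard intact:", canary_intact(old))
print("patched API keeps the guard intact:", canary_intact(new))
```

In the 8-byte case the second store runs past the caller's slot and over the guard word, which is the kind of corruption that would end in `__stack_chk_fail()` as seen in @Jakko3's backtrace.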

Anyway, if they are going with this change, they also need to patch libwnck to use that new API/ABI:

diff --git a/libwnck/tasklist.c b/libwnck/tasklist.c
index ccc7427..2e79a36 100644
--- a/libwnck/tasklist.c
+++ b/libwnck/tasklist.c
@@ -4973,7 +4973,8 @@ sequence_timeout_callback (void *user_data)
   WnckTasklist *tasklist = user_data;
   GList *tmp;
   gint64 now;
-  long tv_sec, tv_usec;
+  time_t tv_sec;
+  suseconds_t tv_usec;
   double elapsed;
 
   now = g_get_real_time ();

Note that this is not something upstream libwnck can easily fix, because the API/ABI break is very hard to detect. I guess they could find a very convoluted way to check for it, but given there has been no release for this in 9 years, I'm not entirely sure they will... although it might get problematic in 2038.
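
For reference, the 2038 limit is the point where a signed 32-bit count of seconds since the Unix epoch runs out. A minimal sketch of that arithmetic (the helper names are made up for illustration):

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit seconds field can hold


def last_32bit_second():
    """Last moment representable as signed 32-bit seconds since the epoch."""
    return EPOCH + timedelta(seconds=INT32_MAX)


def fits_in_32bit(seconds):
    """True if `seconds` can be stored in a signed 32-bit field."""
    try:
        struct.pack("<i", seconds)
        return True
    except struct.error:
        return False


print(last_32bit_second().isoformat())  # 2038-01-19T03:14:07+00:00
print(fits_in_32bit(INT32_MAX), fits_in_32bit(INT32_MAX + 1))
```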

Anyway, note that xfwm4 also still uses the libsn 0.12 API (e.g. passes pointers to long), so that's gonna be a problem as well if they don't patch it in Alpine. Wanna laugh? xfwm4 actually reverted to passing long because it caused crashes 🥇 It got introduced in https://gitlab.xfce.org/xfce/xfwm4/-/commit/8448703965b25c1ee97410aa463d789a43a6c771.

Anyway, that's not mate-panel's bug; and I'd say not libwnck's either, as there is no released version of the library causing the problem, and checking for this is terribly tricky. Also, it affects multiple (if not all) users of the API, so a reasonable new release of libstartup-notification would need to be incompatible -- or maybe better, add some API to use the 2038-safe values.

@cwendling
Member

BTW, it's not the first time that libstartup-notification unexpectedly broke ABI :)

@davidhedlund
Author

davidhedlund commented Nov 8, 2023

The order of the first steps is:

* open terminal

* `killall wnck-applet`

* don't touch the error windows yet, instead go back to the terminal

* `GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet`

* now click "Reload" on the error windows (if you have more than one, click "Reload" on all of them)

I had to click reload twice, as you suggested. Also, there's no gdb prompt:

Screenshot_trisquel_11 0_amd64 iso_2023-11-08_14:13:42

x@x-Standard-PC-i440FX-PIIX-1996:~$ killall wnck-applet
x@x-Standard-PC-i440FX-PIIX-1996:~$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/mate-panel/wnck-applet...
(No debugging symbols found in /usr/lib/mate-panel/wnck-applet)
Starting program: /usr/lib/mate-panel/wnck-applet 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff55da640 (LWP 1738)]
[New Thread 0x7ffff4dd9640 (LWP 1739)]
[New Thread 0x7fffeffff640 (LWP 1740)]
[New Thread 0x7fffecf4c640 (LWP 1747)]
[Thread 0x7fffecf4c640 (LWP 1747) exited]
[New Thread 0x7fffecf4c640 (LWP 1748)]
[New Thread 0x7fffe3fff640 (LWP 1749)]
[Thread 0x7fffecf4c640 (LWP 1748) exited]
[New Thread 0x7fffecf4c640 (LWP 1750)]
[New Thread 0x7fffe37fe640 (LWP 1751)]
[Thread 0x7fffe3fff640 (LWP 1749) exited]
[Thread 0x7fffecf4c640 (LWP 1750) exited]
[Thread 0x7fffe37fe640 (LWP 1751) exited]
[New Thread 0x7fffe37fe640 (LWP 1752)]
[Thread 0x7fffeffff640 (LWP 1740) exited]

@cwendling
Member

cwendling commented Nov 8, 2023

@davidhedlund looks good, now you need the applet to crash 🙂, and then output a backtrace (thread apply all bt full on the GDB prompt).
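
If waiting at an interactive prompt is inconvenient, the same steps could probably be scripted so that gdb prints the backtrace by itself once the applet dies. The sketch below only assembles the command; `build_gdb_invocation` is a made-up helper, and it assumes gdb's `-batch`, `-ex` and `--args` options, with the GDK_SYNCHRONIZE variable and the backtrace command taken from this thread. You would still need to run killall wnck-applet first and click "Reload" on the error windows afterwards.

```python
import os


def build_gdb_invocation(applet_path, base_env=None):
    """Return (env, argv) for running the applet under gdb non-interactively.

    In batch mode gdb runs the given commands in order and then exits, so
    the backtrace command executes once `run` returns, i.e. after the
    applet has crashed or exited.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GDK_SYNCHRONIZE"] = "1"  # make X errors surface at the failing call
    argv = [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "thread apply all bt full",
        "--args", applet_path,
    ]
    return env, argv


# Usage (not executed here; the path is the Ubuntu/Trisquel one from above):
#   import subprocess
#   env, argv = build_gdb_invocation("/usr/lib/mate-panel/wnck-applet")
#   subprocess.run(argv, env=env)
```

Redirecting the output to a file would keep the backtrace even if the terminal is closed in the meantime.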

@davidhedlund
Author

In the terminal gdb should now be running without (gdb) prompt.

Try to reproduce the crash that you reported. Or keep on working until the crash happens.

Once wnck-applet has crashed, the terminal should show some text saying Thread 1 "wnck-applet" received signal [...], followed by a (gdb) prompt. Type bt for a backtrace. Exit gdb by typing quit. Copy the text incl. the Thread 1 [...] line and paste it here.

If it doesn't work or you have questions, feel free to ask.

I'm lost here. Should the gdb prompt be triggered or manually opened? @cwendling

@Jakko3

Jakko3 commented Nov 8, 2023

Your picture is correct. This is what I described as In the terminal gdb should now be running without (gdb) prompt.

The next step is what I described as Try to reproduce the crash that you reported. Or keep on working until the crash happens. Or, as @cwendling described it, now you need the applet to crash. By this we mean: You created this issue here titled "wnck-applet crashes randomly since update to 1.26.2". We now need one of those crashes. If you have an idea how to "provoke" that issue, do it. Otherwise, if the issue seems to happen completely randomly, then use your computer as you usually do until the random crash of wnck-applet happens.

After that crash happened, return to the terminal window and do the next steps as described by me (and slightly extended by @cwendling).

Let us know if you're not sure or it's not clear. That's no problem.

@Jakko3

Jakko3 commented Nov 8, 2023

Should the gdb prompt be triggered or manually opened?

When wnck-applet crashes, the gdb prompt will be triggered by the crash.

@davidhedlund
Author

Should the gdb prompt be triggered or manually opened?

When wnck-applet crashes, the gdb prompt will be triggered by the crash.

If you have virt-manager, you could install Trisquel and try to reproduce it if you want, since you have better knowledge how to solve this issue.

Firefox is not available in the Trisquel repositories due to trademark issues. But Pluma is installed by default.

@davidhedlund
Author

davidhedlund commented Nov 9, 2023

When starting e.g. application "Pluma" from the classic menu, the panel items "Show Desktop", "Window List" and "Workspace Switcher" are crashing, showing message "... has quit unexpectedly". All the three are part of the wnck-applet, which is part of mate-panel (https://github.com/mate-desktop/mate-panel/tree/v1.26.3/applets/wncklet).

Nothing crashes (with or without killall wnck-applet && GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet executed first) when I open Pluma from the classic menu in Trisquel.

@Jakko3

Jakko3 commented Nov 9, 2023

when I open Pluma from the classic menu in Trisquel

The crashes when starting "Pluma" (or other applications) from the classic menu were the issue I ran into. We were able to find the cause and solve it.

Now we need to find the cause of your issue that you reported. It's a different issue and not related to mine.

I suggest that you set up a terminal like in your last picture – and then use your PC as you usually do. At some point the issue you reported would happen.

If your issue doesn't happen at all anymore, you might close this issue report.

If you have virt-manager, you could install Trisquel and try to reproduce it if you want

I downloaded Trisquel 11 and booted it by live USB on my amd64 PC. I tried different things but did not experience any crashes of wnck-applet so far.

@davidhedlund
Author

davidhedlund commented Nov 10, 2023

If your issue doesn't happen at all anymore, you might close this issue report.

I experienced this issue twice around the time I reported it (within perhaps a month). I have not experienced it since, but I have not closed the issue because someone else might be able to reproduce it.

I downloaded Trisquel 11 and booted it by live USB on my amd64 PC. I tried different things but did not experience any crashes of wnck-applet so far.

Thank you very much for your efforts and support!

@davidhedlund
Author

davidhedlund commented Nov 11, 2023

By coincidence, wnck-applet crashed again on Trisquel, for the first time in 6 months since I opened this issue, while I was working with it as normal:

image

So I don't think this issue should be closed.

Same package versions as when I submitted this issue:

$ apt show libwnck-3-0
Package: libwnck-3-0
Version: 40.1-1
$ apt show mate-panel
Package: mate-panel
Version: 1.26.2-1+11.0trisquel10

@raveit65
Member

You are using an old mate-panel release.
Ubuntu should update to mate-panel-1.26.3 https://github.com/mate-desktop/mate-panel/releases/tag/v1.26.3
Which has fixes for wnck-applets.

@Jakko3

Jakko3 commented Nov 18, 2023

Trisquel 11 is based on Ubuntu 22.04 LTS (Jammy Jellyfish). This one is on mate-panel 1.26.2.

Nonetheless a good hint that mate-panel 1.26.3 might solve the issue of @davidhedlund.

@Jakko3

Jakko3 commented Nov 18, 2023

I just noticed that the version is 1.26.2-1 +11.0trisquel10. Seems to be a Trisquel build then. I don't know how/where they build their packages, couldn't find it yet on a quick look at https://gitlab.trisquel.org/trisquel.

@Ark74

Ark74 commented Nov 20, 2023

Trisquel takes the latest upstream (Ubuntu) source and rebuilds it using small bash scripts called "helpers"; for mate-panel, here is the one used for the 22.04 jammy-based release.

Where,

  • nabia is codename for focal's upstream
  • aramo is codename for jammy's upstream

I wonder what could break if we add 1.26.3 patches to the 1.26.2-1 release on this single package we have from upstream.

Update: ...or better yet only apply the commit(s) fixing the issue.
