thread 'tokio-runtime-worker' panicked at 'assertion failed: !(self.ptype == ProgressType::Task)' #4284
Do you have a coredump from this? Try using |
I think there's some issue with how rpm-ostree is built because there's nothing in the stack trace:
I also tried reproducing with |
Random (maybe helpful) report: Error happens on fedora silverblue 38 but not on 37 |
It's been happening to me randomly like ⅓ of the time, I think this is just a race condition of some kind. Still happens on this version btw:
Aiming to help debug coreos#4284
Sorry about the delay on this. I put up #4348 which should help us debug. Once that PR lands, a build should show up in https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/ That should get us a bit more debugging information. Also this is a client-server setup, so we may also need to do e.g.
Then paste in there:
And |
Ok @YaLTeR the debug patch is available in https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/ - let me know if you need help trying out enabling that build alongside injecting the debug configuration above! |
Also cc @istvan-derda since you hit this too |
Got a new one with the debug logging:
OK, sadly I did the wrong thing with the debug logging...just put up #4353 which should really help this time. What's going wrong is we're trying to output a new progress bar when an old one is still active, and we need to know exactly which are old and new. |
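To make the failure mode concrete, here is a minimal sketch of the invariant behind the assertion in the issue title. Only `ProgressType::Task` and the `ptype` field come from the panic message; the `Percent` variant, `ProgressState`, `ProgressError`, and both method names are hypothetical and are not rpm-ostree's actual code. It shows a percent update arriving while a plain task is still active, and a checked variant that reports the mismatch instead of panicking.

```rust
// Hypothetical sketch of the invariant, not rpm-ostree's implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProgressType {
    Task,
    Percent, // assumed name for the "percent progress" kind
}

#[derive(Debug)]
pub struct ProgressState {
    ptype: ProgressType,
    message: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProgressError {
    /// A percent update arrived while a plain task was still active.
    TaskStillActive { active: String },
}

impl ProgressState {
    pub fn new(ptype: ProgressType, message: &str) -> Self {
        ProgressState { ptype, message: message.to_string() }
    }

    /// Asserting variant: same shape as the assertion in the issue title,
    /// so a mismatched update panics.
    pub fn update_percent(&self, pct: u8) -> u8 {
        assert!(!(self.ptype == ProgressType::Task));
        pct.min(100)
    }

    /// Checked variant: returns an error naming the active task instead.
    pub fn try_update_percent(&self, pct: u8) -> Result<u8, ProgressError> {
        if self.ptype == ProgressType::Task {
            return Err(ProgressError::TaskStillActive {
                active: self.message.clone(),
            });
        }
        Ok(pct.min(100))
    }
}
```

With a state created as `ProgressState::new(ProgressType::Task, "some task")`, `try_update_percent(50)` returns an error that names the still-active task, which is the "which are old and new" information the debug logging is after.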
Would it be possible to update the COPR with RPMs including that PR? Or just send the RPMs here for F38 |
Btw, for me the crash reproduces basically every time if I update with rpm-ostree update and at the same time do flatpak update -y. |
I'm getting this with a brand-new F38 Silverblue install FWIW |
Have been experiencing this intermittently with F38 also. Sometimes rerunning For me this is happening at this point:
I built an rpm from main, will try to reproduce again. |
And a second one for the bulk (
Was having the same error
Just did a |
Piling on here, randomly on overlay of RPM fusion mesa-vdpau-drivers-freeworld: coredumpctl info:
I also hit this from time to time and have no idea where it comes from. At first I thought it was caused by some blocked source, but I never looked into it further once I saw that others had the problem too. I'm reporting it so I can follow any potential fix, and to let you know that one more user is affected. |
FWIW, this happens on every one of my systems for the past few months; I thought it would be fixed quickly (as these things usually are), and I've tried all the workarounds listed here (clearing cache and such), but it still happens. I have to try to upgrade 2 to 5 times before it finishes with "no changes" (even though there are changes according to I do have overrides on my machines, and I do have third party repos enabled (OBS for darktable on my personal desktop and a copr for a patched mutter on both personal and work) — but this was even happening on my work machine without any additional repos (before I added the patched triple-buffered mutter). However, I do have a local package on my work laptop (for VPN certificates) and layered Fedora packages on both. |
I've asked my coworkers who also use Silverblue and they haven't seen the issue. They don't layer packages, however. So my guess is that it's really related to layering packages. |
The worst part is that this bug wasn't there on 37 |
@garrett Can you try grabbing some of the debug info using It'd be particularly helpful to use rpm-ostree-2023.3 as a base for updating. I've tried briefly again today to reproduce this, and took another tour through the code trying to retrofit these symptoms to code paths; but no luck so far. If we don't succeed soon I'll try just spamming debug log prints around the relevant code paths I can think of and have someone who is hitting this update to that. |
This is actually quite important for general "understanding of the system" purposes. We can close this FIXME without needing to print the transaction type. Motivated specifically by a theory that this may relate to coreos#4284 if somehow the client gets confused as to which transaction it's monitoring.
@cgwalters: Sure! rpm-ostree:
Version: '2023.3'
Git: 8ab6f143a0ecad8b125b47dee8bbeb2e99f1b215
Features:
- rust
- compose
- container
- fedora-integration
On my work laptop, before I realized I should modify the service too:
On my personal workstation, where I did modify the service before: rpm-ostree--debug--personal.txt
The second of these is probably more useful, as it should have more debugging info: |
In
It's really strange to me that we're seeing the "Enabled rpm-md repositories" message at that point...that seems like a leftover bit that should have been seen earlier. I'm starting to build up a theory that this is somehow related to us iterating a new main context in the commit path, but we have leftover queued idle work on the mainloop or so? But not yet clear to me what that is... @garrett Can you also attach the daemon-side logs from that run? i.e. |
- Allocate a unique serial per Progress instance
- Output the serial alongside the text, which is really a unique string format that will tell us where in the code it's being called
- Add `g_debug()` output with both in the constructors, and just the serial in other methods

This should help us figure out on the daemon side which specific code is trying to do a percent progress while we have a plain task.

cc coreos#4284
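The serial scheme described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: the `Progress` struct, `NEXT_SERIAL`, `debug_line`, and `end` are hypothetical names, and `eprintln!` stands in for the `g_debug()` call mentioned above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter; every Progress instance takes the next value.
// (Hypothetical name; the real patch may allocate serials differently.)
static NEXT_SERIAL: AtomicU64 = AtomicU64::new(1);

#[derive(Debug)]
pub struct Progress {
    serial: u64,
    text: String,
}

impl Progress {
    /// Constructor: logs both the serial and the text, as the commit describes.
    pub fn new(text: &str) -> Self {
        let serial = NEXT_SERIAL.fetch_add(1, Ordering::Relaxed);
        let p = Progress { serial, text: text.to_string() };
        eprintln!("{}", p.debug_line("new")); // stand-in for g_debug()
        p
    }

    pub fn serial(&self) -> u64 {
        self.serial
    }

    /// Debug line carrying serial and text (used by the constructor).
    pub fn debug_line(&self, event: &str) -> String {
        format!("progress[{}] {}: {}", self.serial, event, self.text)
    }

    /// Other methods log just the serial.
    pub fn end(&self) -> String {
        let line = format!("progress[{}] end", self.serial);
        eprintln!("{}", line);
        line
    }
}
```

Because the text only appears next to the serial at construction time, a later log line that carries just the serial can still be traced back to the call site that created that instance.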
@cgwalters: Apologies for the delay! Here are the logs from ostreed on May 09 from my personal machine: I think the relevant entries should be around 11:37 - 11:38. |
All of the repeated "loading fedora" messages looked very odd to me, and digging in, this is coming from the "sysroot changed" signal. We don't even have debug logging for invocation of these messages (will add) but...that gave me a clue. I think I'm starting to understand this bug.
Basically what I think is happening right now is that there are two processes talking to the daemon (gnome-software and /usr/bin/rpm-ostree). While the human has invoked e.g. But basically at some point, one of these other methods ends up calling something in the core that tries to refresh rpm-md, and that in turn will try to invoke the progress APIs, and that will cause this crash.
First I'm going to add a lot more useful debug logging to this path to verify. I think the best fix here is to change our output infrastructure to only route through the dbus txn progress for the transaction thread. |
Ahhh yes I think the core bug here is everything using |
Yep, this is really easy to reproduce now that I understand the bug; having layered packages makes it much much more likely (but I think it may be reproducible without). Basically in one shell: So yes, will look at reworking the transaction progress stuff to be thread-local. |
This is needed because for transactions (which are always run in a thread today) we want output to go to the transaction's DBus progress. But for other methods which are not transactions, we don't have a channel for status reporting, so output needs to continue to go to the journal. If we mix these two things due to concurrent method invocations, the client may get confused and crash. Closes: coreos#4284
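The routing rule in this commit message can be sketched with a thread-local slot. This is only an illustration of the idea under stated assumptions, not the code from the PR: `TXN_PROGRESS`, `Routed`, `set_txn_progress`, `output_message`, and `run_transaction` are hypothetical names, an `mpsc` channel stands in for the transaction's DBus progress, and `eprintln!` stands in for the journal.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};
use std::thread;

thread_local! {
    // Populated only on a transaction thread; None on every other thread.
    static TXN_PROGRESS: RefCell<Option<Sender<String>>> = RefCell::new(None);
}

/// Where a message ended up.
#[derive(Debug, PartialEq, Eq)]
pub enum Routed {
    Transaction,
    Journal,
}

/// Install (or clear) the progress channel for the current thread only.
pub fn set_txn_progress(tx: Option<Sender<String>>) {
    TXN_PROGRESS.with(|slot| *slot.borrow_mut() = tx);
}

/// Emit a status message: to the transaction channel if this thread has one,
/// otherwise to the journal stand-in.
pub fn output_message(msg: &str) -> Routed {
    TXN_PROGRESS.with(|slot| match slot.borrow().as_ref() {
        Some(tx) if tx.send(msg.to_string()).is_ok() => Routed::Transaction,
        _ => {
            eprintln!("journal: {}", msg);
            Routed::Journal
        }
    })
}

/// Run `work` on a dedicated transaction thread and collect what it emitted.
pub fn run_transaction<F>(work: F) -> Vec<String>
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        set_txn_progress(Some(tx));
        work();
        set_txn_progress(None); // drops the sender so the receiver can finish
    });
    handle.join().expect("transaction thread panicked");
    rx.into_iter().collect()
}
```

In this sketch, a call to `output_message` from a non-transaction method running on another thread never sees the transaction's channel, so it cannot interleave with the transaction's progress stream, which is the mixing the commit message says can confuse the client.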
OK, got a PR up in #4405 |
I also hit this several times today in the office, but when I came home 'update' worked first time. |
Can one or more people try out |
Well, it's just a single sample, but I just did rpm-ostree update together with flatpak update -y, which usually triggered the issue, and it went fine. |
Particularly to get the fix for coreos#4284 out.
This is now up as https://bodhi.fedoraproject.org/updates/FEDORA-2023-ce6c9dcd69 |
Host system details
Expected vs actual behavior
Expected:
No panic.
Steps to reproduce it
Not reproducible.
Would you like to work on the issue?
Should be assigned to someone else.