High CPU usage for updating notification timestamps #1102
What do the logs say? I believe it should be spamming some messages, because it's in a loop. Also, the dunstrc you posted is not minimal. Try reproducing it with only minimal setting changes |
@fwsmit I've since tried it without my custom config. Also, how do I log? I tried […] |
Try verbosity debug |
I've got the same problem (high CPU usage when using show_age_threshold > 0). It seems that draw() just gets increasingly spammed as the number of pending notifications increases. For now I'm forcing the sleep interval to be 1s (1000000) here: Line 95 in 4910d5e
The source of the problem seems to be the fancy sleep calculations here: Lines 571 to 579 in 4910d5e
With the change mentioned above, the draw loop seems to get called at most once a second, which works for me. The downside is that it seems to fire even when no redrawing is needed, but since a redraw is only for the notifications on the screen, this uses very little CPU (shows 0% on my PC). |
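A minimal sketch of what that hack amounts to (illustrative only; it assumes the run() scheduling code quoted later in this thread, where wakeup times are in microseconds and GLib's g_timeout_add() takes milliseconds):

    gint64 timeout_at = queues_get_next_datachange(now);
    // Hack: ignore the computed wakeup time and always wait a full
    // second (1000000 us) before the next update/draw cycle.
    gint64 sleep = 1000000;
    next_timeout_id = g_timeout_add(sleep / 1000, run, NULL);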
Is it always reproducible? |
It is. I started noticing this problem when there are over about 100 notifications waiting to be read, and it always results in very high CPU usage. I've been running today with the "fix" (not a fix) above, and it helps a lot: I got at most 5% CPU usage, but that is still not great; in the debug logs we can see that it starts to run draw() more than once a second (hence the 5% instead of 0% CPU use). I didn't look into how these calls are being reached. Is it some event loop, or are there threads being spawned? I'm guessing the latter because I haven't seen CPU use over 100%. About reproducing it: just fire up dunst with show_age_threshold = 60 in the config, send a couple hundred notifications (like with notify-send), and watch the debug logs; they flow very fast once the age starts getting rendered (after the initial 60s). |
I have also been noticing this issue for some time now, but never cared to dig into it. I often leave my computer locked but unsuspended when I need e.g. to access it remotely, and the fans are close to always at full speed and the CPU at 100% when I come back and have some notifications pending. In my case it doesn't require hundreds of notifications; the email and text messages received overnight (or even over lunch) are usually enough to trigger the issue. I can investigate further if needed.

Edit: Actually, this is enough to reproduce it:

    for i in $(seq 1 200); do notify-send "Test $i" "Test :)"; done

Edit 2: @jdrowell is correct, the problem indeed comes from the sleep timeout calculation mentioned above. I expect it would be fine to wake up only once a second to update the ages of all the notifications? |
I still get 40-50% load even after disabling show_age_threshold. Edit: reverting to 1.9.0 fixes it for me. |
I can confirm that I still get unexpected system load. The reason is not clear to me at this point. I'll add some targeted debug logging on my local build and leave it running until I encounter the bug again. |
Version 1.9.1 has actually started to cause high CPU usage for me when receiving notifications while idle. Notifications start flickering rapidly, and my compositor (picom) also increases in CPU usage when this issue happens. My show_age_threshold setting is […]. |
Can this issue be re-opened? There is clearly still a bug, and I can reproduce it. I believe that the problem, again, is that after some time the timer somehow triggers every ms. I'm investigating the cause. |
Experiencing the same issue after upgrading to v1.9.1: idle notifications get my CPU burning 🔥. I have reverted to v1.9.0 in the meantime, but I'll try to find out why that's the case. |
Reopened it |
I found the issue, but I'll need guidance (@fwsmit?) as to how best to handle this. In these lines of get_next_datachange, the next data change is set to the upcoming notification expiry. Yet, if dunst is idle, the notification is never removed, so its expiry stays in the past and the computed wakeup time keeps resolving to the current time. While writing this, I assumed that […]. Where is the best place to handle this? |
Hmm, it seems there is no property that updates when a notification is not displayed. However, when it's not displayed it's in the waiting queue.
Where is this implemented? |
I've got this problem too. I often come back to my screen-locked laptop after an hour to find the fans blasting, my regular 2 apps open, sometimes even 0 notifications, and yet dunst is using 30%+ of my CPU. |
So it seems the problem has gotten worse since 1.9.1. I currently do not have the time to investigate, but it would be good to have a debug log from dunst at the time the bug happens. |
I do not have the real log, but some custom logging gave me this: […]
Some more investigation showed that the super-short sleep happens because of the lines I pointed to earlier.
I actually just guessed from the behaviour. I'll try to find some time to fix this ASAP; so far I do not understand why it got worse -- from a very quick scan, it looks to me like the old code should have had this problem as well. |
I spotted one thing that's different between the new and the old code: we are now using n->timestamp where the old code used n->start. |
Oh, well. I think I dug myself into a rabbit hole. After reading this bug and the PR: what is the initial problem? I cannot reproduce it with anything before #1140, but if I introduce the changes from #1140 into my tree, it is 100% reproducible: while idle, after the timeout of the notification is hit, dunst goes brrrr. What was the initial reason dunst got into trouble? Is it just that too many notifications are displayed? If so, I would suggest reverting #1140. |
The hack I used 2 months ago (#1102 (comment)) for this is still mitigating the issue for me; I don't have any load problems anymore. But it is a hack, so I'm looking forward to someone properly fixing this. Also, after a while (5 min?) the 1-second precision is really not necessary, so maybe only update the notifications that are actually on screen, and update less frequently as they get older (see the sketch below). |
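A hypothetical sketch of that back-off idea (not dunst code; the function name and thresholds are made up for illustration, and S2US() is the seconds-to-microseconds macro used in the snippets later in this thread):

    // Update ages every second at first, then back off as they get older.
    static gint64 age_update_interval(gint64 age)
    {
            if (age < S2US(300))     // younger than 5 minutes
                    return S2US(1);  // per-second updates
            if (age < S2US(3600))    // younger than an hour
                    return S2US(60); // per-minute updates
            return S2US(3600);       // hourly is plenty after that
    }

The wakeup computation could then use wakeup_time = MIN(wakeup_time, time + age_update_interval(age)) instead of always rounding to the next full second.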
Submitted as PR #1158. It does indeed seem to work, though I only tested hastily; all tests are passing, though I had to fix one (see the PR), which was a problem in the test and not a problem in the code. The one thing I missed all this time that makes it relevant is that […] |
Thanks for posting a PR. I'll give the others a little bit of time to test it. If it doesn't fix the issue, then I agree with bebehei to revert the patch.
The problem is that the queues implementation used to take each notification's own "turn of second" into account: with 100 notifications you would wake up 100 times a second just to update the age text of each notification.
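For reference, the fix for that rounds every age update to the same upcoming second boundary, so at most one wakeup per second is needed for age updates no matter how many notifications are shown. A minimal sketch, using the same names as the queues_get_next_datachange() snippets quoted later in this thread:

    // One shared wakeup for all age updates: the next full second.
    gint64 next_second = time + S2US(1) - (time % S2US(1));
    wakeup_time = MIN(wakeup_time, next_second);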
I think it can still return -1 when there are no notifications. |
The change makes sense to me.

    struct notification *n = iter->data;
    gint64 timeout_ts = n->timestamp + n->timeout;
    if (n->timeout > 0 && n->locked == 0) {
            if (timeout_ts > time)
                    wakeup_time = MIN(wakeup_time, timeout_ts);
            else
                    // while we're processing or while locked, the notification already timed out
                    return time;
    }
|
Do you mean the change in #1158?
Yes, that does seem to make sense. |
Yes I meant the change in #1158. |
Thanks for clarifying. It would be nice if a few people could try out #1158. I will try to release a new version fixing the issue on Tuesday. |
I did try it out, but high load remains when show_age_threshold = -1 |
I'm also experiencing this on version 1.9.1 🥺 High CPU noticed after coming back to my laptop after a night; only one notification was enough to trigger it, though the notification stayed alive for hours. |
As per the comment in the dunst config file, setting show_age_threshold to -1 disables the age display. In that case queues_get_next_datachange can end up with no upcoming data change and returns -1: Line 589 in 464076d

Now in the calling code (Lines 95 to 115 in 464076d) that -1 return value is not handled when scheduling the next wakeup:
The solution can be as simple as bailing if that is the case:

    if (active) {
            gint64 timeout_at = queues_get_next_datachange(now);
    +       if (timeout_at == -1)
    +               return G_SOURCE_REMOVE;

            // Previous computations may have taken time, update `now`
            // This might mean that `timeout_at` is now before `now`, so
            // we have to make sure that `sleep` is still positive.
            now = time_monotonic_now();
            gint64 sleep = timeout_at - now;
            sleep = MAX(sleep, 1000); // Sleep at least 1ms

            LOG_D("Sleeping for %li ms", sleep/1000);

            if (sleep >= 0) {
                    if (reason == 0 || next_timeout < now || timeout_at < next_timeout) {
                            if (next_timeout != 0) {
                                    g_source_remove(next_timeout_id);
                            }
                            next_timeout_id = g_timeout_add(sleep/1000, run, NULL);
                            next_timeout = timeout_at;
                    }
            }
    }

(There are other ways to refactor the code to the same effect, of course.)

An example test scenario is to: […]

Expected behaviour without the change above: […]

Expected behaviour with the change above: […]
|
@bakkeby thank you for taking a look. Your fix seems to work indeed. But I believe the conversation is being muddied by discussing 2 different issues here. There is the original issue of high CPU usage when show_age_threshold > 0, and there is the separate regression with idle notifications introduced in 1.9.1.

As for the show_age_threshold issue, I'm unable to reproduce it with […] |
Could anyone see if #1158 fixes this issue? As I said above, I cannot reproduce the issue, so I cannot test the fix |
I have managed to simulate (reproduce) the bug by forcing an idle state:

    static bool queues_notification_is_finished(struct notification *n, struct dunst_status status, gint64 time)
    {
            assert(n);

            if (n->skip_display && !n->redisplayed)
                    return true;

            if (n->timeout == 0) // sticky
                    return false;

    -       bool is_idle = status.fullscreen ? false : status.idle;
    +       bool is_idle = true;

            /* don't timeout when user is idle */
            if (is_idle && !n->transient) {
                    n->start = time_monotonic_now();
                    return false;
            }
            ...
    }

and setting the timestamp value to a high number to simulate elapsed time:

    gint64 queues_get_next_datachange(gint64 time)
    {
            gint64 wakeup_time = G_MAXINT64;
            gint64 next_second = time + S2US(1) - (time % S2US(1));

            for (GList *iter = g_queue_peek_head_link(displayed); iter;
                 iter = iter->next) {
                    struct notification *n = iter->data;
    +               n->timestamp = 0xFFFFFFFF;
                    gint64 timeout_ts = n->timestamp + n->timeout;
                    if (n->timeout > 0 && n->locked == 0) {
                            if (timeout_ts > time)
                                    wakeup_time = MIN(wakeup_time, timeout_ts);

With that code, a simple notification is enough to trigger CPU hogging (the compositor [picom in my case] and dunst compete for CPU cycles). Note: it's important that the specific timeout value for the notification causing the CPU hogging is non-zero. If the timeout related to the severity of the notification is zero, CPU hogging doesn't appear to happen. The behaviour exhibited is the same as the one described in this issue and the one I seem to have experienced. I have tested #1158 and it appears to fix the issue. |
I had a closer look and I believe that I have an explanation for what is going on and why this change is the right fix for this issue:

    struct notification *n = iter->data;
    - gint64 timeout_ts = n->timestamp + n->timeout;
    + gint64 timeout_ts = n->start + n->timeout;
    if (n->timeout > 0 && n->locked == 0) {

In the dunst configuration you can specify how long a notification is to be shown before it is automatically removed. I would suggest setting the timeout to 5 seconds to make reproducing this issue quicker (I think this is set to 60 seconds by default).

If the user receives more than a certain number of notifications then they will be queued to be displayed later, and at the bottom of the list of notifications shown there will be a piece of text showing how many more notifications are pending, e.g. […].

Now let's say that we create a lot of notifications using this for loop, as mentioned earlier in this issue:

    for i in $(seq 1 250); do notify-send "Test $i" "Test :)"; done

You can monitor dunst CPU usage using:

    top -p $(pidof dunst)

The CPU usage will be low. At some point the timeout for the currently displayed notifications will trigger and the notifications will be removed. At this point the 19 (in my case) next notifications from the queue are displayed. These notifications will again be subject to a delay of 5 seconds before they are removed.

In queues_get_next_datachange the timeout is calculated based on when the notification was received:

    gint64 timeout_ts = n->timestamp + n->timeout;

Given that the notifications were received at roughly the same time, and these new notifications had to wait 5 seconds before they could get their turn, the timeout for this second batch of notifications has already expired. This means that we always end up hitting the branch that returns time immediately:

    if (n->timeout > 0 && n->locked == 0) {
            if (timeout_ts > time)
                    wakeup_time = MIN(wakeup_time, timeout_ts);
            else
                    // while we're processing or while locked, the notification already timed out
                    return time;
    }

In the calling code this means that the next wakeup is scheduled essentially immediately, over and over again, which is what burns the CPU.

By calculating the timeout based on when the notification was first displayed, the notifications are removed and replaced by new ones from the queue without the elevated CPU usage:

    gint64 timeout_ts = n->start + n->timeout;

Ultimately what this boils down to is that this issue does not have anything to do with the show_age_threshold setting. |
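To make the arithmetic concrete, here is a worked timeline under the 5-second timeout suggested above (illustrative numbers, not dunst output):

    // batch 1: displayed at t = 0s, timeout = 5s  -> removed at t = 5s
    // batch 2: received at t ~ 0s (n->timestamp ~ 0s), displayed at t = 5s
    //
    // old calculation: timeout_ts = n->timestamp + n->timeout = 0s + 5s = 5s
    //   at any t > 5s, timeout_ts <= time, so queues_get_next_datachange()
    //   returns the current time and the caller reschedules immediately
    //   -> busy loop
    //
    // fixed calculation: timeout_ts = n->start + n->timeout = 5s + 5s = 10s
    //   -> a normal 5-second sleep is scheduled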
Thanks for investigating and writing up a clear explanation. This doesn't explain the original report, but it does explain the regression in 1.9.1, so it might be a separate issue from the original report again 😆 |
@fwsmit the original report is in the context of the idle_threshold setting. To reproduce (see the dunstrc sketch after this comment): set the normal timeout to 5 seconds.
Set the idle threshold to 5 seconds so that you don't have to wait for too long.
Then start notifications with a 2-second pause between each notification:

    for i in $(seq 1 19); do notify-send "Test $i" "Test :)"; sleep 2; done

Once the 5-second threshold is passed for the first notification, it will run into the same scenario of calculating an already expired timeout, but the notification will not be removed because the user is idle. The CPU usage will rise at this point. The same fix addresses this issue as well, because n->start keeps being refreshed while the user is idle:

    /**
     * Check if a notification has timed out
     *
     * @param n the notification to check
     * @param status the current status of dunst
     * @param time the current time
     * @retval true: the notification is timed out
     * @retval false: otherwise
     */
    static bool queues_notification_is_finished(struct notification *n, struct dunst_status status, gint64 time)
    {
            ...
            /* don't timeout when user is idle */
            if (is_idle && !n->transient) {
                    n->start = time_monotonic_now();
                    return false;
            }
            ...
|
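For convenience, a dunstrc sketch matching the reproduction settings above (assuming the standard dunst option names; the values are chosen only to trigger the bug quickly):

    [global]
        idle_threshold = 5

    [urgency_normal]
        timeout = 5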
I think there is an argument for changing this show_age_threshold handling as well:

    if (settings.show_age_threshold >= 0) {
            gint64 age = time - n->timestamp;

            if (age > settings.show_age_threshold - S2US(1)) {
                    /* Notification age should be updated -- sleep
                     * until the next turn of second.
                     * This ensures that all notifications' ages
                     * will change at once, and that at most one
                     * update will occur each second for this
                     * purpose. */
                    wakeup_time = MIN(wakeup_time, next_second);
            }
            else
                    wakeup_time = MIN(wakeup_time, n->timestamp + settings.show_age_threshold);
    }

I'll see if I can reproduce an issue with that when idle_threshold is exceeded. |
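Presumably the analogous change would be to swap the reference point here too (a guess at what was meant, given the fix above; a follow-up comment below concludes that the current behaviour is actually intentional):

    gint64 age = time - n->start; // instead of time - n->timestamp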
Regarding "show_age_threshold > 0": I have it clear now. This issue was originally about lots of notifications idling and then updating their timestamps individually. Then it was fixed with #1140, but this introduced a regression that got confused with this issue. A separate issue (#1163) was also reported here and fixed afterwards.
|
Just to follow up: I was not able to reproduce any issues specifically with regard to the show_age_threshold feature. The use of n->timestamp for the age calculation looks deliberate. The way it is implemented now looks correct to me: if you have notifications pending in the queue that are older than the age threshold, then those will start showing the age of the notification as soon as they are shown, rather than having to wait another 60 seconds or so before showing that the notification is more than three minutes old. |
Issue description
With idle_threshold set, old notifications (ones that say "X min old") increase the CPU usage. I believe it's related to #169.
Installation info
Version: 1.9.0 (2022-06-27)
Install type: system package
Window manager: swaywm
Minimal dunstrc