cupsd sometimes enters an infinite loop when a timeout variable gets lower than current_time #604

zdohnal · 2023-01-30T10:21:32Z

Describe the bug
cupsd enters an infinite loop in which it repeats a certain operation (expiring subscriptions, deleting temporary printers, or removing job files) every second until the service is restarted. The trigger is unknown, but it happens when a timeout variable (e.g. expire_time or local_timeout) is lower than current_time, which makes cupsd run the operation every second.

To Reproduce
Steps to reproduce the behavior:
Unknown

Expected behavior
No infinite loop after a certain event

System Information:

seen in Fedora 36/37, NixOS
cupsd
CUPS version 2.4.2

Additional context
Additional data is in #578. A complete error_log is difficult to provide, however: we don't know what triggers the issue, and once it finally shows itself, cupsd produces so many log lines that it is hard to track down what triggered it. I'll keep an eye on this.

zdohnal · 2023-01-30T10:28:17Z

I've lowered the priority, since it is not hit every time you run cupsd and there is a workaround.

clefru · 2023-02-04T14:01:35Z

TL;DR: Old job files are not cleaned up, which keeps the scheduler awake with unsuccessful cleanup attempts.

I traced this a bit deeper with gdb and found the following:

First, select_timeout always returns 1. That is because the JobHistoryUpdate variable is smaller than timeout, so timeout = JobHistoryUpdate gets executed in the following block:

  if (JobHistoryUpdate && timeout > JobHistoryUpdate)
  {
    timeout = JobHistoryUpdate;
    why     = "update job history";
  }

Note: JobHistoryUpdate is stuck and will never get updated, but more on that later. At the end of the function, the value of timeout is much smaller than now and the following clipping logic kicks in:

  timeout = timeout - now + 1;

  if (timeout < 1)
    timeout = 1;

fixing timeout to 1.
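The clamp described above can be modelled in isolation. This is a minimal sketch, not the cupsd source: clamp_timeout is a hypothetical helper that only mirrors the two quoted statements, and it shows that any deadline at or before now collapses to a 1 second timeout.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper (not part of cupsd): mirrors the clamp quoted from
 * the end of select_timeout(). "timeout" is an absolute deadline and the
 * return value is the number of seconds to sleep.
 */
static long clamp_timeout(time_t timeout, time_t now)
{
  long delta;

  assert(now >= 0);                      /* model assumes a valid clock */

  delta = (long)(timeout - now) + 1;     /* timeout = timeout - now + 1 */

  if (delta < 1)                         /* deadline is already in the past */
    delta = 1;

  return delta;
}
```

With now = 1000, a stale deadline of 500 yields 1, while a deadline of 1010 yields 11. A deadline stuck in the past therefore always produces the minimum timeout.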

Then, in the main scheduler loop, we find another check for JobHistoryUpdate, which calls out to cupsdCleanJobs. This happens every second. This function is the real culprit, not "Expiring subscriptions".

    if (JobHistoryUpdate && current_time >= JobHistoryUpdate)
      cupsdCleanJobs();
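To see how the two checks interact, here is a small simulation under stated assumptions. simulate_cleanups is a hypothetical function, the clock is simulated rather than real, and the cleanup call is reduced to a counter plus a flag saying whether it clears the deadline. It is a sketch of the behaviour described in this comment, not the scheduler's main loop.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical model: count how often the cleanup check fires during
 * "duration" simulated seconds. If cleanup_clears_deadline is 0, the
 * cleanup leaves job_history_update untouched, as described in this issue.
 */
static int simulate_cleanups(time_t start, long duration,
                             time_t job_history_update,
                             int cleanup_clears_deadline)
{
  time_t now = start;
  time_t end = start + duration;
  int    calls = 0;

  assert(duration >= 0);

  while (now < end)
  {
    time_t timeout = end;              /* model: nothing else is scheduled */
    long   sleep_for;

    if (job_history_update && timeout > job_history_update)
      timeout = job_history_update;    /* "update job history" wins */

    sleep_for = (long)(timeout - now) + 1;
    if (sleep_for < 1)
      sleep_for = 1;                   /* clamp, as in select_timeout */

    now += sleep_for;                  /* pretend select() slept that long */

    if (job_history_update && now >= job_history_update)
    {
      calls++;                         /* stands in for cupsdCleanJobs() */
      if (cleanup_clears_deadline)
        job_history_update = 0;
    }
  }

  return calls;
}
```

Starting at 1000 and running for 10 simulated seconds with a deadline of 500 that is never cleared, the model reports 10 cleanup calls, one per second. If the first cleanup clears the deadline, it reports a single call.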

Let's talk about my suspected culprit. cupsdCleanJobs iterates over all jobs and recomputes JobHistoryUpdate to the history_time or file_time of the oldest job. It subsequently seems to contain logic to clean up old jobs. Except, this logic never triggers for me.

I've set a breakpoint on cupsdCleanJobs (to see the function entry, which triggers nicely) and the cleanup functions cupsdDeleteJob and remove_job_files. Neither of them is ever called. The following conditional guards the cleanup logic:

    if (job->state_value >= IPP_JOB_CANCELED && !job->printer)

The conjunction with !job->printer looks odd. Is this supposed to mean "when the job has no printer associated anymore, e.g. the job's printer has been deleted"? Because that's rarely going to evaluate to true, as most old jobs will still have a printer associated with them. (I rarely delete my printers...)

Browsing around in my web UI shows that most of my completed jobs still have printers "associated" with them. At least there is an href pointing to the printer. My oldest job is from 2020 and I did delete a printer since then, and because jobs from those deleted printers don't exist anymore, I assume that only for those jobs, the cleanup logic worked.

Conclusion: It feels like dropping && !job->printer from the conditional is the appropriate fix. I don't see why an old job should be exempt from cleanup just because the printer it was printed to still exists.

michaelrsweet · 2023-02-06T15:43:55Z

@clefru The !job->printer means "not active on a printer". The dest member provides the name of the destination, but the printer member provides a pointer to the printer that is assigned the job (think jobs sent to classes).
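The distinction between dest and printer can be illustrated with a reduced model. The types below are hypothetical stand-ins, not the cupsd structures, and the state constants are local names that reuse the numeric IPP job-state values (5 for processing, 7 for canceled, 9 for completed). The point of the sketch is that a finished job keeps its destination name while its printer pointer is NULL, so the guard quoted earlier does accept it.

```c
#include <assert.h>
#include <stddef.h>

/* Local names for the IPP job-state values used in this sketch. */
enum
{
  MODEL_JOB_PROCESSING = 5,
  MODEL_JOB_CANCELED   = 7,
  MODEL_JOB_COMPLETED  = 9
};

/* Hypothetical, reduced stand-ins for the scheduler's types. */
typedef struct
{
  const char *name;
} model_printer_t;

typedef struct
{
  int             state_value;  /* current job state */
  const char      *dest;        /* destination name, kept after printing */
  model_printer_t *printer;     /* set only while active on a printer */
} model_job_t;

/* Same shape as the guard quoted from cupsdCleanJobs. */
static int job_passes_cleanup_guard(const model_job_t *job)
{
  assert(job != NULL);

  return job->state_value >= MODEL_JOB_CANCELED && !job->printer;
}
```

In this model a completed job with dest "office" and printer NULL passes the guard, while a job that is still processing on a printer does not.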

yurkobb · 2023-04-10T08:19:36Z

> I've lowered the priority, since it is not hit every time you run cupsd and there is a workaround.

What is the workaround? (sorry for asking).

clefru · 2023-04-29T10:38:31Z

Without making cupsdCleanJobs() actually clean old jobs, CUPS will continue to wake up every second to "clean jobs" without actually doing anything, only to repeat the same process a second later. That wastes CPU, and the unproductive wakeups drain laptop batteries.

Has anyone looked into cupsdCleanJobs in a bit more detail to see if there is actually a bug in there? Or is the JobHistoryUpdate variable miscomputed somehow?

zdohnal · 2023-05-18T14:25:19Z

> > I've lowered the priority, since it is not hit every time you run cupsd and there is a workaround.
>
> What is the workaround? (sorry for asking).

Usually restarting the service helps. If it does not, I remove the d and c files from /var/spool/cups.

zdohnal · 2023-07-27T10:47:16Z

@clefru thanks for the investigation! I've found time to look into it too (or rather I got annoyed that I saw it again :) ) - I think we have to check job->num_files when we assign a time to JobHistoryUpdate in cupsdCleanJobs(), because we only want to run the function when there is a file that is going to be removed.

I'll create PR for it.

For a currently unknown reason, the scheduler sometimes sets JobHistoryUpdate to a time in the past, which causes select() to time out after one second. This happens when the job->file_time of a job that has no files to remove gets assigned to JobHistoryUpdate. Checking job->num_files and assigning job->file_time only when the job has files fixes the excessive logging (and the unneeded cupsd wakeups) in various places, e.g. cleaning jobs, expiring subscriptions, deleting temporary queues... Fixes #604
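The idea behind the fix can be sketched as follows. This is an illustration under assumptions, not the merged patch: model_job_t and next_history_update are hypothetical, and the check_num_files flag exists only to compare the behaviour with and without the job->num_files check.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, reduced job record; 0 means "no time set". */
typedef struct
{
  time_t history_time;  /* when the job history expires */
  time_t file_time;     /* when the job files expire */
  int    num_files;     /* job files still on disk */
} model_job_t;

/*
 * Recompute the next cleanup deadline as the earliest pending time.
 * With check_num_files set, file_time only counts for jobs that still
 * have files to remove.
 */
static time_t next_history_update(const model_job_t *jobs, size_t count,
                                  int check_num_files)
{
  time_t next = 0;
  size_t i;

  assert(jobs != NULL || count == 0);

  for (i = 0; i < count; i++)
  {
    const model_job_t *job = &jobs[i];

    if (job->history_time && (!next || job->history_time < next))
      next = job->history_time;

    if (job->file_time &&
        (!check_num_files || job->num_files > 0) &&
        (!next || job->file_time < next))
      next = job->file_time;
  }

  return next;
}
```

For a job with file_time 100 and no files next to a job with file_time 5000 and two files, the unchecked version returns 100, a deadline that can never be satisfied, while the checked version returns 5000.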

clefru · 2023-07-30T14:05:52Z

@zdohnal Thank you for the fix!

zdohnal added the bug, unable-to-reproduce, and priority-high labels Jan 30, 2023

zdohnal mentioned this issue Jan 30, 2023

Increase log level for "Expiring subscriptions..." #578

Merged

zdohnal added priority-medium and removed priority-high labels Jan 30, 2023

bgamari mentioned this issue Apr 1, 2023

Journalctl spam with 1/sec 'cupsd[1394]: Expiring subscriptions' NixOS/nixpkgs#195090

Closed

michaelrsweet removed the unable-to-reproduce label Jun 2, 2023

michaelrsweet self-assigned this Jun 2, 2023

michaelrsweet modified the milestones: v2.5, v2.4.x Jun 2, 2023

michaelrsweet added priority-low and removed priority-medium labels Jun 2, 2023

zdohnal mentioned this issue Jul 27, 2023

Ricoh MP-C4503 PDF/PS/PXL provided ppds do not work but Gutenprint PPD works OpenPrinting/libcupsfilters#33

Closed

zdohnal mentioned this issue Jul 27, 2023

scheduler/job.c: Fix extensive logging in scheduler #767

Merged

zdohnal closed this as completed in #767 Jul 27, 2023

zdohnal added a commit that referenced this issue Sep 13, 2023

scheduler/job.c: Merge fix from master for extensive logging

cc7713b

Master commit: 541f72d1cc1 Fixes #604

paboum mentioned this issue Mar 6, 2024

Printing test page does not work on Gentoo with CUPS 2.4.7 due distro packaging issue #904

Closed
