Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: cron.php executes successfully but Nextcloud says cronjob hasn't run #30899

Closed
5 of 8 tasks
hegedussz opened this issue Jan 28, 2022 · 27 comments · Fixed by #32109
Closed
5 of 8 tasks

[Bug]: cron.php executes successfully but Nextcloud says cronjob hasn't run #30899

hegedussz opened this issue Jan 28, 2022 · 27 comments · Fixed by #32109
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap bug

Comments

@hegedussz
Copy link

⚠️ This issue respects the following points: ⚠️

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I've searched it).
  • Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • I agree to follow Nextcloud's Code of Conduct.

Bug description

I saw this message:
Background jobs Some jobs didn’t execute since yesterday.
Please consider switching to system cron.

Steps to reproduce

  1. working instance 23.0
  2. set the webcron
    3 cron being run by webcron

Expected behavior

The cron has seemingly run successfully, so I should not see this message in the UI? Can't see any issues in the Nextcloud log either.

Installation method

Official Docker image

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.0

Web server

Apache (supported)

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Fresh Nextcloud Server install

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

{
    "system": {
        "htaccess.RewriteBase": "\/",
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "apps_paths": [
            {
                "path": "\/var\/www\/html\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/custom_apps",
                "url": "\/custom_apps",
                "writable": true
            }
        ],
        "overwriteprotocol": "https",
        "objectstore": {
            "class": "\\OC\\Files\\ObjectStore\\S3",
            "arguments": {
                "bucket": "nextcloud-prod-01",
                "key": "***REMOVED SENSITIVE VALUE***",
                "secret": "***REMOVED SENSITIVE VALUE***",
                "region": "hu-central-1",
                "hostname": "staging.s3.einfra.hu",
                "port": "",
                "objectPrefix": "urn:oid:",
                "autocreate": false,
                "use_ssl": true,
                "use_path_style": true,
                "legacy_auth": false
            }
        },
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "localhost",
            "nextcloud-prod-01.prod.caas.einfra.hu",
            "nextcloud.dinpi.hu"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "23.0.0.10",
        "overwrite.cli.url": "https:\/\/localhost",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "maintenance": false,
        "mail_smtpmode": "smtp",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_sendmailmode": "smtp",
        "mail_smtpport": "25",
        "loglevel": 0,
        "debug": true,
        "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***"
    }
}

List of activated Apps

Enabled:
  - accessibility: 1.9.0
  - activity: 2.15.0
  - admin_audit: 1.13.0
  - calendar: 3.0.5
  - cloud_federation_api: 1.6.0
  - comments: 1.13.0
  - contacts: 4.0.7
  - contactsinteraction: 1.4.0
  - dashboard: 7.3.0
  - dav: 1.21.0
  - event_update_notification: 1.4.0
  - federatedfilesharing: 1.13.0
  - federation: 1.13.0
  - files: 1.18.0
  - files_pdfviewer: 2.4.0
  - files_rightclick: 1.2.0
  - files_sharing: 1.15.0
  - files_trashbin: 1.13.0
  - files_versions: 1.16.0
  - files_videoplayer: 1.12.0
  - firstrunwizard: 2.12.0
  - group_default_quota: 0.1.3
  - logreader: 2.8.0
  - lookup_server_connector: 1.11.0
  - mail: 1.11.6
  - nextcloud_announcements: 1.12.0
  - notifications: 2.11.1
  - oauth2: 1.11.0
  - password_policy: 1.13.0
  - photos: 1.5.0
  - privacy: 1.7.0
  - provisioning_api: 1.13.0
  - recommendations: 1.2.0
  - serverinfo: 1.13.0
  - settings: 1.5.0
  - sharebymail: 1.13.0
  - support: 1.6.0
  - survey_client: 1.11.0
  - systemtags: 1.13.0
  - text: 3.4.0
  - theming: 1.14.0
  - twofactor_backupcodes: 1.12.0
  - updatenotification: 1.13.0
  - user_ldap: 1.13.0
  - user_status: 1.3.1
  - viewer: 1.7.0
  - weather_status: 1.3.0
  - workflowengine: 2.5.0
Disabled:
  - circles
  - encryption
  - files_external

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

{"reqId":"j1YyIpzI8Jd67gESXKwa","level":0,"time":"2022-01-28T18:55:09+00:00","remoteAddr":"255.255.255.255","user":"--","app":"cron","method":"GET","url":"/cron.php","message":"Finished OCA\\Mail\\BackgroundJob\\TrainImportanceClassifierJob job with ID 2247 in 0 seconds","userAgent":"curl/7.81.0-DEV","version":"23.0.0.10","id":"61f43c0dcdef9"}

{"reqId":"j1YyIpzI8Jd67gESXKwa","level":0,"time":"2022-01-28T18:55:09+00:00","remoteAddr":"255.255.255.255","user":"--","app":"cron","method":"GET","url":"/cron.php","message":"Run OCA\\Mail\\BackgroundJob\\TrainImportanceClassifierJob job with ID 2247","userAgent":"curl/7.81.0-DEV","version":"23.0.0.10","id":"61f43c0dcdf95"}

{"reqId":"bXxjA2YbyH09Kix5PzKQ","level":0,"time":"2022-01-28T18:50:12+00:00","remoteAddr":"255.255.255.255","user":"--","app":"cron","method":"GET","url":"/cron.php","message":"Run OCA\\Mail\\BackgroundJob\\SyncJob job with ID 2502","userAgent":"curl/7.81.0-DEV","version":"23.0.0.10","id":"61f43ae53d1c1"}

Additional info

Webcron container:
Image: curlimages/curl:7.81.0
Cron Schedule: */5 * * * *

Command: curl -L -s https://nextcloud.ltd/cron.php
Result: {"status":"success"}

When I run cron.php manually inside the container it is successful (sudo -u www-data PHP_MEMORY_LIMIT=512M php /var/html/cron.php)

@hegedussz hegedussz added 0. Needs triage Pending check for reproducibility or if it fits our roadmap bug labels Jan 28, 2022
@darktweek
Copy link

I have same problem

@szaimen
Copy link
Contributor

szaimen commented Jan 31, 2022

Yes, webcron is known to be unreliable. Please switch to system cron instead! https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/background_jobs_configuration.html?highlight=cron#cron

@hegedussz
Copy link
Author

Yes, webcron is known to be unreliable. Please switch to system cron instead! https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/background_jobs_configuration.html?highlight=cron#cron

Unfortunately the offical Nextcloud docker image does not use the system cron.

@szaimen
Copy link
Contributor

szaimen commented Jan 31, 2022

you can either add an addditional container for this: https://github.com/nextcloud/docker/blob/dcf058fb559e7918cd77f5f1bb0dfdbcee813571/.examples/docker-compose/insecure/postgres/fpm/docker-compose.yml#L40-L48 or add a cronjob like this:

# Edit the root crontab
sudo crontab -e -u root

# Add this line to the file:
*/5 * * * * docker exec --user www-data nextcloud-container-name php cron.php
# (of course you need to change 'nextcloud-container-name' to the correct name of your container)

# close and save the file -> done.

@szaimen szaimen closed this as completed Jan 31, 2022
@koehn
Copy link

koehn commented Mar 28, 2022

Any chance you'll update the documentation and remove the feature, or fix the bug?

I've been using Webcron for years and it suddenly stopped working in 23.0.3. Had I known it wasn't actually supported I'd have migrated to another solution a long time ago.

@ghost
Copy link

ghost commented Mar 28, 2022

the official helm chart also uses webcron and will need fixed along with the docs
https://github.com/nextcloud/helm/blob/master/charts/nextcloud/values.yaml#L281-L322

@koehn
Copy link

koehn commented Mar 29, 2022

@szaimen This bug shouldn’t be closed; if a documented feature “is known to be unreliable” then either make it reliable or remove the feature. Leaving it documented and unreliable means it’s just a bug that you don’t want to fix.

@szaimen szaimen reopened this Mar 29, 2022
@timkgh
Copy link

timkgh commented Apr 19, 2022

Related #31725

@fuse314
Copy link

fuse314 commented Apr 19, 2022

As I understand it:
In cron.php, the behaviour of how the jobs are executed when called from "CLI" and "not CLI" (i.E. webcron) is very different.

  • when called from "CLI", the jobs are executed in a loop:

    server/cron.php

    Line 144 in 13dbad0

    while ($job = $jobList->getNext($onlyTimeSensitive)) {
  • when called from "not CLI", cron.php just gets the first job and executes it:

    server/cron.php

    Line 170 in 13dbad0

    $job = $jobList->getNext();

I have the same problem where webcron is executed with status "success", while the "settings" dialog shows that the cronjobs are not being executed -- with my Nextcloud 22.2.6 (docker) instance where I changed from system cron (extra container) to webcron (via health check as suggested by nextcloud/docker#1329 ) in the past month.

@charims
Copy link

charims commented Apr 19, 2022

I had this same issue, but it turned out I was really behind on cron jobs after the upgrade, so despite getting {"status":"success"} from webcron, it was taking a long time to catch up. From what I understand, since webcron only handles one job per-call, if you are really behind, it can take an astronomical amount of time to catch up.

I ran the following sql query to check the status of the jobs:

select id, class, last_run, last_checked, execution_duration from oc_jobs order by last_run;

Most of mine had a last_run of 0.

So I wanted to see if webcron was doing anything, so i started to curl the cron.php endpoint. After a few minutes of calling it repeatedly it caught up. Now I just need to watch to see if it keeps up or if I need to change the frequency its called.

Running the below may help you get caught up on cron's rather quickly. Be careful, this may put some load on your system.

while true; do curl -s https://<domain>/cron.php; done

You can run this with a sleep statement to prevent causing too much system load:

while true; do curl -s https://<domain>/cron.php; sleep 5s; done

I knew I was done when the basic settings page had the green status for background jobs, and all the jobs had a recent last_run in the database.

@timkgh
Copy link

timkgh commented Apr 19, 2022

@charims I think it will fall behind again. I have a very small instance, system cron finishes quickly, then webcron is green for a while but after a few days it turns red again. Maybe webcron only runs a subset of all the jobs or it keeps failing on the same one but still returns success.

I switched to system cron for now, though I'd rather not, I don't like these cron jobs run by the www-data user.

@pneuschwander
Copy link

Could it be related to the value of lastcron?

After each successful cron run, the value of lastcron should be updated:

server/cron.php

Line 180 in 13dbad0

$config->setAppValue('core', 'lastcron', time());

That value determines whether the error message is shown in the UI:
https://github.com/nextcloud/server/blob/master/apps/settings/templates/settings/admin/server.php#L42 and the following lines.

The error message related to "Please consider switching to system cron" seems to not depend on a specific outstanding cron task. It seems to rather be based on the date of the last successful cron run.

The value of lastcron seems to be fetched for the UI at:

'lastcron' => $this->config->getAppValue('core', 'lastcron', false),

So I'd assume that either their is something that prevents the update of lastcron value or that the retrieval of lastcron for the UI is not working as expected.

Could someone who is affected by this try to inspect the lastcron value (not sure where it is stored; maybe in a database table?) and check whether it is updated / fresh?

If it is updated/fresh: Maybe there is a problem to get the current value for the UI

If it is not updated / not fresh: Maybe there is a problem to persist the change or the line in cron.php is not reached (but then there should some exception be logged according to cron.php code)

Checking whether lastcron value is updated might help to better understand what is happening / where things go wrong.

@pneuschwander
Copy link

pneuschwander commented Apr 19, 2022

Oh, I just realized that I might be wrong with my assumption about lastcron.

https://github.com/nextcloud/server/blob/master/apps/settings/templates/settings/admin/server.php#L47

The error message is based on cronMaxAge and it depends on the execution of the individual tasks:

'cronMaxAge' => $this->cronMaxAge(),

protected function cronMaxAge(): int {

Sorry :/

@pneuschwander
Copy link

pneuschwander commented Apr 19, 2022

I think I found out what is going on.
@fuse314 already referred to the different handling in cron.php based on whether execution is via webcron or cron/CLI.

image

The jobs with time_sensitive == 0 are the ones that won't be executed.

https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html?highlight=cron#maintenance-window-start

has been introduced.

/**
* UTC Hour for maintenance windows
*
* Some background jobs only run once a day. When an hour is defined for this config,
* the background jobs which advertise themselves as not time sensitive will be
* delayed during the "working" hours and only run in the 4 hours after the given time.
* This is e.g. used for activity expiration, suspicious login training and update checks.
*
* A value of 1 e.g. will only run these background jobs between 01:00am UTC and 05:00am UTC.
*
* Defaults to ``100`` which disables the feature
*/
'maintenance_window_start' => 1,

The feature to respect maintenance_window_start is only part of the CLI option in cron.php.

server/cron.php

Lines 114 to 144 in 13dbad0

$onlyTimeSensitive = false;
$startHour = $config->getSystemValueInt('maintenance_window_start', 100);
if ($startHour <= 23) {
$date = new \DateTime('now', new \DateTimeZone('UTC'));
$currentHour = (int) $date->format('G');
$endHour = $startHour + 4;
if ($startHour <= 20) {
// Start time: 01:00
// End time: 05:00
// Only run sensitive tasks when it's before the start or after the end
$onlyTimeSensitive = $currentHour < $startHour || $currentHour > $endHour;
} else {
// Start time: 23:00
// End time: 03:00
$endHour -= 24; // Correct the end time from 27:00 to 03:00
// Only run sensitive tasks when it's after the end and before the start
$onlyTimeSensitive = $currentHour > $endHour && $currentHour < $startHour;
}
}
// Work
$jobList = \OC::$server->getJobList();
// We only ask for jobs for 14 minutes, because after 5 minutes the next
// system cron task should spawn and we want to have at most three
// cron jobs running in parallel.
$endTime = time() + 14 * 60;
$executedJobs = [];
while ($job = $jobList->getNext($onlyTimeSensitive)) {

The getNext method of joblist has a new parameter with default = true.

public function getNext(bool $onlyTimeSensitive = true): ?IJob {

if ($onlyTimeSensitive) {
$query->andWhere($query->expr()->eq('time_sensitive', $query->createNamedParameter(IJob::TIME_SENSITIVE, IQueryBuilder::PARAM_INT)));
}

Webcron fetches job without specifying the parameter, fallback to true default parameter value and will never trigger time_sentitive == 0 jobs

server/cron.php

Line 170 in 13dbad0

$job = $jobList->getNext();

One might want to consider changing the default value for the parameter or to adapt the "else" part of cron.php to consider the new parameter / feature.

@fuse314
Copy link

fuse314 commented Apr 21, 2022

I modified cron.php to handle cron and webcron calls the same way:
My instance is now running this version of cron.php.
So far - running only for a few hours - everything looks good, though each call to cron.php now takes a few seconds instead of finishing "immediately" before the change.
I suspect that when long running tasks will be handled, the webcron call might run into a timeout.

I will keep an eye on my small docker instance over the next days/weeks. If anyone here feels adventurous, you are welcome to try this out for yourself.

If I don't run into problems in the next week, I will open a pull request with this change.

@iordi154
Copy link

iordi154 commented Apr 22, 2022

I wanted to try but ran into a user problem in my webhosted instance:
“Console has to be executed with the user that owns the file config/config.php
Current user id: 1
id of config.php: 2“

@iordi154
Copy link

Webcron fetches job without specifying the parameter, fallback to true default parameter value and will never trigger time_sentitive == 0 jobs

server/cron.php

Line 170 in 13dbad0

$job = $jobList->getNext();

One might want to consider changing the default value for the parameter or to adapt the "else" part of cron.php to consider the new parameter / feature.

I can confirm: Manually changing time_sensitive parameters in DB = 1 immediately triggers all jobs and eliminate the red light in NC configuration menu => „everything OK“
However, the jobs fall back to time_sensitive = 0 so everything will start from zero again.

Sadly I don‘t know, what has to be done to change this behaviour.

@charims
Copy link

charims commented Apr 23, 2022

i've been running with modified cron.php from @fuse314 for a couple days, it got me caught back up after jobs ran behind again, and now it seems to be keeping up.

@pneuschwander
Copy link

I think #30945 introduced that changes related to time_sensitive jobs. Maybe @nickvergessen can add something to the discussion.

@nickvergessen
Copy link
Member

See my comment in #32107 (comment)
We accidentally put the wrong default value in place for instances which use webcron or ajax cron jobs. Add false inside the getNext() and it should work again.

@k-popov
Copy link
Contributor

k-popov commented Apr 24, 2022

I've created another #32109 with just default value changed to get any (time-sensitive or not) job from the list.

#32107 has been closed but still I don't understand why webcron should not execute time-sensitive jobs only during maintenance hours. I would be grateful for explanation why is that. I understand that this could be just maintainer's will but would be good to have rationale behind the decision.

What option is proposed for webcron to run time-consuming jobs during maintenance hours.

@nickvergessen
Copy link
Member

nickvergessen commented Apr 24, 2022

Ill give a more detailed response to that tomorrow from desk. Too hard to put it together on the phone in my free time 😅

@nickvergessen
Copy link
Member

What option is proposed for webcron to run time-consuming jobs during maintenance hours.

TL;DR: Only cron is suitable for non-single user instances

Background information:

  • We try to set the php execution time to 1h, but in most environments this is prohibited and default execution times for web requests is limited, while unlimited for CLI:

    server/lib/base.php

    Lines 631 to 635 in 7d272c5

    if (strpos(@ini_get('disable_functions'), 'set_time_limit') === false) {
    @set_time_limit(3600);
    }
    @ini_set('max_execution_time', '3600');
    @ini_set('max_input_time', '3600');
  • To reduce the risk of breaking web-based crons (ajax and webcron) both only run 1 job per execution:

    server/cron.php

    Lines 162 to 177 in 3fd55cb

    } else {
    // We call cron.php from some website
    if ($appMode === 'cron') {
    // Cron is cron :-P
    OC_JSON::error(['data' => ['message' => 'Backgroundjobs are using system cron!']]);
    } else {
    // Work and success :-)
    $jobList = \OC::$server->getJobList();
    $job = $jobList->getNext();
    if ($job != null) {
    $job->execute($jobList, $logger);
    $jobList->setLastJob($job);
    }
    OC_JSON::success();
    }
    }
    • While CLI based cron loops and starts taking new jobs until it ran 14 minutes (by minute 15 3 other processes should run in parallel already)

      server/cron.php

      Lines 144 to 161 in 3fd55cb

      while ($job = $jobList->getNext($onlyTimeSensitive)) {
      if (isset($executedJobs[$job->getId()])) {
      $jobList->unlockJob($job);
      break;
      }
      $job->execute($jobList, $logger);
      // clean up after unclean jobs
      \OC_Util::tearDownFS();
      $jobList->setLastJob($job);
      $executedJobs[$job->getId()] = true;
      unset($job);
      if (time() > $endTime) {
      break;
      }
      }
  • With webcron running every 15 (very old recommendation) or 5 (recommended since some time) there are only 20 (in case of 15 min interval) or 60 (in case of 5 min interval) executions inside the maintenance window. So it's much too risky to only run jobs in that time.
  • Ajax cron would even be killed completely mostlikely as the jobs would only run when no one is there, but when no one is there no one triggers the ajax requests to run a job.

Taking the numbers from the second last point a step further:

  • 15min or 5min interval make 96 or 288 jobs that can be executed per day.
  • I don't know if you ever checked your database, but e.g. on our company instance we have 722 jobs in the table right now
  • Only 288 can be worked on today ~40%
  • 434 jobs will not be executed today and have priority tomorrow

On my 2 user home installation I have 64 jobs, but there are no trashbin or version expirations queued, so they would manage to execute all together today.

Summary: Using ajax/webcron is only suitable for low-usage instances like 1-2 user home installs with passive sync behaviour. Anything else that requires serious cron execution to deliver notifications, emails, federation, .... must use cron to perform.
To perform even better on instances with many (> 1k) users we added this setting to allow admins to define load reduced hours so that some things which are not time critical can be moved to that time. We have to see how this develops as it reduces the load during the time people are online, but could e.g. heavily increase the load during the off-work times and might even be too much or something when the jobs are concentrated all into the same 4h window. But only reality from very big instances will show if this is a problem or better than the previous behaviour.

The breakage with the wrong default value was of course never intended. It's fixed now and will come with the next updates to 23 and 22 I assume.

@nickvergessen
Copy link
Member

Also send a pull request to update the documentation on that a bit:
nextcloud/documentation#8209
feel free to approve that one

@r2evans
Copy link

r2evans commented Nov 3, 2022

@fuse314, your repo change to cron.php is no longer available. Are you able to post your changes?

@fuse314
Copy link

fuse314 commented Nov 4, 2022

@r2evans the equivalent of my changes have been merged with #32109
The webcron solution currently works for me (I use Nextcloud 23), so I deleted my fork.

@r2evans
Copy link

r2evans commented Nov 4, 2022

The webcron solution is notable not working for me in 24.0.5, not sure if it just fell behind and never caught up or if it is still fundamentally unable to handle the load. (Low user-count in my case.) My workaround is unfortunately to configure my synology to run docker exec nextcloud php -f cron.php instead of curl https://myhost.com/nextcloud/cron.php: it requires root in cron (discouraged) but has the benefit of doing a "full" cron run instead of the webcron method (whether still hobbled or not). Thanks @fuse314

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.