
After idling some hours the scheduler returns "running", accepts jobs, but refuses to execute them when "run_date" arrives #283

Closed
WesleyBlancoYuan opened this issue Jan 30, 2018 · 7 comments


@WesleyBlancoYuan

I find that if I keep APScheduler busy, i.e. keep adding jobs to it, it stays active. But if I leave it without jobs for some time, say 12 hours or more, and then add a job again, it accepts the job but does not execute it when the run_date arrives, even though it is still running (scheduler.state returns 1, STATE_RUNNING, when I query it through a web service).

I am using APScheduler along with Apache and Django.

Apache/2.4.6 (CentOS) PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
APScheduler (3.4.0)
django-apscheduler (0.2.3)

APScheduler runs inside the web app under Apache, not in a separate process.
I tried with django-apscheduler (0.2.3) and without it; the result is the same.

Some code:

'''
Scheduler and methods to handle jobs.
'''
# Imports needed by this snippet (logger/traceback setup omitted in the OP):
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.schedulers.base import STATE_RUNNING

class TaskScheduler(object):

    def __init__(self):
        '''
        Constructor.
        Should only be called once, because it creates the scheduler.
        '''
        self.started = False
        self.sc = BackgroundScheduler()
#         self.sc.add_jobstore(DjangoJobStore(), 'default')
    
    def startScheduler(self):
        '''
        Start the scheduler.
        @return: True if 'scheduler.start()' was called.
        @rtype: bool
        '''
        if self.sc is not None and self.sc.state == STATE_RUNNING:
            logger.debug("Scheduler already started.")
            return False
        else:
            try:
                self.sc = BackgroundScheduler()
#                 self.sc.add_jobstore(DjangoJobStore(), 'default')
                self.sc.start()
                self.started = True
                logger.info("Scheduler started, state: " + str(self.sc.state))
                return True
            except Exception as e:
                tb = traceback.format_exc(1)
                logger.error("EXCEPTION Error starting the Task Scheduler " + repr(e) + " - args - " + repr(tb))
                return False

    def stopScheduler(self):
        '''
        Shut down the scheduler.
        @return: True if 'scheduler.shutdown()' was called.
        @rtype: bool
        '''
        # Note: must also bail out when self.sc is None, otherwise the
        # else branch would call shutdown() on None.
        if self.sc is None or self.sc.state != STATE_RUNNING:
            logger.debug("Scheduler already stopped.")
            return False
        else:
            self.sc.shutdown()
            self.started = False
            logger.info("Scheduler shut down.")
            return True
    
    def getStatusScheduler(self):
        '''
        Get the scheduler state.
        @return: the scheduler state, or -1 if the scheduler is None.
        @rtype: int
        '''
        if self.sc is None:
            logger.info("Scheduler is None.")
            return -1
        else:
            return self.sc.state
                
    
    def planifyJob(self, name, subdirectory, extension, interface, interface_name,
                   filesize, filerotate, timelimit, dateStart, continuousLoop=0,
                   snaplength=0, filterstr="", autosync=0, username=""):
        try:
            from backEnd import tcpdmanage
            from backEnd.models import TPlanifiedJobs
            import jobscheduler

            returned, msg = tcpdmanage.verifyDumpcap(
                name, subdirectory, extension, interface, interface_name,
                filesize, filerotate, timelimit, continuousLoop, snaplength,
                filterstr, autosync, username)
            if not returned:
                return False, "Error Verifying Dumpcap process"
            processPlan = TPlanifiedJobs(
                name=name, subdirectory=subdirectory, extension=extension,
                interface=interface, interface_name=interface_name,
                filesize=filesize, filerotate=filerotate, timelimit=timelimit,
                autosync=autosync, continuousLoop=continuousLoop,
                snaplength=snaplength, filterDump=filterstr,
                state=properties.STATE_PLAN_PREPARING, startDate=dateStart)
            processPlan.save()
            idProcessPlan = processPlan.id
            logger.info("Planify Job " + str(dateStart))
            self.startScheduler()
            self.sc.add_job(jobscheduler.executionJob, 'date', run_date=dateStart,
                            id=str(idProcessPlan), replace_existing=True,
                            kwargs={'idProcessPlan': idProcessPlan,
                                    'username': str(username)})
            return True, ";" + str(idProcessPlan)
        except Exception as e:
            tb = traceback.format_exc(1)
            logger.error("ERROR planifyJob " + repr(e) + " - args " + repr(tb))
            return False, "planifyJob Critical Error: " + repr(e)
@agronholm
Owner

Have you read the "Troubleshooting" section of the docs?

@WesleyBlancoYuan
Author

If I raise the logger level, is there anything more to configure? Do I have to call the logger to output more?

@agronholm
Owner

No, simply raising the logger level should be fine. You should be seeing plenty of additional info then.

@WesleyBlancoYuan
Author

WesleyBlancoYuan commented Feb 1, 2018

Now I have set the logging level and tested: nothing. When the planned job was due, nothing happened and nothing appeared in the log.

Some details:

I have configured django-apscheduler as the jobstore. The Apache setup and so on is in the OP.
Yesterday at 10:00 I planned three jobs: job A at 23:30 yesterday, job B at 09:00 today, and job C at 09:50 today. After that I left the server idle, with no more jobs added between 10:00 and 23:30.

Checking yesterday's logs, this error happened at 23:30:

[Wed Jan 31 23:30:00.010202 2018] [:error] [pid 17924] DEBUG:apscheduler.scheduler:Looking for jobs to run
[Wed Jan 31 23:30:00.019000 2018] [:error] [pid 17924] ERROR:root:
[Wed Jan 31 23:30:00.019013 2018] [:error] [pid 17924] Traceback (most recent call last):
[Wed Jan 31 23:30:00.019031 2018] [:error] [pid 17924]   File "/usr/lib/python2.7/site-packages/django_apscheduler/jobstores.py", line 40, in get_due_jobs
[Wed Jan 31 23:30:00.019034 2018] [:error] [pid 17924]     return self._get_jobs(next_run_time__lte=serialize_dt(now))
[Wed Jan 31 23:30:00.019036 2018] [:error] [pid 17924]   File "/usr/lib/python2.7/site-packages/django_apscheduler/jobstores.py", line 100, in _get_jobs
[Wed Jan 31 23:30:00.019038 2018] [:error] [pid 17924]     for job_id, job_state in job_states:
[Wed Jan 31 23:30:00.019040 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 250, in __iter__
[Wed Jan 31 23:30:00.019042 2018] [:error] [pid 17924]     self._fetch_all()
[Wed Jan 31 23:30:00.019044 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 1118, in _fetch_all
[Wed Jan 31 23:30:00.019046 2018] [:error] [pid 17924]     self._result_cache = list(self._iterable_class(self))
[Wed Jan 31 23:30:00.019048 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 122, in __iter__
[Wed Jan 31 23:30:00.019050 2018] [:error] [pid 17924]     for row in compiler.results_iter():
[Wed Jan 31 23:30:00.019052 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/django/db/models/sql/compiler.py", line 828, in results_iter
[Wed Jan 31 23:30:00.019054 2018] [:error] [pid 17924]     results = self.execute_sql(MULTI)
[Wed Jan 31 23:30:00.019056 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/django/db/models/sql/compiler.py", line 886, in execute_sql
[Wed Jan 31 23:30:00.019058 2018] [:error] [pid 17924]     raise original_exception
[Wed Jan 31 23:30:00.019060 2018] [:error] [pid 17924] OperationalError: (2006, 'MySQL server has gone away')
[Wed Jan 31 23:30:00.020039 2018] [:error] [pid 17924] Exception in thread APScheduler:
[Wed Jan 31 23:30:00.020050 2018] [:error] [pid 17924] Traceback (most recent call last):
[Wed Jan 31 23:30:00.020053 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Wed Jan 31 23:30:00.020055 2018] [:error] [pid 17924]     self.run()
[Wed Jan 31 23:30:00.020057 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/threading.py", line 764, in run
[Wed Jan 31 23:30:00.020059 2018] [:error] [pid 17924]     self.__target(*self.__args, **self.__kwargs)
[Wed Jan 31 23:30:00.020061 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/apscheduler/schedulers/blocking.py", line 30, in _main_loop
[Wed Jan 31 23:30:00.020063 2018] [:error] [pid 17924]     wait_seconds = self._process_jobs()
[Wed Jan 31 23:30:00.020065 2018] [:error] [pid 17924]   File "/usr/lib64/python2.7/site-packages/apscheduler/schedulers/base.py", line 943, in _process_jobs
[Wed Jan 31 23:30:00.020067 2018] [:error] [pid 17924]     for job in due_jobs:
[Wed Jan 31 23:30:00.020069 2018] [:error] [pid 17924] TypeError: 'NoneType' object is not iterable
[Wed Jan 31 23:30:00.020071 2018] [:error] [pid 17924]

We can see that it happens inside django-apscheduler.

And then, at 09:00 and 09:50, this error did not appear in the log, and the jobs were not executed.

Before that, I had tried without django-apscheduler, planned a job at 09:40, and a very similar error occurred, but inside the method that executes the job (the callable passed to add_job):

How I added the job:

            self.sc.add_job(jobscheduler.executionJob, 'date', run_date=dateStart, id=str(idProcessPlan), replace_existing=True, kwargs={'idProcessPlan': idProcessPlan, 'username': str(username)})

callable:

def executionJob(idProcessPlan, username):
    logger.debug("Starting Execution Job...")
    planifyJobs=TPlanifiedJobs.objects.get(id=idProcessPlan)  # here the exception
    planifyJobsData=TPlanifiedJobsSerializer(planifyJobs)
    dataJob=planifyJobsData.data
    logger.info("Executing executionJob Plannified "+repr(dataJob))
    # ... (rest of the job omitted)

The exception was:

[Wed Jan 31 09:40:00.004621 2018] [:error] [pid 19658] INFO:apscheduler.scheduler:Removed job 15
[Wed Jan 31 09:40:00.004975 2018] [:error] [pid 19658] DEBUG:managerCapture:Starting Execution Job...
[Wed Jan 31 09:40:00.009212 2018] [:error] [pid 19658] DEBUG:apscheduler.scheduler:No jobs; waiting until a job is added
[Wed Jan 31 09:40:00.012488 2018] [:error] [pid 19658] ERROR:apscheduler.executors.default:Job "executionJob (trigger: date[2018-01-31 09:40:00 CET], next run at: 2018-01-31 09:40:00 CET)" raised an exception
[Wed Jan 31 09:40:00.012538 2018] [:error] [pid 19658] Traceback (most recent call last):
[Wed Jan 31 09:40:00.012548 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/apscheduler/executors/base.py", line 125, in run_job
[Wed Jan 31 09:40:00.012555 2018] [:error] [pid 19658]     retval = job.func(*job.args, **job.kwargs)
[Wed Jan 31 09:40:00.012561 2018] [:error] [pid 19658]   File "/var/www/html/ManagerCapture/backEnd/planifier/jobscheduler.py", line 19, in executionJob
[Wed Jan 31 09:40:00.012584 2018] [:error] [pid 19658]     planifyJobs=TPlanifiedJobs.objects.get(id=idProcessPlan)
[Wed Jan 31 09:40:00.012674 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 85, in manager_method
[Wed Jan 31 09:40:00.012694 2018] [:error] [pid 19658]     return getattr(self.get_queryset(), name)(*args, **kwargs)
[Wed Jan 31 09:40:00.012698 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 374, in get
[Wed Jan 31 09:40:00.012702 2018] [:error] [pid 19658]     num = len(clone)
[Wed Jan 31 09:40:00.012704 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 232, in __len__
[Wed Jan 31 09:40:00.012707 2018] [:error] [pid 19658]     self._fetch_all()
[Wed Jan 31 09:40:00.012709 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 1118, in _fetch_all
[Wed Jan 31 09:40:00.012713 2018] [:error] [pid 19658]     self._result_cache = list(self._iterable_class(self))
[Wed Jan 31 09:40:00.012716 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 53, in __iter__
[Wed Jan 31 09:40:00.012718 2018] [:error] [pid 19658]     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
[Wed Jan 31 09:40:00.012722 2018] [:error] [pid 19658]   File "/usr/lib64/python2.7/site-packages/django/db/models/sql/compiler.py", line 886, in execute_sql
[Wed Jan 31 09:40:00.012724 2018] [:error] [pid 19658]     raise original_exception
[Wed Jan 31 09:40:00.012726 2018] [:error] [pid 19658] OperationalError: (2006, 'MySQL server has gone away')

I think that before executing a job, APScheduler should wake the DB connection, to ensure that if we want to retrieve persisted job data, the connection is alive. I know Django has a known problem where, after some idle time, the DB connection dies and the first incoming request fails, but I am not sure whether that is related. Even so, I think APScheduler should wake the connection somehow, because I may plan a job three days ahead, and raising wait_timeout (or whatever the relevant timeout is) to cover that long an idle period is not feasible.

I have seen a somewhat similar issue, #228, but I don't know how to apply that configuration with django-apscheduler.
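The "wake the connection before the job runs" idea can be expressed as a small wrapper around the job callable, without any change to APScheduler itself. This is only a sketch: `with_fresh_db` and `refresh` are names I made up, not part of any library; with Django, `django.db.close_old_connections` would be a natural thing to pass as `refresh`.

```python
from functools import wraps

def with_fresh_db(refresh):
    """Return a decorator that calls refresh() before running the job.

    `refresh` is whatever re-validates the DB connection; with Django,
    django.db.close_old_connections is a natural candidate (it discards
    connections that were closed server-side or have aged out).
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            refresh()  # drop/renew stale connections before touching the DB
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage sketch: wrap the callable before handing it to add_job, e.g.
#   job = with_fresh_db(close_old_connections)(executionJob)
#   sc.add_job(job, 'date', run_date=dateStart, ...)
```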

@agronholm
Owner

MySQL has been known to have issues like that, depending on the driver. But what worries me more is that for some reason the scheduler is trying to iterate over None which should never happen under any circumstances.

@WesleyBlancoYuan
Author

Interestingly, if I don't use the django-apscheduler jobstore, i.e. just use APScheduler's in-memory store, I can partially work around this by moving the model imports into the job's callable.

I mean, instead of doing:

from backEnd.models import TPlanifiedJobs
from backEnd.serializers import TPlanifiedJobsSerializer
from backEnd import tcpdmanage

def executionJob(idProcessPlan, username):
    logger.debug("Starting Execution Job...")
    planifyJobs=TPlanifiedJobs.objects.get(id=idProcessPlan)  # here the exception
    ...

I do:

def executionJob(idProcessPlan, username):
    from backEnd.models import TPlanifiedJobs
    from backEnd.serializers import TPlanifiedJobsSerializer
    from backEnd import tcpdmanage
    logger.debug("Starting Execution Job...")
    planifyJobs=TPlanifiedJobs.objects.get(id=idProcessPlan)  # here the exception
    ...

And the job gets executed. I think that by importing the module, Django re-establishes the DB connection, which makes it usable for APScheduler again.

By the way, to avoid waiting a whole day to test the scheduler's activity, you can reproduce it like this:

  1. Create a scheduler without the django-apscheduler jobstore.
  2. Schedule a job 5 minutes ahead. In the job callable, add some DB query, like a find-by-id.
  3. Stop the mysql/mariadb service (systemctl stop mariadb, etc.).
  4. Start mysql/mariadb again.
  5. Wait until the moment of execution.

You will see the same error in the log.
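The effect of those steps can be simulated in-process without a real database: a connection goes stale while the scheduler is idle, and the job either fails with the same kind of error or succeeds if it revalidates the connection first. Everything here (`FakeConnection`, `StaleConnectionError`) is a made-up stand-in for the MySQL driver, purely to illustrate the failure mode.

```python
class StaleConnectionError(Exception):
    """Stand-in for MySQL's OperationalError 2006 ('server has gone away')."""

class FakeConnection:
    def __init__(self):
        self.alive = True

    def query(self):
        if not self.alive:
            raise StaleConnectionError("MySQL server has gone away")
        return "row"

    def reconnect(self):
        self.alive = True

conn = FakeConnection()
conn.alive = False  # the server restart (steps 3-4) killed the old socket

def naive_job():
    return conn.query()   # fails at run_date, like executionJob did

def refreshing_job():
    conn.reconnect()      # the workaround: revalidate the connection first
    return conn.query()
```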

But if we don't use a persistent jobstore, we have no way to recover jobs if Apache, MariaDB, or the machine restarts.

I have reported an issue to django-apscheduler, too.

@WesleyBlancoYuan
Author

WesleyBlancoYuan commented Feb 5, 2018

Confirmed: the DB connection is the problem. I reported the issue to django-apscheduler and it is fixed in 0.2.5: jcass77/django-apscheduler#13

With or without a persistent jobstore, to avoid this issue you have to import the Django models inside the job callable and call django.setup(), so that Django steps in and manages the connection again, making sure it is alive. Now I do this:

import os
import traceback
from backEnd import tcpdmanage

def executionJob(idProcessPlan, username):
    logger.debug("Starting Execution Job...")
    try:
        os.environ.setdefault("DJANGO_SETTINGS_MODULE", "ManagerCaptureMain.settings")
        import django
        django.setup()
        from backEnd import models
        from backEnd import serializers
        planifyJobs = models.TPlanifiedJobs.objects.get(id=idProcessPlan)
        planifyJobsData = serializers.TPlanifiedJobsSerializer(planifyJobs)
        # ... (rest of the job, and the matching except clause, omitted)
So even if the DB connection dies, Django reconnects first, because Django manages the connection automatically.
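For completeness, Django's persistent-connection handling is configured per database via CONN_MAX_AGE, but note that Django only closes aged-out connections at request boundaries; a background thread like APScheduler's still has to call django.db.close_old_connections() (or go through django.setup() as above) itself. A settings sketch with placeholder names:

```python
# settings.py (sketch; database name and user are hypothetical placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "managercapture",   # placeholder
        "USER": "app",              # placeholder
        # 0 = close the connection at the end of every request, so a request
        # never reuses a socket that MySQL's wait_timeout already killed.
        # A positive value keeps connections open for that many seconds.
        "CONN_MAX_AGE": 0,
    }
}
```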

Now I think you can close the issue. Thanks.
