Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-21460 Add job execution hooks #11337

Merged
merged 1 commit into from
Jun 9, 2018

Conversation

rthouvenin
Copy link
Contributor

Overview

Implements the solution suggested in CRM-21460 to allow monitoring / storing the scheduled job execution results, assuming this is the right direction.

Before

At the moment the job log does not store any structured information about the execution status, only a message in a freetext format. One would have to parse localized messages to extract whether jobs have succeeded or not.

After

Two new hooks: preJob and postJob called right before and after the execution of a scheduled job. Rather than storing new info in the log table, hooks let implementers chose what to do with the info, and whether they want to store it or immediately act upon it.

Technical Details

Having both a pre and a post hook is mostly to measure execution time or detect stuck jobs. The pre hook doesn't give the job and params objects by reference, so the hook is not meant to alter the job before its execution.
In the post hook, the result can also be an exception object when the job execution was interrupted by an exception.

Comments

See Sencivity for an example of how these hooks can be used.

@MegaphoneJon
Copy link
Contributor

@rthouvenin Thank you for this work! I had considered the same problem previously, and had wanted to change the database schema to allow monitoring of scheduled jobs. I can see the advantages of this approach though.

My one concern about this approach is that in the past, I've had some jobs fail in such a way that the job never completes. I see a try...catch block here, so perhaps I'm thinking of a problem that no longer exists - but I'd be curious if there were scenarios in which we'd need to identify jobs that started but didn't finish.

Some examples I recall are:

  • A job execution time exceeds the PHP max_execution_time value (especially when triggered from within the web UI).
  • A malformed email in the bounce processing queue could cause the "bounce processing" job to not finish.
  • Certain bad SMTP settings (can't recall what) could lead the "Send Mailings" job to not finish.

@michau
Copy link

michau commented Nov 28, 2017

@MegaphoneJon Wouldn't this kind of situations (job starts but never finishes) be detectable when log entry is recorded for both preJob and postJob? You would just look for jobs that have odd number of entries in the log (and potentially something else, e.g. expected execution time is exceeded x 2).

@JoeMurray
Copy link
Contributor

I'm not sure there is a way to catch when a process is killed/dies, never completes because it gets blocked, but I will say that is probably the highest priority.

@JoeMurray JoeMurray closed this Nov 29, 2017
@JoeMurray
Copy link
Contributor

I'm comfortable with this approach.

@JoeMurray JoeMurray reopened this Nov 29, 2017
@totten
Copy link
Member

totten commented Nov 29, 2017

  1. I like how the code in this patch is very straight-forward to read. 👍

  2. "Jobs" are basically saved, recurring API calls, and these hooks wrap around the API call. As a matter of due diligence, I should point out that there are a couple generic ways to wrap around an API call -- eg Civi/API/Events.php ( to understand the events better, try running cv debug:event-dispatcher /api/ and reading some example code). Any thoughts on how this compares?

  3. <rant>Some of the code in JobManager gives me a weird vibe. I hesitate to comment because I don't want a problem with that context to reflect on this cleanly written patch, although it has shaped the patch a little. It starts from this incongruity: the hook signature passes $job and $params:

  • Every API call is a tuple of ($entity,$action,$params). So why does the hook only pass $params -- why are we missing $entity and $action? Those are important too!
  • Ah! The $job has $job->api_entity, $job->api_action, and $job->apiParams. But... we can rely on $job because it has everything... So why do we need a separate $params?
  • Well, apparently, there are these other edge-cases -- with setSingleRunParams() and executeJobByAction($entity, $action). For each request, you can pass in different params.

And this is where I get confused: A job is supposed to be a saved API call. If we're passing in the params on a per-request basis, then it's not a saved API call -- it's just a regular, plain old API call that's been hacked-up to look like a saved API call.

Now if we really push through this question... there are differences:

  • bin/cron.php (aka job system) authenticates via username+password+sitekey, whereas bin/rest.php (aka pure APIv3) authenticates via apikey+sitekey, and the CLI tools (drush cvapi/wp cvapi/cv api) just accept a username.
  • bin/cron.php (aka job system) logs the API call. The pure APIv3 endpoints (bin/rest.php, drush cvapi, etc) do not log.

These differences feel arbitrary to me -- they've conflated things that should be orthogonal:

  • Maybe I want to authenticate REST calls using username+password. (Why should only cron.php support that?)
  • Maybe I want to authenticate cron runs with a synthetic key. (Why should only rest.php support that?)
  • Maybe I want to log all calls to job.process_mailing, even if they're fired via drush cvapi or cv api. (Why should only cron.php support logging?)
  • Maybe I want to ignore logging for some noisy jobs? (Why should cron.php force all jobs to be logged?)

If we wanted to translate this rant into some kind of change... we'd probably beef-up authentication options for rest.php, beef-up general logging for all API calls, slim-down the Job stuff, and deprecate the use of cron.php?job=X&foo=Y.

But that's a bit of rant -- and probably outside the intended scope of this PR.

</rant>

  1. Recap: @rthouvenin, take a look at the generic API events. If you still think that job-specific hooks are necessary, then I'm OK with this. The implementation looks clean (r-code) and generally safe (r-user and r-tech), and the existence of https://github.com/WeMoveEU/sencivity implies that you are r-running it.

@rthouvenin
Copy link
Contributor Author

@totten Thanks for the feedback, and the rant too! Gives some insights and perspective :)
I did consider the api wrapper solution, but thought it doesn't work because in the event handlers I don't know if the API call being executed is a "scheduled" job or not: from what I can see the way the JobManager calls the API isn't different from usual. Or at least I didn't find a way to determine this, but if there is I'm happy to hear it.

@MegaphoneJon @JoeMurray I also think this is an important use case to catch. One solution is the one suggested by Michał: periodically check if there are jobs that started but haven't reported a result for too long. Monitoring tools such as Nagios or Sensu have built-in support of this: you can configure a check to emit an alert if no result was reported for a certain period of time. Of course this works only for periodic jobs. One-off jobs or jobs on events have to explicitely report the beginning or their execution, and another process emits an alert if no result was reported after a timeout.
In any case, I believe these solutions are possible with the hook implementation suggested in this PR.

@seancolsen
Copy link
Contributor

I haven't read this whole PR but I did notice that it appears to be adding two new hooks without any docs for those hooks. @rthouvenin can you add a PR to the Dev Guide with docs? Then we can merge that Dev Guide PR when this Core PR is merged. If you need help figuring out how to add the docs, ping me in Mattermost.

@rthouvenin
Copy link
Contributor Author

@seanmadsen Thanks for reminding. I did think about it, but wanted to have confirmation the change will be merged before spending time on documenting it.
I just created the PR: civicrm/civicrm-dev-docs#464

@eileenmcnaughton
Copy link
Contributor

Just revisting this - the recap was

"Recap: @rthouvenin, take a look at the generic API events. If you still think that job-specific hooks are necessary, then I'm OK with this. The implementation looks clean (r-code) and generally safe (r-user and r-tech), and the existence of https://github.com/WeMoveEU/sencivity implies that you are r-running it."

And I feel that @rthouvenin has done as asked so I'm going with " then I'm OK with this"

merging

@eileenmcnaughton eileenmcnaughton merged commit 0c53021 into civicrm:master Jun 9, 2018
@eileenmcnaughton eileenmcnaughton added the sig:extension support Supports being able to integrate extensions with CiviCRM label Jun 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master sig:extension support Supports being able to integrate extensions with CiviCRM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants