Big job batching #12638
Conversation
DEV-1062 Big job batching
To avoid long-running jobs (especially for Lambda), we want to create a job type that supports batching. At the end of each batch, we will trigger the next batch. With this approach we don’t have to worry about conflicts with offsets and element saves while the job is running. Things to consider:
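A minimal sketch of that idea — each queue-job run handles one slice of the data, then queues up a follow-up job for the next slice — could look like this (purely illustrative; `MyBatchedJob`, `buildQuery()`, and the property names are made up and not part of this PR):

```php
use craft\helpers\Queue as QueueHelper;
use craft\queue\BaseJob;

class MyBatchedJob extends BaseJob
{
    public int $batchIndex = 0;  // which batch this job run is responsible for
    public int $batchSize = 100;

    public function execute($queue): void
    {
        $query = $this->buildQuery(); // hypothetical: returns a yii\db\QueryInterface
        $total = $query->count();

        // Only process this job’s slice of the data
        $items = $query
            ->offset($this->batchIndex * $this->batchSize)
            ->limit($this->batchSize)
            ->all();

        foreach ($items as $item) {
            // ... process a single item ...
        }

        // If more data remains, spawn a job for the next batch
        if (($this->batchIndex + 1) * $this->batchSize < $total) {
            $nextJob = clone $this;
            $nextJob->batchIndex++;
            QueueHelper::push($nextJob);
        }
    }
}
```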
This looks great as a standard way of batching jobs. If one of the goals of this is to help avoid queue jobs from timing out, especially when executed via web requests, then I have some thoughts.

Otherwise, this looks like a really useful new addition.
Thanks for looking at this, @bencroker!
That would only actually work as intended if there aren’t any additional jobs queued up that would take over during the delay, and those jobs may be of a lower priority than the original job.

A better approach for sites that really need to worry about timeouts would be to have each job executed in its own siloed request, which is how SQS queues work. (SQS/EBS provide the actual “queue runner”, and send a request to Craft for each individual job that needs to be executed.) So as long as any given job isn’t doing too much (e.g. batched jobs per this PR), it’s a non-issue.
Just added a check for memory usage (6586319). I’m not sure how effective a time limit check would be though, as it could only be accurate if it’s the first job executed in the request. So I think our approach of just cutting things off after a preconfigured number of items have been processed is going to generally be safer. (Worth noting, too, I’m pretty sure your memory check isn’t working quite as expected because of the value you’re passing it.)
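For reference, a rough sketch of a memory-usage cutoff of this kind (plain PHP; not the actual code from 6586319):

```php
// Resolve the configured memory_limit to bytes.
function memoryLimitBytes(): int
{
    $limit = ini_get('memory_limit');
    if ($limit === false || $limit === '-1') {
        return PHP_INT_MAX; // no limit configured
    }
    // Convert shorthand like "256M" to bytes
    $unit = strtoupper(substr($limit, -1));
    $value = (int)$limit;
    return match ($unit) {
        'G' => $value * 1024 ** 3,
        'M' => $value * 1024 ** 2,
        'K' => $value * 1024,
        default => $value,
    };
}

// Between items, stop processing once we’ve used e.g. 80% of the limit:
$shouldStop = memory_get_usage(true) > memoryLimitBytes() * 0.8;
```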
Looks good re the memory usage check, and you make a good point re the time limit check. I’m guessing the job scheduler, planned for Craft 5.x, will require a cron job, which should help address this issue in a more robust way.
Yeah I think you’re right on that one, thanks for the heads-up!
Introduces a new `craft\queue\BaseBatchedJob` class, which can be extended by queue jobs that may need to split up their workload into multiple batches.

Instead of `execute()`, batched jobs should implement:

- `loadData()`, which returns an instance of the (also new) `craft\base\Batchable` interface
- `processItem()`, which handles whatever processing needs to happen for a single item in the current batch

The `Batchable` interface extends PHP’s `Countable`. In addition to `count()`, batchable classes must also implement `getSlice()`, which returns a slice of the abstracted data for a given offset and length.

There’s also a new `craft\db\QueryBatcher` class which implements `Batchable` for a given `yii\db\QueryInterface` object (e.g. an element query). The vast majority of batched jobs will just need to return a `QueryBatcher` from `loadData()`.

The PR also converts three existing queue jobs into batched jobs:

- `craft\queue\jobs\ApplyNewPropagationMethod` (executed when a section or Matrix field has been assigned a new Propagation Method)
- `craft\queue\jobs\PropagateElements` (executed when a new site is added, for all localizable element types)
- `craft\queue\jobs\ResaveElements` (executed in some cases after a section or entry type is saved, or when a `resave/*` action is run with the `--queue` option)

Updating existing jobs
Here’s a rough before/after for a queue job getting converted to batchable:
Before
After
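The actual before/after code blocks didn’t survive here, but based on the API described above, the conversion might look roughly like this (the class names, the `Thing` query, and the exact method signatures are assumptions, not the PR’s actual code):

```php
use craft\base\Batchable;
use craft\db\QueryBatcher;
use craft\queue\BaseBatchedJob;
use craft\queue\BaseJob;

// Before: the job loops over everything in a single execute() call
class ResaveThingsJob extends BaseJob
{
    public function execute($queue): void
    {
        $query = Thing::find(); // some yii\db\QueryInterface
        $total = $query->count();

        foreach ($query->each() as $i => $thing) {
            $this->setProgress($queue, $i / $total);
            // ... process $thing ...
        }
    }
}

// After: the job only declares how to load data and how to process one item;
// BaseBatchedJob takes care of slicing and spawning follow-up jobs
class ResaveThingsBatchedJob extends BaseBatchedJob
{
    protected function loadData(): Batchable
    {
        // Order by the primary key so batches stay stable (see Notes below)
        return new QueryBatcher(Thing::find()->orderBy(['id' => SORT_ASC]));
    }

    protected function processItem(mixed $item): void
    {
        // ... process a single $item from the current batch ...
    }
}
```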
Notes

- Batched jobs can customize their batch sizes via the `batchSize` property. (Default is 100.)
- Batched jobs should be pushed to the queue via `craft\helpers\Queue::push()`, not via `Craft::$app->queue->push()`. Otherwise the configured `priority` and `ttr` settings will be lost on any additional spawned jobs.
- Queries passed to `craft\db\QueryBatcher` should specify an `orderBy` value, ideally by the primary key value in ascending order, so there’s less chance that any results get skipped or double-processed.
- Any job properties that aren’t `serialize()`-friendly should be excluded via `__sleep()`, and any private/protected properties will need to be reset to their default values via `__wakeup()` to avoid uninitialized property errors.
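For that last point, a hedged sketch of what the `__sleep()`/`__wakeup()` handling could look like (`SomeService` and the property are hypothetical):

```php
use craft\queue\BaseBatchedJob;

class MyBatchedJob extends BaseBatchedJob
{
    private ?SomeService $service = null; // not serialize()-friendly

    public function __sleep(): array
    {
        // Serialize everything except the non-serializable property
        $vars = array_keys(get_object_vars($this));
        return array_diff($vars, ['service']);
    }

    public function __wakeup(): void
    {
        // Reset excluded private/protected properties to their defaults
        // so they don’t trigger uninitialized property errors
        $this->service = null;
    }
}
```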