Big job batching #12638

Merged

brandonkelly merged 6 commits into 4.4 from feature/dev-1062-big-job-batching on Feb 10, 2023

Conversation

brandonkelly
Member

@brandonkelly brandonkelly commented Feb 6, 2023

Introduces a new craft\queue\BaseBatchedJob class, which can be extended by queue jobs that may need to split up their workload into multiple batches.

Instead of execute(), batched jobs should implement:

  • loadData(), which returns an instance of the (also new) craft\base\Batchable interface
  • processItem(), which handles whatever processing needs to happen for a single item in the current batch

The Batchable interface extends PHP’s Countable. In addition to count(), batchable classes must also implement getSlice(), which returns a slice of the abstracted data for a given offset and length.
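For data that isn’t backed by a database query, a custom Batchable implementation might look roughly like this (a hypothetical ArrayBatcher sketch; the getSlice() signature is assumed from the description above rather than copied from the PR):

use craft\base\Batchable;

// Hypothetical sketch of a Batchable implementation for an in-memory array
class ArrayBatcher implements Batchable
{
    public function __construct(
        private array $items,
    ) {
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getSlice(int $offset, int $length): iterable
    {
        return array_slice($this->items, $offset, $length);
    }
}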

There’s also a new craft\db\QueryBatcher class which implements Batchable for a given yii\db\QueryInterface object (e.g. an element query). The vast majority of batched jobs will just need to return a QueryBatcher from loadData().

The PR also converts three existing queue jobs into batched jobs:

  • craft\queue\jobs\ApplyNewPropagationMethod (executed when a section or Matrix field has been assigned a new Propagation Method)
  • craft\queue\jobs\PropagateElements (executed when a new site is added, for all localizable element types)
  • craft\queue\jobs\ResaveElements (executed in some cases after a section or entry type is saved, or when a resave/* action is run with the --queue option)

Updating existing jobs

Here’s a rough before/after for a queue job getting converted to batchable:

Before

use craft\queue\BaseJob;

class MyJob extends BaseJob
{
    public function execute($queue): void
    {
        $elements = MyElementType::find()
            ->param('value')
            ->all();
        $total = count($elements);

        foreach ($elements as $i => $element) {
            $this->setProgress($queue, $i / $total);
            // process the element...
        }
    }
}

After

use craft\base\Batchable;
use craft\db\QueryBatcher;
use craft\queue\BaseBatchedJob;

class MyJob extends BaseBatchedJob
{
    protected function loadData(): Batchable
    {
        $query = MyElementType::find()
            ->param('value')
            ->orderBy(['elements.id' => SORT_ASC]);
        return new QueryBatcher($query);
    }

    protected function processItem(mixed $item): void
    {
        /** @var MyElementType $item */
        // process the element...
    }
}

Notes

  • Only one job will be active in the queue at any given time. Each job will spawn the next batch’s job once it’s done processing the items in the current batch.
  • The number of items processed per batch/job is customizable, via the batchSize property. (Default is 100.)
  • If more than one job is needed based on the total number of items, “(batch X of Y)” will be appended to the job description in the control panel sidebar and Queue Manager utility.
  • Batched jobs should be added to the queue via craft\helpers\Queue::push(), not via Craft::$app->queue->push(). Otherwise the configured priority and ttr settings will be lost on any additional spawned jobs.
  • Database queries passed to craft\db\QueryBatcher should specify an orderBy value, ideally by the primary key value in ascending order, so there’s less chance that any results get skipped or double-processed.
  • Spawned jobs are cloned from the current job, so job classes need to account for that. Any public properties that are set to objects which aren’t serialize()-friendly should be excluded via __sleep(), and any private/protected properties will need to be reset to their default values via __wakeup() to avoid uninitialized property errors. (See the sketch after this list.)
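For illustration, here’s roughly how the MyJob example above could handle serialization and get pushed to the queue (a sketch only; the $mailer and $processedIds properties and the 250 batch size are hypothetical, and the exact __sleep()/__wakeup() details will depend on the job’s own properties):

use craft\helpers\Queue;
use craft\queue\BaseBatchedJob;

class MyJob extends BaseBatchedJob
{
    // hypothetical public property holding a non-serialize()-friendly object
    public ?object $mailer = null;

    // hypothetical private state built up while the job runs
    private array $processedIds = [];

    public function __sleep(): array
    {
        // exclude the property that can’t be serialized
        return array_diff(array_keys(get_object_vars($this)), ['mailer']);
    }

    public function __wakeup(): void
    {
        // reset private/protected state back to default values
        $this->processedIds = [];
    }

    // ...loadData() and processItem() as shown in the example above
}

// Push via craft\helpers\Queue::push() so priority/ttr carry over to spawned batch jobs
Queue::push(new MyJob([
    'batchSize' => 250, // override the default of 100
]));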

@brandonkelly brandonkelly requested a review from a team as a code owner February 6, 2023 21:39
@linear

linear bot commented Feb 6, 2023

DEV-1062 Big job batching

To avoid long-running jobs (especially for Lambda), we want to create a job type that supports batching.

At the end of each batch, we will trigger the next batch. With this approach we don't have to worry about conflicts with offsets and element saves while the job is running.

Things to consider:

  • configurable batch size
  • batch size needs to account for multipliers (e.g. sites)
  • should we add next job w/ a delay, so that it wouldn't get picked up within the same queue/run command?

@bencroker
Contributor

This looks great as a standard way of batching jobs, and the implementation using the Batchable interface is perfect!

If one of the goals of this is to help prevent queue jobs from timing out, especially when they’re executed via web requests, then I have some thoughts.

  1. Spawning another job immediately once a batch is finished will cause that job (and subsequent batch jobs) to be executed in the same web request (unless this behaviour changed at some point). One thing I do in the Campaign plugin is add the next job with a configurable delay, so that by default it will not be executed by the same request. This has drawbacks, but avoids all batch jobs being executed by a single request.
  2. This may be beyond the scope of what you want to do, but to further help avoid timeouts and memory limits being exceeded, the SendoutJob checks whether a memory limit or time limit has been reached and queues up the next job if either is true. These limits are determined by multiplying the server’s limits by a configurable threshold (roughly sketched below).
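For reference, the general shape of such a check might look roughly like this (a rough sketch of the idea, not the Campaign plugin’s actual code; the 0.8 threshold and the $startTime variable are assumptions):

use craft\helpers\App;

$threshold = 0.8; // assumed configurable threshold
$memoryLimit = App::phpConfigValueInBytes('memory_limit');
$timeLimit = (int)ini_get('max_execution_time');

$memoryExceeded = $memoryLimit > 0 && memory_get_usage() > $memoryLimit * $threshold;
// assumes $startTime was captured when the job started executing
$timeExceeded = $timeLimit > 0 && (microtime(true) - $startTime) > $timeLimit * $threshold;

if ($memoryExceeded || $timeExceeded) {
    // push a follow-up job for the remaining items and stop processing this one
}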

Otherwise, this looks like a really useful new addition.

@brandonkelly
Member Author

brandonkelly commented Feb 10, 2023

Thanks for looking at this, @bencroker!

Spawning another job immediately once a batch is finished will cause that job (and subsequent batch jobs) to be executed in the same web request (unless this behaviour changed at some point). One thing I do in the Campaign plugin is add the next job with a configurable delay, so that by default it will not be executed by the same request. This has drawbacks, but avoids all batch jobs being executed by a single request.

That would only actually work as intended if there aren’t any additional jobs queued up that would take over during the delay. And those jobs may be of a lower priority than the original job (e.g. UpdateSearchIndex), so you’d just end up pushing back the higher-priority job in favor of lower-priority jobs, without actually gaining any additional stability.

A better approach for sites that really need to worry about timeouts would be to have each job executed in its own siloed request, which is how SQS queues work. (SQS/EBS provide the actual “queue runner”, and send a request to Craft for each individual job that needs to be executed.) So as long as any given job isn’t doing too much (e.g. batched jobs per this PR), it’s a non-issue.

This may be beyond the scope of what you want to do, but to further help avoid timeouts and memory limits being exceeded, the SendoutJob checks whether a memory limit or time limit has been reached and queues up the next job if either is true. These limits are determined by multiplying the server’s limits by a configurable threshold.

Just added a check for memory usage (6586319). I’m not sure how effective a time limit check would be though, as it could only be accurate if it’s the first job executed in the request. So I think our approach of just cutting things off after a preconfigured number of items have been processed is going to generally be safer.

(Worth noting, too, I’m pretty sure your memory check isn’t working quite as expected because you’re passing true into memory_get_usage(), which has it return the total memory that PHP has reserved from the system, rather than the total active memory actually being used within that allotment. memory_get_usage(true) should always return the same number throughout the request, regardless of how much of it is used.)
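A minimal standalone illustration of the difference (not from the PR):

$used = memory_get_usage();         // memory actually in use by the script
$reserved = memory_get_usage(true); // memory PHP has reserved from the system

$data = range(1, 10000); // allocate something

// the "used" figure grows with the allocation; the "reserved" figure typically
// stays flat until PHP needs to grab another chunk from the system
var_dump(memory_get_usage() - $used);
var_dump(memory_get_usage(true) - $reserved);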

@brandonkelly brandonkelly merged commit b09b4df into 4.4 Feb 10, 2023
@brandonkelly brandonkelly deleted the feature/dev-1062-big-job-batching branch February 10, 2023 22:38
@bencroker
Contributor

Just added a check for memory usage (6586319). I’m not sure how effective a time limit check would be though, as it could only be accurate if it’s the first job executed in the request. So I think our approach of just cutting things off after a preconfigured number of items have been processed is going to generally be safer.

Looks good re the memory usage check and you make a good point re the time limit check. I’m guessing the job scheduler, planned for Craft 5.x, will require a cron job that should help address this issue in a more robust way.

(Worth noting, too, I’m pretty sure your memory check isn’t working quite as expected because you’re passing true into memory_get_usage(), which has it return the total memory that PHP has reserved from the system, rather than the total active memory actually being used within that allotment. memory_get_usage(true) should always return the same number throughout the request, regardless of how much of it is used.)

Yeah I think you’re right on that one, thanks for the heads-up!
