`cron` syntax (e.g. `*/5 1,2,3 * * *`) is a way of specifying times for jobs to run, popularised by `crontab` and widely supported in other systems like `systemd`, Amazon EventBridge Scheduler, etc.
Tools like https://crontab.guru/ are great for exploring cron expressions, but note that the fields used in cron syntax can vary between systems, so check the documentation for the system you're working on.
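To make the field structure concrete, here is a minimal sketch that expands a classic five-field `crontab` expression into the values each field matches. It assumes the standard `crontab` field order (minute, hour, day of month, month, day of week) and only handles `*`, ranges, lists and `/step`; the names `FIELD_RANGES`, `expand_field` and `expand_cron` are made up for illustration, and other systems may use different fields.

```python
# Value ranges for the five fields of a classic crontab expression.
# Assumption: crontab-style fields only; other schedulers may differ.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one field (supports *, a-b, a,b,c and /step) into sorted values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if not (low <= start <= end <= high) or step < 1:
            raise ValueError(f"field {field!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expression: str) -> dict[str, list[int]]:
    """Map each field name to the list of values the expression matches."""
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError("expected exactly five fields")
    return {
        name: expand_field(field, low, high)
        for field, (name, low, high) in zip(fields, FIELD_RANGES)
    }


schedule = expand_cron("*/5 1,2,3 * * *")
print(schedule["minute"])  # every fifth minute of the hour
print(schedule["hour"])    # the hours listed in the second field
```

Read this way, `*/5 1,2,3 * * *` means "every five minutes during hours 1, 2 and 3, every day".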
Prefer `systemd` timers (configured in `/etc/systemd/system/foo.timer`) to `crontab` jobs (configured in e.g. `/etc/crontab`):
- Logging: `systemd` provides better logging by default (e.g. `journalctl -u foo.service` to see the logs of the job that `foo.timer` triggers). `crontab` may attempt to email job output, which will fail as we typically don't have EC2 boxes configured to send email.
- Security: `crontab` jobs can be unintentionally run as `root` (with all-powerful superuser access) if you're not careful. `systemd` encourages you to be explicit about which user the task should run as. This should be a user with a restricted set of permissions, not `root`!
- Timezones: Ubuntu `crontab` uses the server timezone to schedule jobs, while `systemd` allows custom timezone scheduling (e.g. you can schedule a job to start before office hours in a particular timezone).
See also https://opensource.com/article/20/7/systemd-timers.
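As a rough illustration of the points above, the sketch below renders a `foo.service` / `foo.timer` pair with an explicit non-root `User=` and an `OnCalendar=` schedule that names a timezone. The helper `render_timer_units`, the user `foo-runner` and the command path are hypothetical; the timezone suffix on `OnCalendar=` needs a reasonably recent `systemd`, so check the expression with `systemd-analyze calendar` on the target box.

```python
def render_timer_units(name: str, user: str, command: str, on_calendar: str) -> dict[str, str]:
    """Return the text of a oneshot service unit and the timer unit that triggers it."""
    if user == "root":
        # Mirrors the advice above: scheduled jobs should use a restricted user.
        raise ValueError("run scheduled jobs as a restricted user, not root")
    service = "\n".join([
        "[Unit]",
        f"Description={name} scheduled job",
        "",
        "[Service]",
        "Type=oneshot",
        f"User={user}",
        f"ExecStart={command}",
        "",
    ])
    timer = "\n".join([
        "[Unit]",
        f"Description=Schedule for {name}.service",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
    return {f"{name}.service": service, f"{name}.timer": timer}


# Hypothetical job: run before office hours, London time, on weekdays.
units = render_timer_units(
    name="foo",
    user="foo-runner",
    command="/usr/local/bin/foo-task",
    on_calendar="Mon..Fri 08:00 Europe/London",
)
for filename, text in units.items():
    print(f"# /etc/systemd/system/{filename}")
    print(text)
```

The timer would then typically be activated with `systemctl enable --now foo.timer`.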
`crontab` and similar solutions won't always be suitable for multi-instance apps. For example, you may have more than one instance but only need to run the scheduled task once (common if the task has side effects, like sending email).[1]
In this case, if the task can be triggered via a request to an HTTPS endpoint, then the app's load balancer can ensure that at most one instance of the app receives the request and runs the task.
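A minimal, framework-agnostic sketch of such an endpoint is shown below. It is only an illustration: the path `/tasks/send-digest`, the `x-api-key` header and the `handle_trigger` function are assumptions, and a real app would wire this logic into its own web framework and secret management.

```python
import hmac
from http import HTTPStatus
from typing import Callable, Mapping

TASK_PATH = "/tasks/send-digest"  # hypothetical endpoint path
API_KEY_HEADER = "x-api-key"      # hypothetical header carrying a shared secret


def handle_trigger(
    method: str,
    path: str,
    headers: Mapping[str, str],
    expected_key: str,
    run_task: Callable[[], None],
) -> HTTPStatus:
    """Run the scheduled task once for a valid, authenticated trigger request.

    Assumes header names have already been lower-cased by the web framework.
    """
    if path != TASK_PATH:
        return HTTPStatus.NOT_FOUND
    if method != "POST":
        return HTTPStatus.METHOD_NOT_ALLOWED
    supplied = headers.get(API_KEY_HEADER, "")
    # Constant-time comparison avoids leaking the key through timing differences.
    if not hmac.compare_digest(supplied.encode(), expected_key.encode()):
        return HTTPStatus.UNAUTHORIZED
    try:
        run_task()
    except Exception:
        # A 5xx response lets the scheduler see the failure.
        return HTTPStatus.INTERNAL_SERVER_ERROR
    return HTTPStatus.OK


status = handle_trigger(
    "POST",
    TASK_PATH,
    {API_KEY_HEADER: "example-secret"},
    expected_key="example-secret",
    run_task=lambda: print("sending digest email"),
)
print(status)
```

Because only the instance that receives the request calls `run_task`, the side effect happens once per scheduled request rather than once per instance.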
Scheduling the request from outside the app itself can be done in multiple ways (e.g. a scheduled Lambda), but one good solution is to use EventBridge rules with the API destination integration. This allows EventBridge rules to make authenticated calls to a specified endpoint on a schedule.
- All of the moving pieces required to create the schedule and trigger can be configured as part of your infrastructure, e.g. via the AWS CDK. So there's no need to write business logic or maintain a Lambda with the dependencies required to make the HTTP requests.
- No need to hand-roll scheduling and retry logic because this is built into EventBridge.
- EventBridge requests to an API destination endpoint have a maximum timeout of 5 seconds. So if your task takes more than 5 seconds, and you need the caller to be aware of its outcome (e.g. to enable retries on failure) this approach won't be suitable.
- nb. More generally, make sure to check the default timeout and retry settings to avoid swamping your service.
- The AWS constructs require a fair amount of boilerplate CDK code, but there are examples in the Guardian estate that could be used as a basis, e.g. in the crosswords status checker.
An `AWS::Events::Rule` can invoke an AWS Lambda at regular intervals.
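On the Lambda side, a handler for such a rule might look like the sketch below. It assumes the rule passes the default scheduled event (which has `source` set to `aws.events` and `detail-type` set to `Scheduled Event`); `run_scheduled_task` is a hypothetical placeholder for the real work.

```python
def run_scheduled_task() -> str:
    """Hypothetical placeholder for the job's real work."""
    return "done"


def handler(event: dict, context: object) -> dict:
    """Entry point invoked by the scheduled rule."""
    # Guard against the function being invoked by something other than the schedule.
    if event.get("source") != "aws.events" or event.get("detail-type") != "Scheduled Event":
        raise ValueError("expected an EventBridge scheduled event")
    result = run_scheduled_task()
    # "time" is the timestamp EventBridge attached to this scheduled event.
    return {"scheduledTime": event.get("time"), "result": result}


# Local smoke test with a hand-written sample event.
sample_event = {
    "source": "aws.events",
    "detail-type": "Scheduled Event",
    "time": "2024-01-01T08:00:00Z",
    "detail": {},
}
print(handler(sample_event, context=None))
```

Raising an exception marks the invocation as failed, which makes unexpected triggers visible in the function's logs and metrics.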
Footnotes
1. There are a number of examples of this pattern being used in the crosswordv2 repo, e.g. for the status checker and helpline service.