General purpose job runner that uses OpenStack TaskFlow
This project is a Python job board to which jobs can be posted and to which job consumers on multiple machines can be attached. The jobs are TaskFlow flows, Python code that specifies an execute (forward) and revert (backward) method for each step. This allows you to create very complex processes where you have guarantees of what the end result will look like on either success or failure. This program attempts to be an extensible implementation of a generic TaskFlow non-blocking conductor with commandline tools for managing and visualizing running processes on a job board. You only have to write the jobs yourself, and those can run on the job board as a plugin of this project.
Clone the project, and check out your self written flows in the jobrunner/flows
directory.
In the jobrunner/flows/builtin
directory is an example of what flow definition might look
like. The entry point to post the flow can be registered with the register_job
decorator.
For documentation about how to write flows, look at this documentation.
- Create, activate and install the dependencies into a virtualenv
Change directory to the checkout of this project, then create the virtualenv.
# Make sure you have python-virtualenv installed
source activate_venv
- Set up a Redis and a MySQL somewhere
All your workers will need to be able to access it.
# Edit jobrunner/settings.py
JOBBOARD_CONF = {
'board': 'redis',
'host': '1.2.3.4'
}
PERSISTENCE_CONF = {
"connection": "mysql://taskflow:taskflow@[{}]"
"/taskflow".format('1.2.3.4'),
}
- Run a conductor
A conductor is a process that polls for new unclaimed jobs on the job board. When it claims a new job it will run that 'flow' in a thread until it has completed or failed.
# Run a conductor. You can run multiple conductors on multiple machines.
./bin/jobrunner run
- Post a job
By posting a job to the job board the work described in the job you are posting will be performed by the conductor that sees and locks the job first. A job consists of a couple of elements: a function that returns a TaskFlow Flow, arguments and keyword arguments for that function, and a 'store' which is a dict of variables available to the Tasks in the flow from the beginning.
# List all available jobs
./bin/jobrunner post --help
# Check the example job's help menu
./bin/jobrunner post simple_http_webserver --help
Post the sample job. This will run the default Python http web server on the conductor
that claims the posted jobs. This job will run indefinitely but it is a nice showcase
of the persistence feature of TaskFlow. When you kill the web server process it will
restart because of the Times
retry class on the flow, and when you kill the conductor
running the job it will stop refreshing the lock and another conductor will pick up the
flow at the Atom where it left of (being the execute/revert of the SimpleHTTPServer
TaskFlow Task
.
./bin/jobrunner post simple_http_webserver --port 8887
- View the job running on the job board
./bin/jobrunner show
Currently this project uses the default non-blocking TaskFlow conductor with the MySQL persistence backend and the Redis job board backend. This results in a batch processing system with the following characteristics:
This job runner runs OpenStack TaskFlow flows. The TaskFlow tooling, conventions and syntax are great for processes where a lot of uncertainty (like failing APIs and flaky servers) is involved and persistence and guarantees are required.
The progress of a flow is stored between each state transition. This means that if any machine running a flow goes offline, a conductor on another machine will see that the lock on the running job has become stale and pick up the work where the other machine left of.
This makes it so that if you have multiple machines running the job runner, you can program processes that run and will continue to run on any available machine as long as there is a machine running a conductor available.
This project abstracts out the whole job board and job executor part of a project implementing
TaskFlow. Basically every time I was implementing the same thing in different projects while
what I really wanted was just to run TaskFlow flows. This project takes away the overhead of
having to code up a conductor and functions that save flow details and logbooks to a job board.
Just register flows in the flows/
directory and this application will load them into the job
poster.
Jobs can be posted with a required 'capability'. This means that a conductor will evaluate a
condition and only claim the job if that condition evaluated to True
. An example of such
capability might be is_x86_64
or even dynamic things like port_is_free
where the port is
retrieved from the flow details in the conductor. See the capabilities documentation for
information about how to use and define these constraints.
Things I would like to add in the future:
- Consul KV backend instead of Redis or ZooKeeper
This would make it possible to not only run a distributed but a completely decentralized job board.
- Configurable Persistence and Job board backend
Currently this is hardcoded in the settings for convenience, but this should be configurable.
- Integration with raptiformica
The whole reason I wrote this program was so that I could have a way to distribute work over my cluster of random machines (mobile phones, laptops, etc). It would be nice to have something where I can declaratively describe my cluster network in where it posts flows to run certain services and processes based on what resources are available, and perhaps request resources based on the work available (like WOL some PC under my bed, or start some cloud instances) to catch up with work if the available workers can not keep up with the work waiting in the job queue. But that probably falls beyond the scope of this repo.