Skip to content

SedarG/job-scheduler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 

Repository files navigation

job-scheduler

Overview

This is a classic systems-design interview question. It could help you understand what it means to design a job-scheduler.

The goal of the job scheduler is to provide a repository for users to schedule jobs and do CRUD on their jobs efficiently while providing an efficient algorithm to find out what's due.

Algorithm

The data structure used to store jobs is a combination of a HashMap<userId, HashMap<jobId, Job>> and HierarchicalTimerWheel of Jobs. When a job is created, it will be recorded in both of these data structures and HTW will efficiently find the jobs that are due and the jobs will be executed. For demonstration purposes, the execution of the job is calling a Logging service to record the execution.

HierarchicalTimerWheel (HTW)

HTW is a stack of circular buffers, representing buckets of layered granularities. It's based on Hashed and hierarchical timing wheels: data structures for the efficient implementation of a timer facility. I then used a pair of these data structures to store the scheduled jobs in the respective slots. Each HTW is capable of storing jobs that are due in a fixed offset of time. The first HTW keeps the imminent jobs, the second one keeps the jobs that are due in the next window. When the first HTW is drained, the algorithm makes the second one the imminent and creates a new HTW for next window.

Status

The basic functionality is implemented and functional. Beyond that is T.B.D. for now. Fun things to add:

  • Abstractions on APIs and Storage at the service/process level
  • Partitioning and Replication
  • Ser/Deserialization to a file format to persist on a durable data store + WAL
  • Allow jobs to define a set of dependencies, build a graph beside the HashMap and HTW to efficiently calculate the dependency closure.
  • Real executables as jobs
  • Efficient and durable storage of executable packages/container images

Running the example

  • On a terminal, start the logging service using gradle:
~/job-scheduler/logging % ./gradlew bootRun

The logging service is hosted on http://localhost:8081

  • On another terminal, start the scheduler service, the same way
~/job-scheduler/scheduler % ./gradlew bootRun
  • Create a new job
% curl --header "Content-Type: application/json" \
--request POST \
--data "{\"userId\": 1, \"jobId\": 1, \"nextSchedule\":`date -d '+1 minutes' +%s`000 }" \ 
 http://localhost:8080/v1/users/1/jobs
  • 1 minute later, you can query the logging service to see the log of the job
% curl http://localhost:8081/v1/logs

The log line would look like

[{"timestamp":1684431087846,"entry":"Running {userID: 1, jobId: 1, nextSchedule: 1684431087000, recurrence: None"}]

The timestamp is the system time of the scheduler showing the time the log entry was created. nextSchedule shows the scheduling time of the task. In the above example the task is created 846 milliseconds after its scheduled time.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages