RoadMap

Daniele Venzano edited this page Oct 17, 2018 · 3 revisions

The Zoe roadmap

Zoe was born as a research project to support academic research; as such, we have broad objectives that touch on fundamental research problems.

Scheduler architectures and resource allocation

Alongside classic, stable, and well-known schedulers (FIFO), we plan to design and implement novel approaches to application scheduling and resource allocation within Zoe. This includes:

  • Optimistic or pessimistic schedulers
  • Distributed or centralized schedulers

Scheduling policies

While the FIFO policy is fine for many settings, it is not the most efficient way of managing work that can be done concurrently. Decades of scheduling literature point in many directions, some of which can find new applications in analytic systems:

  • Appropriate management of batch vs. interactive vs. streaming analytic applications
  • Deadline scheduling for streaming frameworks
  • Size-based scheduling for better utilization and shorter response times
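
To make the FIFO vs. size-based comparison concrete, here is a minimal sketch, not Zoe code. It assumes a single resource that runs one job at a time, that all jobs are already queued, and that job sizes are known in advance. The function name and the job sizes are hypothetical.

```python
def mean_response_time(job_sizes):
    """Mean completion time when jobs run one at a time, in the given order."""
    clock = 0
    total = 0
    for size in job_sizes:
        clock += size   # this job finishes at the current clock value
        total += clock  # accumulate completion times
    return total / len(job_sizes)


# Hypothetical job sizes in arbitrary time units, listed in arrival order.
arrival_order = [10, 1, 2]

fifo = mean_response_time(arrival_order)                # run in arrival order
size_based = mean_response_time(sorted(arrival_order))  # run smallest job first

print(f"FIFO: {fifo:.2f}, size-based: {size_based:.2f}")
```

With these numbers, the large job at the head of the FIFO queue delays both small jobs, so the mean response time is roughly twice that of the size-based order. A real size-based policy also has to estimate job sizes and avoid starving large jobs, which this sketch ignores.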

Dynamic resource allocation

Users are usually bad at guessing how many resources a particular application will need. We all tend to overestimate resource reservations to make sure there is some headroom for unplanned spikes. This overestimation causes low utilization and inefficient resource usage: with better reservation and allocation mechanisms that can adapt at runtime, more work could be done with the same resources.

  • Dynamically resize running applications in terms of number of services
  • Dynamically resize running applications in terms of memory and cores allocated to each service
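
The cost of overestimated reservations can be illustrated with some simple arithmetic. The sketch below is not part of Zoe: the cluster size, reservation, peak usage, and headroom factor are all made-up values, and memory is the only resource considered.

```python
def apps_that_fit(cluster_gb, per_app_gb):
    """Number of identical applications that fit in the cluster's memory."""
    return int(cluster_gb // per_app_gb)


# All values below are hypothetical.
cluster_gb = 64     # total memory available
reserved_gb = 16    # what each user asks for "to be safe"
peak_used_gb = 6    # what each application actually uses at peak
headroom = 0.25     # extra margin kept by an adaptive allocator

# Allocation resized at runtime: observed peak plus headroom.
adapted_gb = peak_used_gb * (1 + headroom)

static_fit = apps_that_fit(cluster_gb, reserved_gb)
adaptive_fit = apps_that_fit(cluster_gb, adapted_gb)

# Utilization = memory actually used / memory available.
static_utilization = static_fit * peak_used_gb / cluster_gb
adaptive_utilization = adaptive_fit * peak_used_gb / cluster_gb

print(static_fit, adaptive_fit, static_utilization, adaptive_utilization)
```

Under these assumptions, resizing allocations at runtime lets twice as many applications run on the same hardware and doubles memory utilization.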

Fault tolerance

Any modern system must be able to cope with faults and failures of any kind. The Zoe front-end and back-end need the ability to be replicated for fault tolerance and performance.

Other features planned, but without a release date yet

This list includes more down-to-earth features that address requests from our users and system administrators.

  • Availability zones: administrators should be able to partition resources to isolate certain users
  • Mark hosts as offline: to upgrade the host operating system and perform other maintenance operations, administrators should be able to temporarily take a host offline
  • Priorities: implement a multi-queue scheduler to manage queues of ZApps with different priorities. The focus for this item is more on manageability for administrators than on fancy scheduling techniques.
  • Scripting and APIs: advanced users want to run scripted experiments on Zoe. Improve the APIs and the scripting usability of the command-line tools
  • Storage: workspaces have many limitations and permission problems. Is there a better way to do storage for ZApps?
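
As an illustration of the "Priorities" item above, a multi-queue scheduler could be sketched as follows. This is only one possible design, assuming strict priority between queues and FIFO order within each queue. The class, method names, and priority labels are hypothetical and do not reflect Zoe's actual API.

```python
from collections import deque


class MultiQueueScheduler:
    """Strict-priority scheduler: one FIFO queue per priority level."""

    def __init__(self, priorities):
        # `priorities` lists the levels from highest to lowest.
        self.order = list(priorities)
        self.queues = {level: deque() for level in self.order}

    def submit(self, zapp_name, priority):
        """Append a ZApp to the queue for its priority level."""
        self.queues[priority].append(zapp_name)

    def next_zapp(self):
        """Return the next ZApp to run, or None if every queue is empty."""
        for level in self.order:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


scheduler = MultiQueueScheduler(["high", "low"])
scheduler.submit("batch-1", "low")
scheduler.submit("notebook-1", "high")
scheduler.submit("batch-2", "low")

run_order = [scheduler.next_zapp() for _ in range(4)]
print(run_order)
```

Keeping one plain queue per level makes the state easy for an administrator to inspect, which matches the manageability focus stated above. Strict priority can starve the lower queues, so a production design would likely need aging or per-queue quotas.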