Ballista Architecture

Overview

Ballista allows queries to be executed in a distributed cluster. A cluster consists of one or more scheduler processes and one or more executor processes.

The scheduler accepts logical query plans and uses DataFusion to translate them into physical query plans. It then runs a secondary planning process that translates the physical query plan into a distributed physical query plan by replacing each operator in the DataFusion plan that performs a repartition with a stage boundary (i.e. a shuffle exchange).

This results in a plan that contains a number of query stages that can be executed independently. The dependencies between query stages form a directed acyclic graph (DAG), because a query stage cannot start until its child query stages have completed.

Each query stage has one or more partitions that can be processed in parallel by the available executors in the cluster. The partition is the basic unit of scalability in Ballista.

The output of each query stage is persisted to disk, and later query stages request this data from the executors that produced it. The persisted output is partitioned according to the partitioning scheme defined for the query stage. This scheme typically differs from the partitioning scheme of the query stage that consumes the intermediate output, since it is the changes in partitioning within the plan that define the query stage boundaries.

This exchange of data between query stages is called a "shuffle exchange" in Apache Spark.
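As a purely illustrative sketch (not Ballista's actual internal types), the distributed plan can be pictured as a DAG of stages, where each stage records the stages it depends on and the number of output partitions it produces; the names `QueryStage` and `StageDag` below are hypothetical:

```rust
/// Illustrative model of the distributed plan; not Ballista's real types.
struct QueryStage {
    id: usize,
    /// IDs of child stages whose shuffled output this stage reads.
    child_stages: Vec<usize>,
    /// Number of output partitions, i.e. the number of independent tasks
    /// that can run in parallel for this stage.
    output_partitions: usize,
}

struct StageDag {
    stages: Vec<QueryStage>,
}

impl StageDag {
    /// A stage is runnable once every stage it depends on has completed.
    fn runnable_stages(&self, completed: &[usize]) -> Vec<usize> {
        self.stages
            .iter()
            .filter(|s| s.child_stages.iter().all(|c| completed.contains(c)))
            .map(|s| s.id)
            .collect()
    }
}
```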

The following diagram shows the flow of requests and responses between the client, scheduler, and executor processes.

Query Execution Flow

Scheduler Process

The scheduler process implements a gRPC interface (defined in ballista.proto). The interface provides the following methods:

| Method | Description |
| --- | --- |
| ExecuteQuery | Submit a logical query plan or SQL query for execution |
| GetExecutorsMetadata | Retrieve a list of executors that have registered with a scheduler |
| GetFileMetadata | Retrieve metadata about files available in the cluster file system |
| GetJobStatus | Get the status of a submitted query |
| RegisterExecutor | Executors call this method to register themselves with the scheduler |
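As a rough sketch of how a client might call this interface through the tonic-generated bindings from ballista.proto (the module path `ballista_core::serde::protobuf` and the message field names are assumptions and may differ between Ballista releases):

```rust
// Sketch only: module paths and message names are assumed from typical
// tonic codegen of ballista.proto and may differ in a given release.
use ballista_core::serde::protobuf::{
    scheduler_grpc_client::SchedulerGrpcClient, GetJobStatusParams,
};

async fn poll_job_status(job_id: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Connect to a scheduler listening on its default gRPC port.
    let mut scheduler = SchedulerGrpcClient::connect("http://localhost:50050").await?;

    // Ask the scheduler for the status of a previously submitted query.
    let status = scheduler
        .get_job_status(GetJobStatusParams {
            job_id: job_id.to_owned(),
        })
        .await?
        .into_inner();

    println!("job status: {:?}", status);
    Ok(())
}
```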

The scheduler can run in standalone mode, or in clustered mode using etcd as the backing store for its state.

Executor Process

The executor process implements the Apache Arrow Flight gRPC interface and is responsible for:

  • Connecting to the scheduler and requesting tasks to execute
  • Executing tasks within a query stage and persisting the results to disk in the Apache Arrow IPC format
  • Making query stage output partitions available as "Flights" so that they can be retrieved by other executors as well as by clients
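For reference, persisting a record batch in the Arrow IPC file format with the arrow crate looks roughly like the following. This is a minimal sketch of the on-disk format the executor uses, not the executor's actual shuffle-writer code, and the file path is made up for illustration:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::Int32Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::writer::FileWriter;
use arrow::record_batch::RecordBatch;

fn write_partition() -> arrow::error::Result<()> {
    // A single-column batch standing in for one output partition of a query stage.
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
    )?;

    // Persist the partition to disk in the Arrow IPC file format.
    let file = File::create("/tmp/stage-1-partition-0.arrow")?;
    let mut writer = FileWriter::try_new(file, &schema)?;
    writer.write(&batch)?;
    writer.finish()?;
    Ok(())
}
```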

Rust Client

The Rust client provides a BallistaContext that allows queries to be built using DataFrames or SQL (or both).
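A minimal usage sketch is shown below; the exact `BallistaContext` constructor, configuration handling, and whether these calls are async have varied across Ballista releases, so treat the details as assumptions:

```rust
use ballista::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Connect to a scheduler; host, port, and config handling are assumptions
    // and have changed across Ballista versions.
    let config = BallistaConfig::builder().build()?;
    let ctx = BallistaContext::remote("localhost", 50050, &config).await?;

    // Build a query with SQL; a DataFrame API is available as well.
    let df = ctx.sql("SELECT 1").await?;

    // Collect the distributed results back to the client.
    let results = df.collect().await?;
    println!("{} batches", results.len());
    Ok(())
}
```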

The client executes the query plan by submitting an ExecuteQuery request to the scheduler and then polls GetJobStatus until the query completes. On completion, the client receives a list of locations for the Flights containing the query results and connects to the appropriate executor processes to retrieve them.
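Under the hood, fetching one result partition amounts to an Arrow Flight DoGet call against the executor that produced it. A rough sketch with the arrow-flight crate follows; the contents of the ticket and how they are derived from the job status response are Ballista-specific details omitted here:

```rust
use arrow_flight::flight_service_client::FlightServiceClient;
use arrow_flight::Ticket;

async fn fetch_partition(
    executor_addr: &str,
    ticket_bytes: Vec<u8>,
) -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the executor's Flight endpoint, e.g. "http://executor-host:50051".
    let mut client = FlightServiceClient::connect(executor_addr.to_owned()).await?;

    // The ticket identifies one query stage output partition; in Ballista it is
    // derived from the partition locations returned by GetJobStatus.
    let mut stream = client
        .do_get(Ticket { ticket: ticket_bytes.into() })
        .await?
        .into_inner();

    // The first message carries the schema; subsequent messages carry record batches.
    while let Some(flight_data) = stream.message().await? {
        println!("received {} data bytes", flight_data.data_body.len());
    }
    Ok(())
}
```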