Update user-guide for the scheduler configurations
kyotoYaho committed Nov 1, 2022
1 parent df02487 commit 8335ea9
Showing 1 changed file with 29 additions and 6 deletions.
35 changes: 29 additions & 6 deletions docs/source/user-guide/configs.md
@@ -19,20 +19,22 @@

# Configuration

## BallistaContext Configuration Settings

Ballista has a number of configuration settings that can be specified when creating a BallistaContext.

_Example: Specifying configuration options when creating a context_

```rust
use ballista::prelude::*;

// Build a configuration; keys and values are both passed as strings.
let config = BallistaConfig::builder()
    .set("ballista.shuffle.partitions", "200")
    .set("ballista.batch.size", "16384")
    .build()?;

// Connect to a scheduler listening on localhost:50050.
let ctx = BallistaContext::remote("localhost", 50050, &config).await?;
```

### Ballista Configuration Settings

| key | type | default | description |
| --------------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -46,7 +48,7 @@ let ctx = BallistaContext::remote("localhost", 50050, &config).await?;
| ballista.with_information_schema | Boolean | true | Determines whether the `information_schema` should be created in the context. This is necessary for supporting DDL commands such as `SHOW TABLES`. |
| ballista.plugin_dir | Boolean | true | Specifies a path for plugin files. Dynamic library files in this directory will be loaded when scheduler state initializes. |

### DataFusion Configuration Settings

In addition to Ballista-specific configuration settings, the following DataFusion settings can also be specified.

@@ -58,3 +60,24 @@ In addition to Ballista-specific configuration settings, the following DataFusio
| datafusion.explain.physical_plan_only | Boolean | false | When set to true, the explain statement will only print physical plans. |
| datafusion.optimizer.filter_null_join_keys | Boolean | false | When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. This filter can add additional overhead when the file format does not fully support predicate push down. |
| datafusion.optimizer.skip_failed_rules | Boolean | true | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail. |

## Ballista Scheduler Configuration Settings

In addition to the BallistaContext configuration settings, the Ballista scheduler has its own configuration settings
for managing the cluster as a whole.

_Example: Specifying configuration options when starting the scheduler_

```shell
./ballista-scheduler --scheduler-policy push-staged --event-loop-buffer-size 1000000 \
  --executor-slots-policy round-robin-local
```
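The flags in the example above appear to be the table keys in this section with underscores replaced by hyphens and a `--` prefix. A minimal sketch of that mapping follows; the helper names `key_to_flag` and `scheduler_args` are hypothetical and are not part of Ballista.

```rust
/// Hypothetical helper: turn a scheduler setting key such as
/// `scheduler_policy` into its command-line form (`--scheduler-policy`).
/// Assumes every key maps to a flag by swapping `_` for `-`.
fn key_to_flag(key: &str) -> String {
    format!("--{}", key.replace('_', "-"))
}

/// Hypothetical helper: build an argument list from (key, value) pairs,
/// keeping the order in which the pairs were given.
fn scheduler_args(settings: &[(&str, &str)]) -> Vec<String> {
    let mut args = Vec::with_capacity(settings.len() * 2);
    for (key, value) in settings {
        args.push(key_to_flag(key));
        args.push(value.to_string());
    }
    args
}
```

Calling `scheduler_args` with the three settings from the example and joining the result with spaces reproduces the argument portion of the command line shown above.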

| key | type | default | description |
|------------------------------------------------|-----------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| scheduler_policy | Utf8 | pull-staged | Sets the scheduling policy for the scheduler. Possible values: pull-staged, push-staged. |
| event_loop_buffer_size | UInt32 | 10000 | Sets the event loop buffer size. For a high-throughput system, a larger value such as 1000000 is recommended. |
| executor_slots_policy | Utf8 | bias | Sets the executor slots policy for the scheduler. Possible values: bias, round-robin, round-robin-local. For a cluster with a single scheduler, round-robin-local is recommended. |
| finished_job_data_clean_up_interval_seconds | UInt64 | 300 | Sets the delay, in seconds, before finished job data (mainly shuffle data) is cleaned up. A value of 0 disables the cleanup. |
| finished_job_state_clean_up_interval_seconds | UInt64 | 3600 | Sets the delay, in seconds, before finished job state stored in the backend is cleaned up. A value of 0 disables the cleanup. |
| advertise_flight_result_route_endpoint | Utf8 | N/A | Sets the route endpoint for proxying flight results via scheduler. |
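
The value rules in the table can be sketched in code. The following is an illustration only, not the scheduler's actual implementation: the type `SchedulerPolicy` and the functions `parse_scheduler_policy` and `clean_up_interval` are hypothetical names chosen for this example.

```rust
use std::time::Duration;

/// Hypothetical enum mirroring the two documented `scheduler_policy` values.
#[derive(Debug, PartialEq)]
enum SchedulerPolicy {
    PullStaged,
    PushStaged,
}

/// Accept only the values listed in the table; anything else is an error.
fn parse_scheduler_policy(value: &str) -> Result<SchedulerPolicy, String> {
    match value {
        "pull-staged" => Ok(SchedulerPolicy::PullStaged),
        "push-staged" => Ok(SchedulerPolicy::PushStaged),
        other => Err(format!("unknown scheduler policy: {other}")),
    }
}

/// Interpret a clean-up interval in seconds: 0 means cleanup is disabled,
/// any other value is the delay before cleanup runs.
fn clean_up_interval(seconds: u64) -> Option<Duration> {
    if seconds == 0 {
        None
    } else {
        Some(Duration::from_secs(seconds))
    }
}
```

Modelling the disabled case as `None` rather than a zero-length `Duration` keeps "never clean up" distinct from "clean up immediately", which is the distinction the table draws for a value of 0.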
