Cluster state refactor Part 2 #658

Merged
merged 28 commits into from
Feb 11, 2023

thinkharderdev
Contributor

Which issue does this PR close?

Closes #554

Rationale for this change

See the original ticket for the full description, but the tl;dr is:

  1. The existing interface for managing shared state among schedulers is too low-level. It pushes too much complexity into other layers and prevents us from taking advantage of data-store-specific features (atomics, transactions, etc.) to avoid locking in the application layer.
  2. By forcing everything through the KV interface, we incur the full serialization overhead even if we are only using in-memory state.

What changes are included in this PR?

There are a lot of changes because the StateBackendClient was baked in everywhere, but at a high level:

  1. Create a JobState trait to complement the ClusterState trait created in the last PR. This is the interface for storing and managing global state w/r/t jobs and sessions.
  2. Refactor the in-memory implementation so it does not serialize things to protobuf.
  3. Provide a KeyValueState implementation based on the existing KV interface so we can continue to use etcd and sled for state without any changes.
  4. Take advantage of the curated job architecture to minimize distributed locking. Since each job is owned by a single scheduler, we don't need to lock beyond the node level.
  5. Clean up configs.
  6. Move everything related to state to the ballista_scheduler::cluster module.
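
To make items 1 to 3 concrete, here is a minimal sketch of the idea, not the code in this PR. Only the `JobState` name comes from the description above; `JobRecord`, `save_job`, `get_job`, `KeyValueStore`, `KeyValueJobState`, and `MapStore` are hypothetical names, and the byte encoding is a toy stand-in for protobuf. It shows a typed trait that an in-memory backend can satisfy without serializing, while a KV-backed adapter pays an encode/decode step on every call.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical job record; the real scheduler stores much richer state.
#[derive(Clone, Debug, PartialEq)]
pub struct JobRecord {
    pub job_id: String,
    pub status: String,
}

/// Simplified, illustrative stand-in for the `JobState` trait.
pub trait JobState {
    fn save_job(&self, job: JobRecord);
    fn get_job(&self, job_id: &str) -> Option<JobRecord>;
}

/// In-memory backend: keeps typed values, so nothing is serialized.
/// The mutex is process-local, i.e. locking stays at the node level.
#[derive(Default)]
pub struct InMemoryJobState {
    jobs: Mutex<HashMap<String, JobRecord>>,
}

impl JobState for InMemoryJobState {
    fn save_job(&self, job: JobRecord) {
        self.jobs.lock().unwrap().insert(job.job_id.clone(), job);
    }

    fn get_job(&self, job_id: &str) -> Option<JobRecord> {
        self.jobs.lock().unwrap().get(job_id).cloned()
    }
}

/// Hypothetical byte-oriented store, standing in for the existing KV interface.
pub trait KeyValueStore {
    fn put(&self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// KV-backed adapter: every call goes through an encode/decode step.
pub struct KeyValueJobState<S: KeyValueStore> {
    store: S,
}

impl<S: KeyValueStore> KeyValueJobState<S> {
    pub fn new(store: S) -> Self {
        Self { store }
    }
}

impl<S: KeyValueStore> JobState for KeyValueJobState<S> {
    fn save_job(&self, job: JobRecord) {
        // Toy encoding: the status as UTF-8 bytes (protobuf in the real system).
        let JobRecord { job_id, status } = job;
        self.store.put(&job_id, status.into_bytes());
    }

    fn get_job(&self, job_id: &str) -> Option<JobRecord> {
        let bytes = self.store.get(job_id)?;
        let status = String::from_utf8(bytes).ok()?;
        Some(JobRecord { job_id: job_id.to_string(), status })
    }
}

/// Toy byte store used only to exercise the adapter (etcd or sled in practice).
#[derive(Default)]
pub struct MapStore {
    data: Mutex<HashMap<String, Vec<u8>>>,
}

impl KeyValueStore for MapStore {
    fn put(&self, key: &str, value: Vec<u8>) {
        self.data.lock().unwrap().insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.lock().unwrap().get(key).cloned()
    }
}
```

Callers depend only on the trait, so switching between the in-memory and KV-backed backends does not change scheduler code in this sketch.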

I removed the separate configs for cluster-backend and config-backend because having both seemed confusing. So currently, if you want to use different implementations for ClusterState and JobState, you need to write your own entrypoint. We might introduce standard "profiles" (standalone/ha/etc.) for setting up the two state backends on different implementations, but for now that seems too noisy config-wise.
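
As a rough illustration of what "write your own entrypoint" could look like, the sketch below wires two different backends into one cluster handle. The trait and struct names echo names used in this PR, but their shapes here are assumptions: the `backend_name` method, the `BallistaCluster::new` signature, the two unit structs, and `build_mixed_cluster` are all hypothetical.

```rust
use std::sync::Arc;

/// Illustrative stand-ins for the two state traits; the single
/// `backend_name` method is hypothetical and exists only for the demo.
pub trait ClusterState: Send + Sync {
    fn backend_name(&self) -> &'static str;
}

pub trait JobState: Send + Sync {
    fn backend_name(&self) -> &'static str;
}

/// Hypothetical in-memory cluster state.
pub struct InMemoryClusterState;

impl ClusterState for InMemoryClusterState {
    fn backend_name(&self) -> &'static str {
        "memory"
    }
}

/// Hypothetical job state backed by a key-value store such as etcd or sled.
pub struct KeyValueJobState;

impl JobState for KeyValueJobState {
    fn backend_name(&self) -> &'static str {
        "key-value"
    }
}

/// Assumed shape: one handle holding both state backends.
pub struct BallistaCluster {
    pub cluster_state: Arc<dyn ClusterState>,
    pub job_state: Arc<dyn JobState>,
}

impl BallistaCluster {
    pub fn new(cluster_state: Arc<dyn ClusterState>, job_state: Arc<dyn JobState>) -> Self {
        Self { cluster_state, job_state }
    }
}

/// A custom entrypoint would build the mixed cluster itself instead of
/// relying on a single backend option from the standard config.
pub fn build_mixed_cluster() -> BallistaCluster {
    BallistaCluster::new(Arc::new(InMemoryClusterState), Arc::new(KeyValueJobState))
}
```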

NOTE: I was not able to run the integration tests using the standard configuration due to unexplained timeouts. With push scheduling everything worked fine, so I'm not sure what's going on. I also checked on main and saw the same issue, so it may be something related to the M1 MacBook.

Are there any user-facing changes?

The storage layouts for etcd and sled are simplified, so existing state will not be valid after upgrading.

thinkharderdev and others added 23 commits November 17, 2022 15:54
* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>
#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not
@thinkharderdev
Contributor Author

# Conflicts:
#	ballista/core/src/serde/generated/ballista.rs
#	ballista/core/src/serde/mod.rs
#	ballista/scheduler/src/bin/main.rs
#	ballista/scheduler/src/cluster/storage/cluster.rs
#	ballista/scheduler/src/cluster/storage/mod.rs
#	ballista/scheduler/src/cluster/storage/sled.rs
#	ballista/scheduler/src/config.rs
#	ballista/scheduler/src/scheduler_process.rs
#	ballista/scheduler/src/scheduler_server/grpc.rs
#	ballista/scheduler/src/scheduler_server/mod.rs
#	ballista/scheduler/src/standalone.rs
#	ballista/scheduler/src/state/backend/memory.rs
#	ballista/scheduler/src/state/execution_graph.rs
#	ballista/scheduler/src/state/execution_graph_dot.rs
#	ballista/scheduler/src/state/executor_manager.rs
#	ballista/scheduler/src/state/mod.rs
#	ballista/scheduler/src/test_utils.rs

    pub mod storage;

    #[cfg(test)]
    #[allow(clippy::uninlined_format_args)]

Contributor Author

Not sure what else to do here. Clippy complains when I inline variables in the format string in the assert and also complains when they are not inlined, so I'm just disabling this lint for the module.
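
For reference, a small sketch of what the module-level allow covers. The `slots_available` helper and the test are hypothetical and only there to make the example self-contained; the lint name is the one used in the diff above.

```rust
/// Hypothetical helper under test.
pub fn slots_available(total: u32, used: u32) -> u32 {
    total.saturating_sub(used)
}

#[cfg(test)]
#[allow(clippy::uninlined_format_args)]
mod tests {
    use super::*;

    #[test]
    fn reports_free_slots() {
        let expected = 3;
        let actual = slots_available(5, 2);

        // Positional arguments: the form `uninlined_format_args` flags.
        assert_eq!(expected, actual, "expected {} but got {}", expected, actual);

        // Inlined arguments: the form the lint suggests instead.
        assert_eq!(expected, actual, "expected {expected} but got {actual}");
    }
}
```

With the allow on the module, either style compiles without the lint firing, which sidesteps the disagreement between Clippy versions.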

@thinkharderdev
Contributor Author

Not sure why this is failing (https://github.com/apache/arrow-ballista/actions/runs/4123347318/jobs/7121701363) but it doesn't seem to be an issue with the actual formatting

    let addr = addr.parse()?;

    let config = SchedulerConfig {
        namespace: opt.namespace,
        external_host: opt.external_host,

Member

It looks like this PR drops bind_host? That can be different from external_host, so I'm not sure we can do that.

Or maybe that is handled somewhere else in this PR and I haven't seen it yet?

Member

@andygrove Feb 10, 2023

It looks like bind_host is not currently used in the scheduler, but it is in the executor, so this seems like an oversight.

cc @avantgardnerio who may have opinions here

Contributor Author

Sorry, this is slightly confusing :). It is still used to build the bind address for the server

    let addr = format!("{}:{}", opt.bind_host, opt.bind_port);
    let addr = addr.parse()?;

The reason external_host was added to SchedulerConfig is so that we can pass it in to start_server and build the BallistaCluster there from the SchedulerConfig alone. To do that, I had to pass through a few more things from the Config struct generated by configure_me, which isn't available in the lib code.

So really the only point is to further reduce what we do in main.rs to just setting up logging and validating CLI arguments.
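
A minimal sketch of that split, under stated assumptions: `Opt` stands in for the struct generated by configure_me and `SchedulerConfig` is reduced to the two fields visible in the diff excerpt. The `split_options` function is hypothetical. The bind address is still built from `bind_host` and `bind_port`, while `external_host` travels inside the config.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Hypothetical subset of the CLI options (generated by configure_me in the real binary).
pub struct Opt {
    pub bind_host: String,
    pub bind_port: u16,
    pub external_host: String,
    pub namespace: String,
}

/// Hypothetical subset of `SchedulerConfig`.
#[derive(Debug, PartialEq)]
pub struct SchedulerConfig {
    pub namespace: String,
    pub external_host: String,
}

/// Splits the CLI options into the address the server binds to and the
/// config handed to library code. `bind_host` must be an IP literal here,
/// because `SocketAddr` parsing does not resolve host names.
pub fn split_options(opt: Opt) -> Result<(SocketAddr, SchedulerConfig), AddrParseError> {
    let addr = format!("{}:{}", opt.bind_host, opt.bind_port);
    let addr: SocketAddr = addr.parse()?;

    let config = SchedulerConfig {
        namespace: opt.namespace,
        external_host: opt.external_host,
    };

    Ok((addr, config))
}
```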

Contributor

FWIW, I believe external_host is required for FlightSQL.

Contributor

I stand corrected - it appears to have morphed into its own advertise_flight_sql_endpoint.

Member

@andygrove left a comment

LGTM. I think we can have a follow-on PR for the scheduler bind_host issue.

@thinkharderdev
Contributor Author

I'll plan on merging this weekend unless someone else would like more time to review.

@andygrove merged commit e7f8774 into main Feb 11, 2023
@andygrove deleted the cluster-state-refactor-2 branch February 11, 2023 13:40
fsdvh added a commit to coralogix/arrow-ballista that referenced this pull request Feb 17, 2023
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Prepare 0.10.0 Release (apache#522)

* bump version

* CHANGELOG

* Ballista gets a docker image!!! (apache#521)

* Ballista gets a docker image!!!

* Enable flight sql

* Allow executing startup script

* Allow executing executables

* Clippy

* Remove capture group (apache#527)

* fix python build in CI (apache#528)

* fix python build in CI

* save progress

* use same min rust version in all crates

* fix

* use image from pyo3

* use newer image from pyo3

* do not require protoc

* wheels now generated

* rat - exclude generated file

* Update docs for simplified instructions (apache#532)

* Update docs for simplified instructions

* Fix whoopsie

* Update docs/source/user-guide/flightsql.md

Co-authored-by: Andy Grove <[email protected]>

Co-authored-by: Andy Grove <[email protected]>

* remove --locked (apache#533)

* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)

* Provide a memory StateBackendClient (apache#523)

* Rename StateBackend::Standalone to StateBackend:Sled

* Copy utility files from sled crate since they cannot be used directly

* Provide a memory StateBackendClient

* Fix dashmap deadlock issue

* Fix for the comments

Co-authored-by: yangzhong <[email protected]>

* only build docker images on rc tags (apache#535)

* docs: fix style in the Helm readme (apache#551)

* Fix Helm chart's image format (apache#550)

* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)

* Update datafusion requirement from 14.0.0 to 15.0.0

* Fix UT

* Fix python

* Fix python

* Fix Python

Co-authored-by: yangzhong <[email protected]>

* Make it concurrently to launch tasks to executors (apache#557)

* Make it concurrently to launch tasks to executors

* Refine for comments

Co-authored-by: yangzhong <[email protected]>

* fix(ui): fix last seen (apache#562)

* Support Alibaba Cloud OSS with ObjectStore (apache#567)

* Fix cargo clippy (apache#571)

Co-authored-by: yangzhong <[email protected]>

* Super minor spelling error (apache#573)

* Update env_logger requirement from 0.9 to 0.10 (apache#539)

Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* updated readme to contain correct versions of dependencies. (apache#580)

* Fix benchmark image link (apache#596)

* Add support for Azure (apache#599)

* Remove outdated script and use evergreen version of rust (apache#597)

* Remove outdated script and use evergreen version of rust

* Use debian protobuf

* feat: update script such that ballista-cli image is built as well (apache#601)

* Fix Cargo.toml format issue (apache#616)

* Refactor executor main (apache#614)

* Refactor executor main

* copy all configs

* toml fmt

* Refactor scheduler main (apache#615)

* refactor scheduler main

* toml fmt

* Python: add method to get explain output as a string (apache#593)

* Update contributor guide (apache#617)

* Cluster state refactor part 1 (apache#560)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* replace master with main (apache#621)

* implement new release process (apache#623)

* add docs on who can release (apache#632)

* Upgrade to DataFusion 16 (again) (apache#636)

* Update datafusion dependency to the latest version (apache#612)

* Update datafusion dependency to the latest version

* Fix python

* Skip ut of test_window_lead due to apache/datafusion-python#135

* Fix clippy

---------

Co-authored-by: yangzhong <[email protected]>

* Upgrade to DataFusion 17 (apache#639)

* Upgrade to DF 17

* Restore original error handling functionality

* check in benchmark image (apache#647)

* Remove `python` dir & python-related workflows (apache#654)

* refactor: remove python dir & python-related workflows

* remove brackets

* Handle job resubmission (apache#586)

* Handle job resubmission

* Make resubmission configurable and add test

* Fix debug log

* Add executor self-registration mechanism in the heartbeat service (apache#649)

Co-authored-by: yangzhong <[email protected]>

* Cluster state refactor Part 2 (apache#658)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

* WIP

* Implement JobState

* Tests and fixes

* do not hold ref across await point

* Fix clippy warnings

* Fix tomlfmt github action

* uncomment test

---------

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* Upgrade to DataFusion 18.0.0-rc1 (apache#664)

* Add executor terminating status for graceful shutdown

* Remove empty file

* Minor refactor to reduce duplicate code (apache#659)

* move test_util to ballista-examples package (apache#661)

* Upgrade to DataFusion 18 (apache#668)

* Enable physical plan round-trip tests (apache#666)

* Customize session builder

* Construct Executor with functions

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* Add ClusterState trait

* Expose active job count

* Make parse_physical_expr public

* Fix job submitted metric by ignoring resubmissions

* Record when job is queued in scheduler metrics (#28)

* Record when job is queueud in scheduler metrics

* add additional buckets for exec times

* Upstream rebase (#29)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* Add ClusterState trait

* Expose active job count

* Remove println

* Resubmit jobs when no resources available for scheduling

* Make parse_physical_expr public

* Reduce log spam

* Fix job submitted metric by ignoring resubmissions

* Record when job is queued in scheduler metrics (#28)

* Record when job is queueud in scheduler metrics

* add additional buckets for exec times

* fmt

* clippy

* tomlfmt

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Dan Harris <[email protected]>

* Update from upstream (#30)

* Customize session builder

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* Add ClusterState trait

* Expose active job count

* Remove println

* Resubmit jobs when no resources available for scheduling

* Make parse_physical_expr public

* Reduce log spam

* Fix job submitted metric by ignoring resubmissions

* Record when job is queued in scheduler metrics (#28)

* Record when job is queueud in scheduler metrics

* add additional buckets for exec times

* Upstream rebase (#29)

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Prepare 0.10.0 Release (apache#522)

* bump version

* CHANGELOG

* Ballista gets a docker image!!! (apache#521)

* Ballista gets a docker image!!!

* Enable flight sql

* Allow executing startup script

* Allow executing executables

* Clippy

* Remove capture group (apache#527)

* fix python build in CI (apache#528)

* fix python build in CI

* save progress

* use same min rust version in all crates

* fix

* use image from pyo3

* use newer image from pyo3

* do not require protoc

* wheels now generated

* rat - exclude generated file

* Update docs for simplified instructions (apache#532)

* Update docs for simplified instructions

* Fix whoopsie

* Update docs/source/user-guide/flightsql.md

Co-authored-by: Andy Grove <[email protected]>

* remove --locked (apache#533)

* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)

* Provide a memory StateBackendClient (apache#523)

* Rename StateBackend::Standalone to StateBackend:Sled

* Copy utility files from sled crate since they cannot be used directly

* Provide a memory StateBackendClient

* Fix dashmap deadlock issue

* Fix for the comments

Co-authored-by: yangzhong <[email protected]>

* only build docker images on rc tags (apache#535)

* docs: fix style in the Helm readme (apache#551)

* Fix Helm chart's image format (apache#550)

* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)

* Update datafusion requirement from 14.0.0 to 15.0.0

* Fix UT

* Fix python

* Fix python

* Fix Python

Co-authored-by: yangzhong <[email protected]>

* Make it concurrently to launch tasks to executors (apache#557)

* Make it concurrently to launch tasks to executors

* Refine for comments

Co-authored-by: yangzhong <[email protected]>

* fix(ui): fix last seen (apache#562)

* Support Alibaba Cloud OSS with ObjectStore (apache#567)

* Fix cargo clippy (apache#571)

Co-authored-by: yangzhong <[email protected]>

* Super minor spelling error (apache#573)

* Update env_logger requirement from 0.9 to 0.10 (apache#539)

Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* updated readme to contain correct versions of dependencies. (apache#580)

* Fix benchmark image link (apache#596)

* Add support for Azure (apache#599)

* Remove outdated script and use evergreen version of rust (apache#597)

* Remove outdated script and use evergreen version of rust

* Use debian protobuf

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* Add ClusterState trait

* Expose active job count

* Remove println

* Resubmit jobs when no resources available for scheduling

* Make parse_physical_expr public

* Reduce log spam

* Fix job submitted metric by ignoring resubmissions

* Record when job is queued in scheduler metrics (#28)

* Record when job is queued in scheduler metrics

* add additional buckets for exec times

* fmt

* clippy

* tomlfmt

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Dan Harris <[email protected]>

* Post merge update

* update message formatting

* post merge update

* another post-merge updates

* update github actions

* clippy

* update script

* fmt

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>

* post merge fixes

* fix branch naming in github actions

* cleanup

* fmt

* update imports

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Ian Alexander Joiner <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: jiangzhx <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
ch-sc added a commit to coralogix/arrow-ballista that referenced this pull request Mar 31, 2023
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Prepare 0.10.0 Release (apache#522)

* bump version

* CHANGELOG

* Ballista gets a docker image!!! (apache#521)

* Ballista gets a docker image!!!

* Enable flight sql

* Allow executing startup script

* Allow executing executables

* Clippy

* Remove capture group (apache#527)

* fix python build in CI (apache#528)

* fix python build in CI

* save progress

* use same min rust version in all crates

* fix

* use image from pyo3

* use newer image from pyo3

* do not require protoc

* wheels now generated

* rat - exclude generated file

* Update docs for simplified instructions (apache#532)

* Update docs for simplified instructions

* Fix whoopsie

* Update docs/source/user-guide/flightsql.md

Co-authored-by: Andy Grove <[email protected]>

* remove --locked (apache#533)

* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)

* Provide a memory StateBackendClient (apache#523)

* Rename StateBackend::Standalone to StateBackend:Sled

* Copy utility files from sled crate since they cannot be used directly

* Provide a memory StateBackendClient

* Fix dashmap deadlock issue

* Fix for the comments

Co-authored-by: yangzhong <[email protected]>

* only build docker images on rc tags (apache#535)

* docs: fix style in the Helm readme (apache#551)

* Fix Helm chart's image format (apache#550)

* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)

* Update datafusion requirement from 14.0.0 to 15.0.0

* Fix UT

* Fix python

* Fix python

* Fix Python

Co-authored-by: yangzhong <[email protected]>

* Make it concurrently to launch tasks to executors (apache#557)

* Make it concurrently to launch tasks to executors

* Refine for comments

Co-authored-by: yangzhong <[email protected]>

* fix(ui): fix last seen (apache#562)

* Support Alibaba Cloud OSS with ObjectStore (apache#567)

* Fix cargo clippy (apache#571)

Co-authored-by: yangzhong <[email protected]>

* Super minor spelling error (apache#573)

* Update env_logger requirement from 0.9 to 0.10 (apache#539)

Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* updated readme to contain correct versions of dependencies. (apache#580)

* Fix benchmark image link (apache#596)

* Add support for Azure (apache#599)

* Remove outdated script and use evergreen version of rust (apache#597)

* Remove outdated script and use evergreen version of rust

* Use debian protobuf

* feat: update script such that ballista-cli image is built as well (apache#601)

* Fix Cargo.toml format issue (apache#616)

* Refactor executor main (apache#614)

* Refactor executor main

* copy all configs

* toml fmt

* Refactor scheduler main (apache#615)

* refactor scheduler main

* toml fmt

* Python: add method to get explain output as a string (apache#593)

* Update contributor guide (apache#617)

* Cluster state refactor part 1 (apache#560)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* replace master with main (apache#621)

* implement new release process (apache#623)

* add docs on who can release (apache#632)

* Upgrade to DataFusion 16 (again) (apache#636)

* Update datafusion dependency to the latest version (apache#612)

* Update datafusion dependency to the latest version

* Fix python

* Skip ut of test_window_lead due to apache/datafusion-python#135

* Fix clippy

---------

Co-authored-by: yangzhong <[email protected]>

* Upgrade to DataFusion 17 (apache#639)

* Upgrade to DF 17

* Restore original error handling functionality

* check in benchmark image (apache#647)

* Remove `python` dir & python-related workflows (apache#654)

* refactor: remove python dir & python-related workflows

* remove brackets

* Handle job resubmission (apache#586)

* Handle job resubmission

* Make resubmission configurable and add test

* Fix debug log

* Add executor self-registration mechanism in the heartbeat service (apache#649)

Co-authored-by: yangzhong <[email protected]>

* Cluster state refactor Part 2 (apache#658)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

* WIP

* Implement JobState

* Tests and fixes

* do not hold ref across await point

* Fix clippy warnings

* Fix tomlfmt github action

* uncomment test

---------

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* Upgrade to DataFusion 18.0.0-rc1 (apache#664)

* Minor refactor to reduce duplicate code (apache#659)

* move test_util to ballista-examples package (apache#661)

* Upgrade to DataFusion 18 (apache#668)

* Enable physical plan round-trip tests (apache#666)

* Prep 0.11 (apache#682)

* Change version to 0.11.0

* changelog

* update react-timeago version

* yarn upgrade

* fix

* fix

* revert yarn change

* Print versions

* Print locations

* Avoid github shenanigans

* Try to get runners running

* Try to get runners running

* already root

---------

Co-authored-by: Andy Grove <[email protected]>

* [minor] remove todo (apache#683)

* Add executor terminating status for graceful shutdown (apache#667)

* Add executor terminating status for graceful shutdown

* Remove empty file

* Update ballista/executor/src/executor_process.rs

Co-authored-by: Brent Gardner <[email protected]>

---------

Co-authored-by: Brent Gardner <[email protected]>

* Allow `BallistaContext::read_*` methods to read multiple paths. (apache#679)

* updated dependency in cargo, added read_json method, modified read_* methods to read multiple paths.

* ran cargo fmt

* Added revision for proper builds.

* Update scheduler.md (apache#657)

* Mark `SchedulerState` as pub (apache#688)

* Mark as pub

* Fmt

---------

Co-authored-by: Daniël Heres <[email protected]>

* Update graphviz-rust requirement from 0.5.0 to 0.6.1 (apache#651)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Upgrade DataFusion to 19.0.0 (apache#691)

* update release notes (apache#692)

* Make task launcher pub (apache#695)

Co-authored-by: Daniël Heres <[email protected]>

* Make task_manager pub (apache#698)

Co-authored-by: Daniël Heres <[email protected]>

* Add ExecutionEngine abstraction (apache#687)

* Allow accessing s3 locations in client mode (apache#700)

* Allow accessing s3 locations in client mode

* Removed s3 feature from test dependencies.

* fixed cargo-tomlfmt issues

* deployment/docker-compose.md incorrect remote ref (apache#699)

* Fix for error message during testing (apache#707)

* Fix cargo clippy

* Fix for error message during testing

* Remove unwrap for dealing with JobQueued event

* log task ids when launch tasks

---------

Co-authored-by: yangzhong <[email protected]>

* Upgrade datafusion to 20.0.0 & sqlparser to to 0.32.0 (apache#711)

* Upgrade datafusion & sqlparser

* Move ballista_round_trip tests of benchmark into a separate feature to avoid stack overflow

* Fix failed tests of scheduler

* Update README.md (apache#729)

* Update link to proto file in dev docs (apache#713)

* Fix `show tables` fails (apache#715)

* Remove cancelled jobs from active cache (#36)

* Downgrade expected error to warning (#37)

* Downgrade expected error to warning

* add context

* Serialize configoptions and pass them to executor (#34)

* serialize configoptions and pass them to executor and allow extensions for TaskContext

* use ConfigOptions::with_extensions

* fix usage of ConfigOptions

* clippy

* Add wait_drained to SchedulerServer and Executor (#41)

* Add missing code from previous commits

* Fixes after merging from master

* Reintroduce Executor::with_functions

* Adapt prometheus histogram buckets

* cargo tomlfmt

* cargo fmt --all

* Allow too_many_arguments lint

* Cargo tomlfmt

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Ian Alexander Joiner <[email protected]>
Co-authored-by: jiangzhx <[email protected]>
Co-authored-by: Yang Jiang <[email protected]>
Co-authored-by: Lakkam Sai Krishna Reddy <[email protected]>
Co-authored-by: Vrishabh <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Joe Williams <[email protected]>
Co-authored-by: Jaap Aarts <[email protected]>
Co-authored-by: mpurins-coralogix <[email protected]>
}

/// Remove the `ExecutionGraph` for the given job ID from cache
pub(crate) async fn remove_active_execution_graph(
Member

@thinkharderdev Sorry to interrupt. Could you please take a look and tell me why the async was removed here, and in the other functions as well? 🤔

Contributor Author

This function is just removing the ExecutionGraph from the local active cache so it doesn't need to be async. We only need an async function when we modify the global state.

Member

I think active_job_cache is global state here, in TaskManager -> SchedulerState. Am I misunderstanding something? Please let me know 🤔

Contributor Author

Sorry, by "global state" I mean state shared across multiple schedulers. The active_job_cache is local to a single scheduler and is a purely in-memory data structure.
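
To illustrate the distinction drawn in this thread, here is a minimal sketch (not the actual Ballista code). It assumes a simplified `TaskManager` with a scheduler-local `active_job_cache` and a hypothetical `SharedJobState` standing in for state shared across schedulers (for example an etcd-backed store). The struct layouts, `submit_job`, `remove_job`, `save_job`, and the tiny `block_on` helper are all assumptions made for illustration; only the names `ExecutionGraph`, `TaskManager`, `active_job_cache`, and `remove_active_execution_graph` come from the discussion.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for Ballista's `ExecutionGraph`.
#[derive(Debug, Clone, PartialEq)]
pub struct ExecutionGraph {
    pub job_id: String,
}

/// Hypothetical stand-in for state shared by all schedulers.
#[derive(Default)]
pub struct SharedJobState {
    jobs: Mutex<HashMap<String, ExecutionGraph>>,
}

impl SharedJobState {
    /// Async because a real backend would await network or disk I/O here.
    pub async fn save_job(&self, graph: ExecutionGraph) {
        self.jobs.lock().unwrap().insert(graph.job_id.clone(), graph);
    }

    /// Async for the same reason as `save_job`.
    pub async fn remove_job(&self, job_id: &str) -> Option<ExecutionGraph> {
        self.jobs.lock().unwrap().remove(job_id)
    }
}

/// Simplified task manager owned by a single scheduler.
#[derive(Default)]
pub struct TaskManager {
    active_job_cache: Mutex<HashMap<String, ExecutionGraph>>,
    shared: Arc<SharedJobState>,
}

impl TaskManager {
    /// Async: writes to shared state first, then fills the local cache.
    /// No lock guard is held across the `.await`.
    pub async fn submit_job(&self, graph: ExecutionGraph) {
        self.shared.save_job(graph.clone()).await;
        self.active_job_cache
            .lock()
            .unwrap()
            .insert(graph.job_id.clone(), graph);
    }

    /// Sync: touches only this scheduler's in-memory cache.
    pub fn remove_active_execution_graph(&self, job_id: &str) -> Option<ExecutionGraph> {
        self.active_job_cache.lock().unwrap().remove(job_id)
    }

    /// Async: also removes the job from state visible to other schedulers.
    pub async fn remove_job(&self, job_id: &str) -> Option<ExecutionGraph> {
        self.remove_active_execution_graph(job_id);
        self.shared.remove_job(job_id).await
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for this sketch only. It busy-polls, which is acceptable
/// here because every future above completes on its first poll.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

In this sketch, dropping a graph from the local cache needs no `.await` because nothing leaves the process, while `remove_job` stays async because it also updates the shared store. In a real scheduler an async runtime such as Tokio would drive these futures instead of the toy `block_on`.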

fsdvh added a commit to coralogix/arrow-ballista that referenced this pull request Apr 26, 2023
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Prepare 0.10.0 Release (apache#522)

* bump version

* CHANGELOG

* Ballista gets a docker image!!! (apache#521)

* Ballista gets a docker image!!!

* Enable flight sql

* Allow executing startup script

* Allow executing executables

* Clippy

* Remove capture group (apache#527)

* fix python build in CI (apache#528)

* fix python build in CI

* save progress

* use same min rust version in all crates

* fix

* use image from pyo3

* use newer image from pyo3

* do not require protoc

* wheels now generated

* rat - exclude generated file

* Update docs for simplified instructions (apache#532)

* Update docs for simplified instructions

* Fix whoopsie

* Update docs/source/user-guide/flightsql.md

Co-authored-by: Andy Grove <[email protected]>

* remove --locked (apache#533)

* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)

* Provide a memory StateBackendClient (apache#523)

* Rename StateBackend::Standalone to StateBackend:Sled

* Copy utility files from sled crate since they cannot be used directly

* Provide a memory StateBackendClient

* Fix dashmap deadlock issue

* Fix for the comments

Co-authored-by: yangzhong <[email protected]>

* only build docker images on rc tags (apache#535)

* docs: fix style in the Helm readme (apache#551)

* Fix Helm chart's image format (apache#550)

* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)

* Update datafusion requirement from 14.0.0 to 15.0.0

* Fix UT

* Fix python

* Fix python

* Fix Python

Co-authored-by: yangzhong <[email protected]>

* Make it concurrently to launch tasks to executors (apache#557)

* Make it concurrently to launch tasks to executors

* Refine for comments

Co-authored-by: yangzhong <[email protected]>

* fix(ui): fix last seen (apache#562)

* Support Alibaba Cloud OSS with ObjectStore (apache#567)

* Fix cargo clippy (apache#571)

Co-authored-by: yangzhong <[email protected]>

* Super minor spelling error (apache#573)

* Update env_logger requirement from 0.9 to 0.10 (apache#539)

Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* updated readme to contain correct versions of dependencies. (apache#580)

* Fix benchmark image link (apache#596)

* Add support for Azure (apache#599)

* Remove outdated script and use evergreen version of rust (apache#597)

* Remove outdated script and use evergreen version of rust

* Use debian protobuf

* feat: update script such that ballista-cli image is built as well (apache#601)

* Fix Cargo.toml format issue (apache#616)

* Refactor executor main (apache#614)

* Refactor executor main

* copy all configs

* toml fmt

* Refactor scheduler main (apache#615)

* refactor scheduler main

* toml fmt

* Python: add method to get explain output as a string (apache#593)

* Update contributor guide (apache#617)

* Cluster state refactor part 1 (apache#560)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* replace master with main (apache#621)

* implement new release process (apache#623)

* add docs on who can release (apache#632)

* Upgrade to DataFusion 16 (again) (apache#636)

* Update datafusion dependency to the latest version (apache#612)

* Update datafusion dependency to the latest version

* Fix python

* Skip ut of test_window_lead due to apache/datafusion-python#135

* Fix clippy

---------

Co-authored-by: yangzhong <[email protected]>

* Upgrade to DataFusion 17 (apache#639)

* Upgrade to DF 17

* Restore original error handling functionality

* check in benchmark image (apache#647)

* Remove `python` dir & python-related workflows (apache#654)

* refactor: remove python dir & python-related workflows

* remove brackets

* Handle job resubmission (apache#586)

* Handle job resubmission

* Make resubmission configurable and add test

* Fix debug log

* Add executor self-registration mechanism in the heartbeat service (apache#649)

Co-authored-by: yangzhong <[email protected]>

* Cluster state refactor Part 2 (apache#658)

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

* WIP

* Implement JobState

* Tests and fixes

* do not hold ref across await point

* Fix clippy warnings

* Fix tomlfmt github action

* uncomment test

---------

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>

* Upgrade to DataFusion 18.0.0-rc1 (apache#664)

* Minor refactor to reduce duplicate code (apache#659)

* move test_util to ballista-examples package (apache#661)

* Upgrade to DataFusion 18 (apache#668)

* Enable physical plan round-trip tests (apache#666)

* Prep 0.11 (apache#682)

* Change version to 0.11.0

* changelog

* update react-timeago version

* yarn upgrade

* fix

* fix

* revert yarn change

* Print versions

* Print locations

* Avoid github shenanigans

* Try to get runners running

* Try to get runners running

* already root

---------

Co-authored-by: Andy Grove <[email protected]>

* [minor] remove todo (apache#683)

* Add executor terminating status for graceful shutdown (apache#667)

* Add executor terminating status for graceful shutdown

* Remove empty file

* Update ballista/executor/src/executor_process.rs

Co-authored-by: Brent Gardner <[email protected]>

---------

Co-authored-by: Brent Gardner <[email protected]>

* Allow `BallistaContext::read_*` methods to read multiple paths. (apache#679)

* updated dependency in cargo, added read_json method, modified read_* methods to read multiple paths.

* ran cargo fmt

* Added revision for proper builds.

* Update scheduler.md (apache#657)

* Mark `SchedulerState` as pub (apache#688)

* Mark as pub

* Fmt

---------

Co-authored-by: Daniël Heres <[email protected]>

* Update graphviz-rust requirement from 0.5.0 to 0.6.1 (apache#651)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Upgrade DataFusion to 19.0.0 (apache#691)

* update release notes (apache#692)

* Make task launcher pub (apache#695)

Co-authored-by: Daniël Heres <[email protected]>

* Make task_manager pub (apache#698)

Co-authored-by: Daniël Heres <[email protected]>

* Add ExecutionEngine abstraction (apache#687)

* Allow accessing s3 locations in client mode (apache#700)

* Allow accessing s3 locations in client mode

* Removed s3 feature from test dependencies.

* fixed cargo-tomlfmt issues

* deployment/docker-compose.md incorrect remote ref (apache#699)

* Fix for error message during testing (apache#707)

* Fix cargo clippy

* Fix for error message during testing

* Remove unwrap for dealing with JobQueued event

* log task ids when launching tasks

---------

Co-authored-by: yangzhong <[email protected]>

* Upgrade datafusion to 20.0.0 & sqlparser to 0.32.0 (apache#711)

* Upgrade datafusion & sqlparser

* Move ballista_round_trip tests of benchmark into a separate feature to avoid stack overflow

* Fix failed tests of scheduler

* Update README.md (apache#729)

* Update link to proto file in dev docs (apache#713)

* Fix `show tables` fails (apache#715)

* Remove cancelled jobs from active cache (#36)

* Downgrade expected error to warning (#37)

* Downgrade expected error to warning

* add context

* Serialize configoptions and pass them to executor (#34)

* serialize configoptions and pass them to executor and allow extensions for TaskContext

* use ConfigOptions::with_extensions

* fix usage of ConfigOptions

* clippy

* Add wait_drained to SchedulerServer and Executor (#41)

* Add missing code from previous commits

* Fixes after merging from master

* Reintroduce Executor::with_functions

* Adapt prometheus histogram buckets

* cargo tomlfmt

* cargo fmt --all

* Allow too_many_arguments lint

* sc-16350: introducing notion of external and internal error in the failed job status

* Cargo tomlfmt

* sc-16350: small test fix

* sc-16350: partially implemented ballista error serialization

* sc-16350: update failed job proto definition

* sc-16350: cleanup

* sc-16350: update protoc

* sc-16350: update action

* sc-16350: more action update

* sc-16350: update test

* sc-16350: allow optional fields

* VTX-522: fix models

* VTX-522: cleanup

* Change error to warn

* VTX-522: fixes

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Tim Van Wassenhove <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Ian Alexander Joiner <[email protected]>
Co-authored-by: jiangzhx <[email protected]>
Co-authored-by: Yang Jiang <[email protected]>
Co-authored-by: Lakkam Sai Krishna Reddy <[email protected]>
Co-authored-by: Vrishabh <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Joe Williams <[email protected]>
Co-authored-by: Jaap Aarts <[email protected]>
Co-authored-by: mpurins-coralogix <[email protected]>
Co-authored-by: Christoph Schulze <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Development

Successfully merging this pull request may close these issues.

Refactor StateBackendClient to be a higher-level interface
4 participants