Upstream rebase (#29)
* configure_me_codegen retroactively reserved on our `bind_host` parame… (apache#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Prepare 0.10.0 Release (apache#522)

* bump version

* CHANGELOG

* Ballista gets a docker image!!! (apache#521)

* Ballista gets a docker image!!!

* Enable flight sql

* Allow executing startup script

* Allow executing executables

* Clippy

* Remove capture group (apache#527)

* fix python build in CI (apache#528)

* fix python build in CI

* save progress

* use same min rust version in all crates

* fix

* use image from pyo3

* use newer image from pyo3

* do not require protoc

* wheels now generated

* rat - exclude generated file

* Update docs for simplified instructions (apache#532)

* Update docs for simplified instructions

* Fix whoopsie

* Update docs/source/user-guide/flightsql.md

Co-authored-by: Andy Grove <[email protected]>

Co-authored-by: Andy Grove <[email protected]>

* remove --locked (apache#533)

* Bump actions/labeler from 4.0.2 to 4.1.0 (apache#525)

* Provide a memory StateBackendClient (apache#523)

* Rename StateBackend::Standalone to StateBackend::Sled

* Copy utility files from sled crate since they cannot be used directly

* Provide a memory StateBackendClient

* Fix dashmap deadlock issue

* Fix for the comments

Co-authored-by: yangzhong <[email protected]>

* only build docker images on rc tags (apache#535)

* docs: fix style in the Helm readme (apache#551)

* Fix Helm chart's image format (apache#550)

* Update datafusion requirement from 14.0.0 to 15.0.0 (apache#552)

* Update datafusion requirement from 14.0.0 to 15.0.0

* Fix UT

* Fix python

* Fix python

* Fix Python

Co-authored-by: yangzhong <[email protected]>

* Launch tasks to executors concurrently (apache#557)

* Launch tasks to executors concurrently

* Refine for comments

Co-authored-by: yangzhong <[email protected]>

* fix(ui): fix last seen (apache#562)

* Support Alibaba Cloud OSS with ObjectStore (apache#567)

* Fix cargo clippy (apache#571)

Co-authored-by: yangzhong <[email protected]>

* Super minor spelling error (apache#573)

* Update env_logger requirement from 0.9 to 0.10 (apache#539)

Updates the requirements on [env_logger](https://github.com/rust-cli/env_logger) to permit the latest version.
- [Release notes](https://github.com/rust-cli/env_logger/releases)
- [Changelog](https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md)
- [Commits](rust-cli/env_logger@v0.9.0...v0.10.0)

---
updated-dependencies:
- dependency-name: env_logger
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update graphviz-rust requirement from 0.4.0 to 0.5.0 (apache#574)

Updates the requirements on [graphviz-rust](https://github.com/besok/graphviz-rust) to permit the latest version.
- [Release notes](https://github.com/besok/graphviz-rust/releases)
- [Changelog](https://github.com/besok/graphviz-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/besok/graphviz-rust/commits)

---
updated-dependencies:
- dependency-name: graphviz-rust
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* updated readme to contain correct versions of dependencies. (apache#580)

* Fix benchmark image link (apache#596)

* Add support for Azure (apache#599)

* Remove outdated script and use evergreen version of rust (apache#597)

* Remove outdated script and use evergreen version of rust

* Use debian protobuf

* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* Add ClusterState trait

* Expose active job count

* Remove println

* Resubmit jobs when no resources available for scheduling

* Make parse_physical_expr public

* Reduce log spam

* Fix job submitted metric by ignoring resubmissions

* Record when job is queued in scheduler metrics (#28)

* Record when job is queued in scheduler metrics

* add additional buckets for exec times

* fmt

* clippy

* tomlfmt

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yahoNanJing <[email protected]>
Co-authored-by: yangzhong <[email protected]>
Co-authored-by: Xin Hao <[email protected]>
Co-authored-by: Duyet Le <[email protected]>
Co-authored-by: r.4ntix <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sai Krishna Reddy Lakkam <[email protected]>
Co-authored-by: Aidan Kovacic <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
16 people authored Jan 23, 2023
1 parent 867c2c8 commit 763aa23
Showing 69 changed files with 4,393 additions and 321 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/dev_pr.yml
@@ -36,7 +36,7 @@ jobs:
github.event_name == 'pull_request_target' &&
(github.event.action == 'opened' ||
github.event.action == 'synchronize')
uses: actions/labeler@v4.0.2
uses: actions/labeler@4.1.0
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/workflows/dev_pr/labeler.yml
14 changes: 2 additions & 12 deletions .github/workflows/python_build.yml
@@ -94,16 +94,6 @@ jobs:
steps:
- uses: actions/checkout@v3
- run: rm LICENSE.txt
- name: Install protobuf compiler
shell: bash
run: |
mkdir -p $HOME/d/protoc
cd $HOME/d/protoc
export PROTO_ZIP="protoc-21.4-linux-x86_64.zip"
curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
unzip $PROTO_ZIP
export PATH=$PATH:$HOME/d/protoc/bin
protoc --version
- name: Download LICENSE.txt
uses: actions/download-artifact@v3
with:
@@ -112,11 +112,11 @@
- run: cat LICENSE.txt
- name: Build wheels
run: |
export PATH=$PATH:$HOME/d/protoc/bin
export RUSTFLAGS='-C target-cpu=skylake'
rm ../ballista/core/proto/*
docker run --rm -v $(pwd)/..:/io \
--workdir /io/python \
konstin2/maturin:v0.11.2 \
ghcr.io/pyo3/maturin:v0.13.7 \
build --release --manylinux 2010
- name: Archive wheels
uses: actions/upload-artifact@v3
20 changes: 8 additions & 12 deletions .github/workflows/rust.yml
@@ -117,12 +117,7 @@ jobs:
- name: Install protobuf compiler
shell: bash
run: |
mkdir -p $HOME/d/protoc
cd $HOME/d/protoc
export PROTO_ZIP="protoc-21.4-linux-x86_64.zip"
curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
unzip $PROTO_ZIP
export PATH=$PATH:$HOME/d/protoc/bin
apt-get -qq update && apt-get -y -qq install protobuf-compiler
protoc --version
- name: Cache Cargo
uses: actions/cache@v3
@@ -145,7 +140,7 @@
export PATH=$PATH:$HOME/d/protoc/bin
export ARROW_TEST_DATA=$(pwd)/testing/data
export PARQUET_TEST_DATA=$(pwd)/parquet-testing/data
cargo test --features flight-sql
cargo test
cd examples
cargo run --example standalone_sql --features=ballista/standalone
env:
@@ -304,14 +299,15 @@ jobs:
- name: Build and push Docker image
run: |
echo "github user is $DOCKER_USER"
export DOCKER_TAG="$(git describe --exact-match --tags $(git log -n1 --pretty='%h') || echo '0.10.0-test')"
if [[ $DOCKER_TAG =~ ^[0-9\.]+$ ]]
docker build -t arrow-ballista-standalone:latest -f dev/docker/ballista-standalone.Dockerfile .
export DOCKER_TAG="$(git describe --exact-match --tags $(git log -n1 --pretty='%h') || echo '')"
if [[ $DOCKER_TAG =~ ^[0-9\.]+-rc[0-9]+$ ]]
then
echo "publishing docker tag $DOCKER_TAG"
docker tag arrow-ballista-standalone:latest ghcr.io/apache/arrow-ballista-standalone:$DOCKER_TAG
docker login ghcr.io -u $DOCKER_USER -p "$DOCKER_PASS"
docker push ghcr.io/apache/arrow-ballista-standalone:$DOCKER_TAG
fi
docker login ghcr.io -u $DOCKER_USER -p "$DOCKER_PASS"
docker build -t ghcr.io/apache/arrow-ballista-standalone:$DOCKER_TAG -f dev/docker/ballista-standalone.Dockerfile .
docker push ghcr.io/apache/arrow-ballista-standalone:$DOCKER_TAG
env:
DOCKER_USER: ${{ github.actor }}
DOCKER_PASS: ${{ secrets.GITHUB_TOKEN }}
2 changes: 1 addition & 1 deletion README.md
@@ -58,7 +58,7 @@ queries at scale factor 10 (10 GB) on a single node with a single executor and 2

The tracking issue for improving these results is [#339](https://github.com/apache/arrow-ballista/issues/339).

![benchmarks](./docs/developer/images/ballista-benchmarks.png)
![benchmarks](https://sqlbenchmarks.io/sqlbench-h/results/env/workstation/sf10/distributed/sqlbench-h-workstation-10-distributed-perquery.png)

# Getting Started

12 changes: 6 additions & 6 deletions ballista-cli/Cargo.toml
@@ -18,25 +18,25 @@
[package]
name = "ballista-cli"
description = "Command Line Client for Ballista distributed query engine."
version = "0.9.0"
version = "0.10.0"
authors = ["Apache Arrow <[email protected]>"]
edition = "2021"
keywords = ["ballista", "cli"]
license = "Apache-2.0"
homepage = "https://github.com/apache/arrow-ballista"
repository = "https://github.com/apache/arrow-ballista"
rust-version = "1.59"
rust-version = "1.63"
readme = "README.md"

[dependencies]
ballista = { path = "../ballista/client", version = "0.9.0", features = [
ballista = { path = "../ballista/client", version = "0.10.0", features = [
"standalone",
] }
clap = { version = "3", features = ["derive", "cargo"] }
datafusion = "14.0.0"
datafusion-cli = "14.0.0"
datafusion = "15.0.0"
datafusion-cli = "15.0.0"
dirs = "4.0.0"
env_logger = "0.9"
env_logger = "0.10"
mimalloc = { version = "0.1", default-features = false }
num_cpus = "1.13.0"
rustyline = "10.0"
78 changes: 78 additions & 0 deletions ballista/CHANGELOG.md
@@ -19,6 +19,84 @@

# Changelog

## [0.10.0](https://github.com/apache/arrow-ballista/tree/0.10.0) (2022-11-18)

[Full Changelog](https://github.com/apache/arrow-ballista/compare/0.9.0...0.10.0)

**Implemented enhancements:**

- Add user guide section on prometheus metrics [\#507](https://github.com/apache/arrow-ballista/issues/507)
- Don't throw error when job path not exist in remove\_job\_data [\#502](https://github.com/apache/arrow-ballista/issues/502)
- Fix clippy warning [\#494](https://github.com/apache/arrow-ballista/issues/494)
- Use job\_data\_clean\_up\_interval\_seconds == 0 to indicate executor\_cleanup\_enable [\#488](https://github.com/apache/arrow-ballista/issues/488)
- Add a config for tracing log rolling policy for both scheduler and executor [\#486](https://github.com/apache/arrow-ballista/issues/486)
- Set up repo where we can push benchmark results [\#473](https://github.com/apache/arrow-ballista/issues/473)
- Make the delayed time interval for cleanup job data in both scheduler and executor configurable [\#469](https://github.com/apache/arrow-ballista/issues/469)
- Add some validation for the remove\_job\_data grpc service [\#467](https://github.com/apache/arrow-ballista/issues/467)
- Add ability to build docker images using `release-lto` profile [\#463](https://github.com/apache/arrow-ballista/issues/463)
- Suggest users download \(rather than build\) the FlightSQL JDBC Driver [\#460](https://github.com/apache/arrow-ballista/issues/460)
- Clean up legacy job shuffle data [\#459](https://github.com/apache/arrow-ballista/issues/459)
- Add grpc service for the scheduler to make it able to be triggered by client explicitly [\#458](https://github.com/apache/arrow-ballista/issues/458)
- Replace Mutex\<HashMap\> by using DashMap [\#448](https://github.com/apache/arrow-ballista/issues/448)
- Refine log level [\#446](https://github.com/apache/arrow-ballista/issues/446)
- Upgrade to DataFusion 14.0.0 [\#445](https://github.com/apache/arrow-ballista/issues/445)
- Add a feature for hdfs3 [\#419](https://github.com/apache/arrow-ballista/issues/419)
- Add optional flag which advertises host for Arrow Flight SQL [\#418](https://github.com/apache/arrow-ballista/issues/418)
- Partitioning reasoning in DataFusion and Ballista [\#284](https://github.com/apache/arrow-ballista/issues/284)
- Stop wasting time in CI on MIRI runs [\#283](https://github.com/apache/arrow-ballista/issues/283)
- Publish Docker images as part of each release [\#236](https://github.com/apache/arrow-ballista/issues/236)
- Cleanup job/stage status from TaskManager and clean up shuffle data after a period after JobFinished [\#185](https://github.com/apache/arrow-ballista/issues/185)

**Fixed bugs:**

- build broken: configure\_me\_codegen retroactively reserved `bind_host` [\#519](https://github.com/apache/arrow-ballista/issues/519)
- Return empty results for SQLs with order by [\#451](https://github.com/apache/arrow-ballista/issues/451)
- ballista scheduler does not take inline parameters into account [\#443](https://github.com/apache/arrow-ballista/issues/443)
- \[FlightSQL\] Cannot connect with Tableau Desktop [\#428](https://github.com/apache/arrow-ballista/issues/428)
- Benchmark q15 fails [\#372](https://github.com/apache/arrow-ballista/issues/372)
- Incorrect documentation for building Ballista on Linux when using docker-compose [\#362](https://github.com/apache/arrow-ballista/issues/362)
- Scheduler silently replaces `ParquetExec` with `EmptyExec` if data path is not correctly mounted in container [\#353](https://github.com/apache/arrow-ballista/issues/353)
- SQL with order by limit returns nothing [\#334](https://github.com/apache/arrow-ballista/issues/334)

**Documentation updates:**

- README updates [\#433](https://github.com/apache/arrow-ballista/pull/433) ([andygrove](https://github.com/andygrove))

**Merged pull requests:**

- configure\_me\_codegen retroactively reserved on our `bind_host` parame… [\#520](https://github.com/apache/arrow-ballista/pull/520) ([avantgardnerio](https://github.com/avantgardnerio))
- Bump actions/cache from 2 to 3 [\#517](https://github.com/apache/arrow-ballista/pull/517) ([dependabot[bot]](https://github.com/apps/dependabot))
- Update graphviz-rust requirement from 0.3.0 to 0.4.0 [\#515](https://github.com/apache/arrow-ballista/pull/515) ([dependabot[bot]](https://github.com/apps/dependabot))
- Add Prometheus metrics endpoint [\#511](https://github.com/apache/arrow-ballista/pull/511) ([thinkharderdev](https://github.com/thinkharderdev))
- Enable tests that work since upgrading to DataFusion 14 [\#510](https://github.com/apache/arrow-ballista/pull/510) ([andygrove](https://github.com/andygrove))
- Update hashbrown requirement from 0.12 to 0.13 [\#506](https://github.com/apache/arrow-ballista/pull/506) ([dependabot[bot]](https://github.com/apps/dependabot))
- Don't throw error when job shuffle data path not exist in executor [\#503](https://github.com/apache/arrow-ballista/pull/503) ([yahoNanJing](https://github.com/yahoNanJing))
- Upgrade to DataFusion 14.0.0 and Arrow 26.0.0 [\#499](https://github.com/apache/arrow-ballista/pull/499) ([andygrove](https://github.com/andygrove))
- Fix clippy warning [\#495](https://github.com/apache/arrow-ballista/pull/495) ([yahoNanJing](https://github.com/yahoNanJing))
- Stop wasting time in CI on MIRI runs [\#491](https://github.com/apache/arrow-ballista/pull/491) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Remove executor config executor\_cleanup\_enable and make the configuration name for executor cleanup more intuitive [\#489](https://github.com/apache/arrow-ballista/pull/489) ([yahoNanJing](https://github.com/yahoNanJing))
- Add a config for tracing log rolling policy for both scheduler and executor [\#487](https://github.com/apache/arrow-ballista/pull/487) ([yahoNanJing](https://github.com/yahoNanJing))
- Add grpc service of cleaning up job shuffle data for the scheduler to make it able to be triggered by client explicitly [\#485](https://github.com/apache/arrow-ballista/pull/485) ([yahoNanJing](https://github.com/yahoNanJing))
- \[Minor\] Bump DataFusion [\#480](https://github.com/apache/arrow-ballista/pull/480) ([Dandandan](https://github.com/Dandandan))
- Remove benchmark results from README [\#478](https://github.com/apache/arrow-ballista/pull/478) ([andygrove](https://github.com/andygrove))
- Update `flightsql.md` to provide correct instruction [\#476](https://github.com/apache/arrow-ballista/pull/476) ([iajoiner](https://github.com/iajoiner))
- Add support for Tableau [\#475](https://github.com/apache/arrow-ballista/pull/475) ([avantgardnerio](https://github.com/avantgardnerio))
- Add SchedulerConfig for the scheduler configurations, like event\_loop\_buffer\_size, finished\_job\_data\_clean\_up\_interval\_seconds, finished\_job\_state\_clean\_up\_interval\_seconds [\#472](https://github.com/apache/arrow-ballista/pull/472) ([yahoNanJing](https://github.com/yahoNanJing))
- Bump DataFusion [\#471](https://github.com/apache/arrow-ballista/pull/471) ([Dandandan](https://github.com/Dandandan))
- Add some validation for remove\_job\_data in the executor server [\#468](https://github.com/apache/arrow-ballista/pull/468) ([yahoNanJing](https://github.com/yahoNanJing))
- Update documentation to reflect the release of the FlightSQL JDBC Driver [\#461](https://github.com/apache/arrow-ballista/pull/461) ([avantgardnerio](https://github.com/avantgardnerio))
- Bump DataFusion version [\#453](https://github.com/apache/arrow-ballista/pull/453) ([andygrove](https://github.com/andygrove))
- Add shuffle for SortPreservingMergeExec physical operator [\#452](https://github.com/apache/arrow-ballista/pull/452) ([yahoNanJing](https://github.com/yahoNanJing))
- Replace Mutex\<HashMap\> by using DashMap [\#449](https://github.com/apache/arrow-ballista/pull/449) ([yahoNanJing](https://github.com/yahoNanJing))
- Refine log level for trial info and periodically invoked places [\#447](https://github.com/apache/arrow-ballista/pull/447) ([yahoNanJing](https://github.com/yahoNanJing))
- MINOR: Add `set -e` to scripts, fix a typo [\#444](https://github.com/apache/arrow-ballista/pull/444) ([andygrove](https://github.com/andygrove))
- Add optional flag which advertises host for Arrow Flight SQL \#418 [\#442](https://github.com/apache/arrow-ballista/pull/442) ([DaltonModlin](https://github.com/DaltonModlin))
- Reorder joins after resolving stage inputs [\#441](https://github.com/apache/arrow-ballista/pull/441) ([Dandandan](https://github.com/Dandandan))
- Add a feature for hdfs3 [\#439](https://github.com/apache/arrow-ballista/pull/439) ([yahoNanJing](https://github.com/yahoNanJing))
- Add Spark benchmarks [\#438](https://github.com/apache/arrow-ballista/pull/438) ([andygrove](https://github.com/andygrove))
- scheduler now verifies that `file://` ListingTable URLs are accessible [\#414](https://github.com/apache/arrow-ballista/pull/414) ([andygrove](https://github.com/andygrove))


## [0.9.0](https://github.com/apache/arrow-ballista/tree/0.9.0) (2022-10-22)

[Full Changelog](https://github.com/apache/arrow-ballista/compare/0.8.0...0.9.0)
17 changes: 9 additions & 8 deletions ballista/client/Cargo.toml
@@ -19,28 +19,29 @@
name = "ballista"
description = "Ballista Distributed Compute"
license = "Apache-2.0"
version = "0.9.0"
version = "0.10.0"
homepage = "https://github.com/apache/arrow-ballista"
repository = "https://github.com/apache/arrow-ballista"
readme = "README.md"
authors = ["Apache Arrow <[email protected]>"]
edition = "2021"
rust-version = "1.59"
rust-version = "1.63"

[dependencies]
ballista-core = { path = "../core", version = "0.9.0" }
ballista-executor = { path = "../executor", version = "0.9.0", optional = true }
ballista-scheduler = { path = "../scheduler", version = "0.9.0", optional = true }
datafusion = "14.0.0"
datafusion-proto = "14.0.0"
ballista-core = { path = "../core", version = "0.10.0" }
ballista-executor = { path = "../executor", version = "0.10.0", optional = true }
ballista-scheduler = { path = "../scheduler", version = "0.10.0", optional = true }
datafusion = "15.0.0"
datafusion-proto = "15.0.0"
futures = "0.3"
log = "0.4"
parking_lot = "0.12"
sqlparser = "0.26"
sqlparser = "0.27"
tempfile = "3"
tokio = "1.0"

[features]
azure = ["ballista-core/azure"]
default = []
hdfs = ["ballista-core/hdfs"]
hdfs3 = ["ballista-core/hdfs3"]
4 changes: 2 additions & 2 deletions ballista/client/README.md
@@ -84,8 +84,8 @@ To build a simple ballista example, add the following dependencies to your `Carg

```toml
[dependencies]
ballista = "0.8"
datafusion = "12.0.0"
ballista = "0.10"
datafusion = "14.0.0"
tokio = "1.0"
```
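
For illustration, a minimal client program built on those dependencies might look like the sketch below. It assumes a Ballista scheduler is already running on `localhost:50050`, that a hypothetical `testdata/example.csv` file exists, and that `tokio` is compiled with the `macros` and `rt-multi-thread` features; the exact `BallistaContext` API can vary between releases, so treat this as a sketch rather than the documented example.

```rust
use ballista::prelude::*;
use datafusion::prelude::CsvReadOptions;

#[tokio::main]
async fn main() -> Result<()> {
    // Build a Ballista configuration (the shuffle partition count is illustrative).
    let config = BallistaConfig::builder()
        .set("ballista.shuffle.partitions", "4")
        .build()?;

    // Connect to a scheduler assumed to be running on localhost:50050.
    let ctx = BallistaContext::remote("localhost", 50050, &config).await?;

    // Register a CSV file (hypothetical path) and run a distributed query.
    ctx.register_csv("example", "testdata/example.csv", CsvReadOptions::new())
        .await?;
    let df = ctx
        .sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")
        .await?;
    df.show().await?;

    Ok(())
}
```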

19 changes: 15 additions & 4 deletions ballista/client/src/context.rs
@@ -17,6 +17,7 @@

//! Distributed execution context.
use datafusion::arrow::datatypes::SchemaRef;
use log::info;
use parking_lot::Mutex;
use sqlparser::ast::Statement;
@@ -375,6 +376,16 @@ impl BallistaContext {
..
}) => {
let table_exists = ctx.table_exist(name.as_str())?;
let schema: SchemaRef = Arc::new(schema.as_ref().to_owned().into());
let table_partition_cols = table_partition_cols
.iter()
.map(|col| {
schema
.field_with_name(col)
.map(|f| (f.name().to_owned(), f.data_type().to_owned()))
.map_err(DataFusionError::ArrowError)
})
.collect::<Result<Vec<_>>>()?;

match (if_not_exists, table_exists) {
(_, false) => match file_type.to_lowercase().as_str() {
@@ -383,9 +394,8 @@ impl BallistaContext {
.has_header(*has_header)
.delimiter(*delimiter as u8)
.table_partition_cols(table_partition_cols.to_vec());
let csv_schema = schema.as_ref().to_owned().into();
if !schema.fields().is_empty() {
options = options.schema(&csv_schema);
options = options.schema(&schema);
}
self.register_csv(name, location, options).await?;
Ok(Arc::new(DataFrame::new(ctx.state.clone(), &plan)))
@@ -395,7 +405,7 @@
name,
location,
ParquetReadOptions::default()
.table_partition_cols(table_partition_cols.to_vec()),
.table_partition_cols(table_partition_cols),
)
.await?;
Ok(Arc::new(DataFrame::new(ctx.state.clone(), &plan)))
@@ -405,7 +415,7 @@
name,
location,
AvroReadOptions::default()
.table_partition_cols(table_partition_cols.to_vec()),
.table_partition_cols(table_partition_cols),
)
.await?;
Ok(Arc::new(DataFrame::new(ctx.state.clone(), &plan)))
@@ -582,6 +592,7 @@ mod tests {
table_partition_cols: x.table_partition_cols.clone(),
collect_stat: x.collect_stat,
target_partitions: x.target_partitions,
file_sort_order: None,
};

let table_paths = listing_table