Decouple FileFormat from datafusion_data_access #2572

Merged (4 commits) on May 24, 2022
Changes from 2 commits
2 changes: 1 addition & 1 deletion .asf.yaml
@@ -21,7 +21,7 @@ notifications:
  pullrequests: [email protected]
  jira_options: link label worklog
  github:
- description: "Apache Arrow DataFusion and Ballista query engines"
+ description: "Apache Arrow DataFusion SQL Query Engine"
  homepage: https://arrow.apache.org/datafusion
  labels:
  - arrow
20 changes: 20 additions & 0 deletions .github/pull_request_template.md
@@ -25,3 +25,23 @@ If there are user-facing changes then we may require documentation to be updated
  <!--
  If there are any breaking changes to public APIs, please add the `api change` label.
  -->
+
+ # Does this PR break compatibility with Ballista?

Review comment (Contributor): This seems like it doesn't belong in the PR 🤔

Reply (Contributor, Author): That's from the merge commit where ballista was removed

+
+ <!--
+ The CI checks will attempt to build [arrow-ballista](https://github.com/apache/arrow-ballista) against this PR. If
+ this check fails then it indicates that this PR makes a breaking change to the DataFusion API.
+
+ If possible, try to make the change in a way that is not a breaking API change. For example, if code has moved
+ around, try adding `pub use` from the original location to preserve the current API.
+
+ If it is not possible to avoid a breaking change (such as when adding enum variants) then follow this process:
+
+ - Make a corresponding PR against `arrow-ballista` with the changes required there
+ - Update `dev/build-arrow-ballista.sh` to clone the appropriate `arrow-ballista` repo & branch
+ - Merge this PR when CI passes
+ - Merge the Ballista PR
+ - Create a new PR here to reset `dev/build-arrow-ballista.sh` to point to `arrow-ballista` master again
+
+ _If you would like to help improve this process, please see https://github.com/apache/arrow-datafusion/issues/2583_
+ -->
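
The `pub use` suggestion in the template text above can be illustrated with a minimal sketch. The module and function names here (`new_location`, `old_location`, `file_extension`) are hypothetical and not taken from DataFusion; the only point is that a re-export from the original path keeps downstream callers compiling after an item has moved.

```rust
// Hypothetical layout for illustration only; none of these names come
// from the DataFusion code base.
mod new_location {
    /// The item after it has been moved to its new home.
    pub fn file_extension(path: &str) -> Option<&str> {
        // Split at the last '.' and keep what follows it.
        path.rsplit_once('.').map(|(_, ext)| ext)
    }
}

mod old_location {
    // Re-export from the original path so existing callers keep compiling.
    pub use super::new_location::file_extension;
}

fn main() {
    // Both paths resolve to the same function.
    assert_eq!(old_location::file_extension("data.parquet"), Some("parquet"));
    assert_eq!(new_location::file_extension("data.parquet"), Some("parquet"));
}
```

A downstream crate that still imports from `old_location` would build unchanged under this arrangement, which is the kind of compatibility the arrow-ballista CI check is described as verifying.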
7 changes: 3 additions & 4 deletions .github/workflows/dev.yml
@@ -66,9 +66,8 @@ jobs:
  #
  # ignore subproject CHANGELOG.md because they are machine generated
  npx [email protected] --write \
- '{ballista,datafusion,datafusion-examples,docs,python}/**/*.md' \
- '!{ballista,datafusion,python}/CHANGELOG.md' \
+ '{datafusion,datafusion-examples,docs,python}/**/*.md' \
+ '!{datafusion,python}/CHANGELOG.md' \
  README.md \
- CONTRIBUTING.md \
- 'ballista/**/*.{ts,tsx}'
+ CONTRIBUTING.md
  git diff --exit-code
5 changes: 0 additions & 5 deletions .github/workflows/dev_pr/labeler.yml
@@ -20,10 +20,6 @@ datafusion:
  - datafusion-cli/**/*
  - datafusion-examples/**/*

- ballista:
- - ballista/**/*
- - ballista-examples/**/*
-
  python:
  - python/**/*

@@ -41,5 +37,4 @@ documentation:
  - README.md
  - ./**/README.md
  - DEVELOPERS.md
- - ballista/docs/**.*
  - datafusion/docs/**.*
18 changes: 5 additions & 13 deletions .github/workflows/rust.yml
@@ -121,8 +121,6 @@ jobs:
  cargo run --example csv_sql
  cargo run --example parquet_sql
  cargo run --example avro_sql --features=datafusion/avro
- cd ../ballista-examples
- cargo run --example test_sql --features=ballista/standalone
  env:
  CARGO_HOME: "/github/home/.cargo"
  CARGO_TARGET_DIR: "/github/home/target"
@@ -146,6 +144,9 @@ jobs:
  - uses: actions/checkout@v2
  with:
  submodules: true
+ - uses: actions/setup-python@v3
+ with:
+ python-version: '3.x'
  - name: Cache Cargo
  uses: actions/cache@v2
  with:
@@ -162,16 +163,9 @@ jobs:
  uses: ./.github/actions/setup-builder
  with:
  rust-version: ${{ matrix.rust }}
- # Ballista is currently not part of the main workspace so requires a separate test step
- - name: Run Ballista tests
+ - name: Run tests
  run: |
- export ARROW_TEST_DATA=$(pwd)/testing/data
- export PARQUET_TEST_DATA=$(pwd)/parquet-testing/data
- cd ballista/rust
- # snmalloc requires cmake so build without default features
- cargo test --no-default-features --features sled
- # Ensure also compiles in standalone mode
- cargo test --no-default-features --features standalone
+ ./dev/build-arrow-ballista.sh
  env:
  CARGO_HOME: "/github/home/.cargo"
  CARGO_TARGET_DIR: "/github/home/target"
@@ -237,8 +231,6 @@ jobs:
  POSTGRES_PASSWORD: postgres
  - name: Build datafusion-cli
  run: (cd datafusion-cli && cargo build)
- - name: Build ballista-cli
- run: (cd ballista-cli && cargo build)
  - name: Test Psql Parity
  run: python -m pytest -v integration-tests/test_psql_parity.py
  env:
3 changes: 3 additions & 0 deletions .gitignore
@@ -96,3 +96,6 @@ venv/*

  # apache release artifacts
  dev/dist
+
+ # CI
+ arrow-ballista
7 changes: 2 additions & 5 deletions CONTRIBUTING.md
@@ -35,9 +35,6 @@ list to help you get started.

  This section describes how you can get started at developing DataFusion.

- For information on developing with Ballista, see the
- [Ballista developer documentation](ballista/docs/README.md).
-
  ### Bootstrap environment

  DataFusion is written in Rust and it uses a standard rust toolkit:
@@ -168,7 +165,7 @@ The benchmark will automatically remove any generated parquet file on exit, howe

  ### Upstream Benchmark Suites

- Instructions and tooling for running upstream benchmark suites against DataFusion and/or Ballista can be found in [benchmarks](./benchmarks).
+ Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](./benchmarks).

  These are valuable for comparative evaluation against alternative Arrow implementations and query engines.

@@ -263,5 +260,5 @@ $ prettier --version
  After you've confirmed your prettier version, you can format all the `.md` files:

  ```bash
- prettier -w {ballista,datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
+ prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
  ```
8 changes: 2 additions & 6 deletions Cargo.toml
@@ -25,15 +25,11 @@ members = [
  "datafusion/physical-expr",
  "datafusion/proto",
  "datafusion/row",
+ "datafusion/sql",
  "datafusion-examples",
  "benchmarks",
- "ballista/rust/client",
- "ballista/rust/core",
- "ballista/rust/executor",
- "ballista/rust/scheduler",
- "ballista-examples",
  ]
- exclude = ["ballista-cli", "datafusion-cli"]
+ exclude = ["datafusion-cli"]

  [profile.release]
  codegen-units = 1
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ Projects that adapt to or serve as plugins to DataFusion:

  Here are some of the projects known to use DataFusion:

- - [Ballista](ballista) Distributed Compute Platform
+ - [Ballista](https://github.com/apache/arrow-ballista) Distributed Compute Platform
  - [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
  - [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
  - [delta-rs](https://github.com/delta-io/delta-rs)