Ballista does not support external file systems #10
Comments
Hi @ZhangqyTJ, you are right that Ballista is currently hardcoded to use the local file system object store. Supporting remote file systems is on our roadmap, and thanks to the object store abstraction we now have all the infrastructure in place to do it. As you already mentioned, the main change needed is in the plan ser/de layer. You are more than welcome to work on this if you would like to; I don't think anyone is actively working on it at the moment.
We had some prior discussions around this subject in apache/datafusion#1072
I will try to solve this issue
I think I have resolved this issue:
Could you share a reproducible code example with us? It would be really hard for us to locate the problem with a generic error message :)
Project address: https://github.com/ZhangqyTJ/arrow-datafusion.git
I added the S3 (minio_store) module in datafusion/src/datasource/object_store and registered minio_store in benchmarks/tpch.rs through the register_object_store() method of ExecutionContext. But when I start the scheduler and executor and then run "cargo run --bin tpch --release****", the data in MinIO cannot be read.
After checking the code, I found that LocalFileSystem is used directly at ballista/rust/core/src/serde/physical_plan/from_proto.rs(789) and ballista/rust/core/src/serde/logical_plan/from_proto.rs(201). I changed both places to use minio_store, and it ran successfully.
How can Ballista be made to support external file systems?
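To illustrate the alternative to hardcoding one store in the deserialization code, here is a minimal, self-contained sketch of picking a store from the URI scheme of the path. Everything in it (the `ObjectStore` trait shape, `StoreRegistry`, `MinioStore`, `resolve`) is a simplified stand-in invented for illustration and is not the actual DataFusion or Ballista API; it only shows the idea that the from_proto code could look the store up rather than construct LocalFileSystem directly.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for an object store abstraction.
/// This is NOT DataFusion's real trait; it only carries a name so the
/// example can show which store was selected.
pub trait ObjectStore: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Stand-in for the local file system store.
pub struct LocalFileSystem;

impl ObjectStore for LocalFileSystem {
    fn name(&self) -> &'static str {
        "local"
    }
}

/// Hypothetical stand-in for the MinIO/S3 store described in the issue.
pub struct MinioStore;

impl ObjectStore for MinioStore {
    fn name(&self) -> &'static str {
        "minio"
    }
}

/// Maps a URI scheme (for example "s3") to the store that serves it.
pub struct StoreRegistry {
    stores: HashMap<String, Arc<dyn ObjectStore>>,
}

impl StoreRegistry {
    /// Creates a registry that knows only the local file system.
    pub fn new() -> Self {
        let mut stores: HashMap<String, Arc<dyn ObjectStore>> = HashMap::new();
        stores.insert("file".to_string(), Arc::new(LocalFileSystem));
        Self { stores }
    }

    /// Registers (or replaces) the store used for `scheme`.
    pub fn register(&mut self, scheme: &str, store: Arc<dyn ObjectStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    /// Picks a store from the scheme of `uri`. Paths without a
    /// "scheme://" prefix are treated as local files. Returns `None`
    /// when no store is registered for the scheme.
    pub fn resolve(&self, uri: &str) -> Option<Arc<dyn ObjectStore>> {
        let scheme = match uri.split_once("://") {
            Some((scheme, _rest)) => scheme,
            None => "file",
        };
        self.stores.get(scheme).cloned()
    }
}
```

With a registry like this, a plain path resolves to the local store, while an `s3://` path resolves to `MinioStore` only after `registry.register("s3", Arc::new(MinioStore))` has been called on both the scheduler and the executor side. An unregistered scheme yields `None`, which the deserializer could turn into a descriptive error instead of silently reading from local disk.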
The project address after I added minio_store
https://github.com/ZhangqyTJ/arrow-datafusion.git
Modify the code before running
ballista/rust/core/src/serde/physical_plan/from_proto.rs(789)
ballista/rust/core/src/serde/logical_plan/from_proto.rs(201)
Run command
To run the scheduler from source:
By default the scheduler will bind to 0.0.0.0 and listen on port 50050.
To run the executor from source:
To run the benchmarks: