Refactor to support Apache Arrow #307
Comments
Hey @rupurt! I'm not planning to support Apache Arrow nor port OctoSQL to a vectorised execution engine. It's not worth the effort. Re the thesis - it's old and OctoSQL has been rewritten from scratch since. It's around 100x faster now, due to the static typing and how the execution phase now works. In general, if you're working with data where the speedup of a columnar engine would be worth it, just use https://github.com/apache/arrow-datafusion or a project built around it. It has much more manpower behind it. Arrow is a PITA to code around, especially when you want union types, repetition, and deeply nested data structures. Additionally, the Go Arrow library is way behind the Rust one (or others). To answer your last question, if you'd like to port OctoSQL to Arrow, please fork. As far as improvements go, they're welcome! However, please first create issues to discuss the details of the contributions. If you'd like to attempt this redesign on your own, your best bet is to keep the physical phase but rewrite the execution phase almost completely. Here's an experiment you can use as inspiration: https://github.com/cube2222/octosql/tree/vectorization-experiment2
Thank you for the info and leads, @cube2222. I definitely noticed that you added a dataflow engine, which is one of the reasons I'm stoked to comprehend and work with your project! I'm totally with you regarding DataFusion having a larger community and more progress. But my personal belief is that no one has really nailed the serving layer for small/medium/big data besides maybe Presto, which is JVM-based. I also believe that once we see the right tool, every language will implement a version, and Go is a sleeping giant with a huge and growing fan base. Also, sometimes it's just fun to hack on interesting stuff 😄
DataFusion and Arrow are certainly cool and have momentum. OctoSQL is a nice, low-ceremony alternative. I prefer OctoSQL. I will try to make PRs for things. @cube2222, it would be good to have a roadmap and to triage issues into agreed bits of work that people can then pick up.
Howdy,
I read your thesis on the rust-rewrite branch. Very informative, and you have obviously put a lot of thought into the tool. As you noted in Possible Improvements and Lessons Learned, columnar format and Apache Arrow seem to be the state-of-the-art way forward.
Do you have any plans to support Arrow, or have you thought about what changes your engine would need to make it work? I see a tonne of potential in making OctoSQL a general-purpose Go CLI + embedded serving layer in something like the Kappa architecture. I've already read through much of the source and will start contributing PRs for general improvements, but I would like to know whether I should fork and create a separate project or try to refactor your current code base over time to support Arrow.