Update quarterly roadmap for Q2 #2133
I'd like to ideally finalize the implementation for the streaming API and get an experimental impl available via
LGTM
- Add more operators for memory limited execution
- Performance
- Incorporate row-format into operators such as aggregate
- Add row-format benchmarks
- Add row-format benchmarks
- Add row-format benchmarks
- Explore JIT-compiling complex expressions
Thanks @matthewmturner
This is going to be an exciting Q2!
### DataFusion Core

- Publish official Arrow2 branch
- Implementation of memory manager (i.e. to enable spilling to disk as needed)
- IO Improvements
FYI @tustvold
Not entirely sure what this specifically is referring to, but I definitely intend to focus on improving the IO and scheduling stories in arrow-rs and DataFusion. See apache/arrow-rs#1473 and #2079. Not sure if we want to explicitly call out the scheduling side of this.
I may also get to proper filter pushdown to Parquet if I have time (apache/arrow-rs#1191).
Edit: I've proposed a change with a very high-level statement of what I hope to achieve w.r.t. scheduling.
- Incorporate row-format into operators such as aggregate
- Add row-format benchmarks
- Explore LLVM for JIT, with inline Rust functions as the primary goal
- Documentation
- Documentation
- Improve performance of Sort and Merge using Row Format / JIT expressions
- Documentation
I hope to contribute improvements to the Sort performance (especially for multi-column sorts that include strings) this quarter as well. I don't have a writeup of that yet.
- IO Improvements
- Reading, registering, and writing more file formats from both DataFrame API and SQL
- Additional options for IO including partitioning and metadata support
- Memory Management
- Memory Management
- Work Scheduling
- Improve predictability, observability and performance of IO and CPU-bound work
- Develop a more explicit story for managing parallelism during plan execution
- Memory Management
I've yet to create a ticket for this, as I'm still exploring the problem domain, but the precursor discussions can be found in apache/arrow-rs#1473 and #2079.
Thank you @alamb, @Dandandan, and @tustvold for the suggestions. I will get them added shortly.
Merging, and we can keep iterating / updating in follow-on PRs if needed. Thanks again @matthewmturner
I filed #2427 with some of my thoughts
Which issue does this PR close?
Closes #1971
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?