-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow Row Format in SortExec #5230 #5242
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaylmiller -- this is looking good. Very nice first contribution
I think this code could almost go in as it -- the only thing that I think we need to see is some sort of benchmark improvement.
I looked around in our existing benchmarks to find a benchmark for our sort operator, but was not able to. Maybe @tustvold or @Dandandan or @andygrove know of a preexisting benchmark
If not, perhaps we can adapt some of the code in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/merge.rs to make a SortExec benchmark
Thanks! Should I add said benchmark to this PR then? |
@alamb Does this SortExec benchmark seem good? |
Yes, I think it looks like a good start. Thank you @jaylmiller . Could you please create a new PR with just the sort benchmark (so we can merge that and use it as a way to compare the code on master and this PR)? Thanks again! |
Which issue does this PR close?
issue #5230
Rationale for this change
Modifying SortExec to use arrow row format should increase performance when sorting by multiple columns. Row encoding is preserved so that it can be used by parent nodes (right now that is just
SortPreservingMergeExec
).What changes are included in this PR?
Both iterations of:
Are these changes tested?
Mostly covered by existing unit tests. Added unit tests for the new publicly exported structs.
Are there any user-facing changes?
Some public exports had to be added to be able to be used in the integration test
core/tests/sort_key_cursor
.