SortExec No Longer Streams Correctly #1939
Comments
Thanks, @tustvold. I'll have time to work on this later in the week. 😅 Please let me know if it has a higher priority.
No rush, whenever you have time 😀
@yjshen Are you still planning to pick this one up? Otherwise I can take a stab at it. It's causing me some excitement whilst experimenting with query scheduling, as the sort is taking place during planning 😆
Please go ahead. I am occupied with enabling all TPC-DS queries in Blaze and getting performance gains these days. Sorry for the delay.
Describe the bug
https://github.com/apache/arrow-datafusion/pull/1596/files#diff-68811b72d27f9f5173223e0da1af2a467c2e4fff2f5f2237665fa29e1a6575c0L165 appears to have accidentally changed the behaviour of SortExec so that it no longer returns a stream that performs the sort operation, but instead performs the sort within ExecutionPlan::execute.
This effectively stalls construction of the rest of the physical plan until the sort has completed, and prevents result streaming from working correctly.
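The difference between the two behaviours can be illustrated with a minimal synchronous sketch. It assumes a plain Iterator of i64 rows as a stand-in for DataFusion's async record batch stream, and the names execute_eager, execute_lazy and LazySortStream are hypothetical, not DataFusion APIs. The eager version sorts before execute returns (the regressed shape); the lazy version only builds the stream and defers the sort until the consumer first asks for data.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a sorting stream. DataFusion's real streams are
/// async; this synchronous Iterator only models *when* the sort happens.
pub struct LazySortStream {
    input: Option<Vec<i64>>,         // unsorted input, taken on first poll
    sorted: std::vec::IntoIter<i64>, // sorted output, filled on first poll
    sort_ran: Rc<Cell<bool>>,        // observable flag: has the sort run yet?
}

impl Iterator for LazySortStream {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        // The sort runs only when the consumer first requests data.
        if let Some(mut rows) = self.input.take() {
            rows.sort_unstable();
            self.sort_ran.set(true);
            self.sorted = rows.into_iter();
        }
        self.sorted.next()
    }
}

/// Regressed shape: the sort happens inside "execute", so the caller
/// (e.g. whoever is building the rest of the plan) blocks until it finishes.
pub fn execute_eager(input: Vec<i64>, sort_ran: Rc<Cell<bool>>) -> std::vec::IntoIter<i64> {
    let mut rows = input;
    rows.sort_unstable();
    sort_ran.set(true);
    rows.into_iter()
}

/// Intended shape: "execute" only constructs the stream; no sorting yet.
pub fn execute_lazy(input: Vec<i64>, sort_ran: Rc<Cell<bool>>) -> LazySortStream {
    LazySortStream {
        input: Some(input),
        sorted: Vec::new().into_iter(),
        sort_ran,
    }
}
```

With execute_lazy, the sort_ran flag is still false right after the call and only flips once next() is first called, which is the non-blocking behaviour the issue expects from ExecutionPlan::execute.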
To Reproduce
Run a query with a large SortExec and observe a surprising amount of time spent in ExecutionPlan::execute.
Expected behavior
ExecutionPlan::execute should return a stream of results, but should not block on those results being available.
Additional context
This resulted in what looked like missing traces in IOx (https://github.com/influxdata/influxdb_iox/issues/3822), as IOx never actually finished constructing the physical plan from which to collect metrics 😅
FYI @yjshen