-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add baseline execution stats to WindowAggExec
and UnionExec
, and fixup CoalescePartitionsExec
#1018
Conversation
Fixup tests and implelementation
"metrics=[output_rows=1, elapsed_compute=" | ||
); | ||
assert_metrics!( | ||
&formatted, | ||
"LocalLimitExec: limit=3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by making the query more complicated, it has also introduced a LocalLimitExec
for testing 🎉
// find partition to execute | ||
for input in self.inputs.iter() { | ||
// Calculate whether partition belongs to the current partition | ||
if partition < input.output_partitioning().partition_count() { | ||
return input.execute(partition).await; | ||
let stream = input.execute(partition).await?; | ||
drop(timer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this drop needed if the next line contains a return
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The drop is needed to satisify the borrow checker: let timer = baseline_metrics.elapsed_compute().timer();
has borrowed from baseline_metrics and rust won't allow baseline metrics to be given to ObserverdStream::new while borrowed.
An alternative is to clone the elapsed compute
let elapsed_compute = baseline_metrics.elapsed_compute().clone();
let timer = elapsed_compute.timer();
Which is done in several other parts of this PR. 🤔
Perhaps that would be better to keep the code consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in c0b656c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, looking forward to using those!
I think something unrelated to this PR is causing the tests to fail. I will look into it later today or tomorrow if no one else gets around to it |
Looks to me just the test in |
Thanks 👍 |
Which issue does this PR close?
Finally 🎉 closes #866 (following the same model as #1004).
There are still some operators like Parquet, CSV, Avro, and Json sources that are not instrumented, but I don't have time to devote to intrumenting them now, #1019 tracks that work
Rationale for this change
We want basic understanding of where a plan's time is spent and in what operators. See #866 for more details
What changes are included in this PR?
WindowAggExec
andUnionExec
, using the API from Add BaselineMetrics, Timestamp metrics, add forCoalescePartitionsExec
, rename output_time -> elapsed_compute #909CoalescePartitionsExec
so it reportselapsed_compute
Are there any user-facing changes?
More fields in
EXPLAIN ANALYZE
are now filled outExample of how explain analyze is looking (dense but packed with good info). I find it quite cool that DataFusion can even plan and execute such queries.