-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop copying LogicalPlan and Exprs in OptimizeProjections
(2% faster planning)
#10405
Conversation
3e8264c
to
6bbc24d
Compare
// Test outer projection isn't discarded despite the same schema as inner | ||
// https://github.com/apache/datafusion/issues/8942 | ||
#[test] | ||
fn test_derived_column() -> Result<()> { | ||
let table_scan = test_table_scan()?; | ||
let plan = LogicalPlanBuilder::from(table_scan) | ||
.project(vec![col("a"), lit(0).alias("d")])? | ||
.project(vec![col("a").add(lit(1)).alias("a"), lit(0).alias("d")])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change the projection is actually merged
This is due to using the general purpose transform_up
rather than special casing certain Expr
types in the projection rewrite and now CASE
is handled
In order to prevent merging the projections, I needed to make the lower projection non trivial so changed it from a
to a + 1 as a
Note that the tests for correctness added for #8942 in
#8960 such as consecutive_projection_same_schema
are also still passing
02)--SubqueryAlias: NUMBERS | ||
03)----Projection: Int64(1) AS a, Int64(2) AS b, Int64(3) AS c | ||
04)------EmptyRelation | ||
01)SubqueryAlias: NUMBERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these plans are better -- they have one less unecessary Projection
(I believe the changes are due to better tracking of child rewrites during optimization)
4a7aa33
to
f80b800
Compare
pub fn unalias(self) -> Expr { | ||
match self { | ||
Expr::Alias(alias) => *alias.expr, | ||
_ => self, | ||
} | ||
} | ||
|
||
/// Recursively potentially multiple aliases from an expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trim_expr
renamed and with examples
) -> Result<Transformed<LogicalPlan>> { | ||
// Recursively rewrite any nodes that may be able to avoid computation given | ||
// their parents' required indices. | ||
match plan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of handling plan nodes changed so the ones that rewrite the plan
directly do so in the first match
statement and those that compute required indices do so in a second match
fn optimize_projections( | ||
plan: &LogicalPlan, | ||
plan: LogicalPlan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact this function no takes plan
rather than plan
is the key change that permits this PR to avoid copying as much
ac5c77e
to
43d8f0e
Compare
43d8f0e
to
b15d1f4
Compare
OptimizeProjections
OptimizeProjections
(2% faster planning)
06)------Projection: nodes.id + Int64(1) AS id | ||
07)--------Filter: nodes.id < Int64(10) | ||
08)----------TableScan: nodes | ||
01)SubqueryAlias: nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in fact it removes another projection layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct
} | ||
LogicalPlan::Window(window) => { | ||
let input_schema = window.input.schema(); | ||
let input_schema = window.input.schema().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this clone?
let input_schema = inputs[0].schema(); | ||
// If inputs are not pruned do not change schema | ||
// TODO this seems wrong (shouldn't we always use the schema of the input?) | ||
let schema = if schema.fields().len() == input_schema.fields().len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that shouldn't happen.... we can follow up to run tests with only 1 schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it doesn't make sense -- however, it is the same logic as in with_new_exprs
:
datafusion/datafusion/expr/src/logical_plan/plan.rs
Lines 913 to 925 in fad16e7
LogicalPlan::Union(Union { schema, .. }) => { | |
let input_schema = inputs[0].schema(); | |
// If inputs are not pruned do not change schema. | |
let schema = if schema.fields().len() == input_schema.fields().len() { | |
schema.clone() | |
} else { | |
input_schema.clone() | |
}; | |
Ok(LogicalPlan::Union(Union { | |
inputs: inputs.into_iter().map(Arc::new).collect(), | |
schema, | |
})) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #10442
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb for making tha planner better
Thanks for the review @comphead fyi @mustafasrepo |
…r planning) (apache#10405) * Add `LogicalPlan::recompute_schema` for handling rewrite passes * Stop copying LogicalPlan and Exprs in `OptimizeProjections`
Note this also has the changes from #10410 in it
Which issue does this PR close?
Closes #10209
Rationale for this change
Make planning faster by not copying as much
What changes are included in this PR?
OptimizeProjections
to use treenode APIsAre these changes tested?
Existing tests
Are there any user-facing changes?
CASE
expressions)Benchmark Results show a moderate improvement (2% overall in tpch, but some queries like Q3 are like 10% faster)
Details