DEMO: Introduce `TreeNodeMutator` for rewriting `TreeNode`s in place, change optimizer to rewrite `LogicalPlan` in place - 10% faster planning time #9780

alamb · 2024-03-24T13:50:48Z

Note this PR looks large, but it is mostly comments and changes of &plan to plan in tests

Which issue does this PR close?

Part of #9637 (based on some ideas from #9708). 🙏 @jayzhan211

Rationale for this change

TLDR is that the optimizer currently copies LogicalPlans many times unnecessarily (I think it is quadratic if not worse in the number of tree nodes). This is both slow and requires many memory allocations.

Note we can do even better, but I want to keep the PR sizes as small as possible.

What changes are included in this PR?

Add in place mutate / Mutator APIs to TreeNode
Implement mutate for LogicalPlan (aka &mut LogicalPlan)
Rewrite Optimizer to use TreeNode API

I am trying to keep this PR relatively small, and I have other improvements in mind for follow on PRs (see below). This PR is now finally feasible thanks to the really nice TreeNode cleanup from @peter-toth and @berkaysynnada in #8891

Are these changes tested?

Functionally covered by existing tests, and I measured performance using the benchmarks -- I see an mprovement of about 10% across TPCH queries.

Details

++ critcmp main optimizer_tree_node
++ critcmp main optimizer_tree_node
group                                         main                                    optimizer_tree_node
-----                                         ----                                    -------------------
logical_aggregate_with_join                   1.01  1287.8±105.61µs        ? ?/sec    1.00   1274.7±7.85µs        ? ?/sec
logical_plan_tpch_all                         1.00     17.3±0.16ms        ? ?/sec     1.00     17.3±0.14ms        ? ?/sec
logical_select_all_from_1000                  1.00     93.4±1.40ms        ? ?/sec     1.02     95.3±0.34ms        ? ?/sec
logical_select_one_from_700                   1.00    740.6±8.75µs        ? ?/sec     1.00   743.8±17.11µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   789.6±22.14µs        ? ?/sec     1.00   792.5±19.06µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   755.3±13.71µs        ? ?/sec     1.00   759.0±18.47µs        ? ?/sec
physical_plan_tpch_all                        1.11    133.8±0.87ms        ? ?/sec     1.00    120.7±0.70ms        ? ?/sec
physical_plan_tpch_q1                         1.15      7.7±0.15ms        ? ?/sec     1.00      6.7±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.10      6.3±0.05ms        ? ?/sec     1.00      5.7±0.04ms        ? ?/sec
physical_plan_tpch_q11                        1.09      5.0±0.03ms        ? ?/sec     1.00      4.6±0.03ms        ? ?/sec
physical_plan_tpch_q12                        1.08      4.1±0.03ms        ? ?/sec     1.00      3.8±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.09      2.7±0.03ms        ? ?/sec     1.00      2.5±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.05      3.4±0.02ms        ? ?/sec     1.00      3.3±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.12      5.3±0.04ms        ? ?/sec     1.00      4.7±0.04ms        ? ?/sec
physical_plan_tpch_q17                        1.11      4.9±0.04ms        ? ?/sec     1.00      4.4±0.03ms        ? ?/sec
physical_plan_tpch_q18                        1.11      5.4±0.04ms        ? ?/sec     1.00      4.9±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.06     10.1±0.09ms        ? ?/sec     1.00      9.5±0.07ms        ? ?/sec
physical_plan_tpch_q2                         1.14     12.2±0.09ms        ? ?/sec     1.00     10.7±0.10ms        ? ?/sec
physical_plan_tpch_q20                        1.14      6.4±0.05ms        ? ?/sec     1.00      5.6±0.04ms        ? ?/sec
physical_plan_tpch_q21                        1.11      9.5±0.08ms        ? ?/sec     1.00      8.6±0.18ms        ? ?/sec
physical_plan_tpch_q22                        1.10      4.7±0.04ms        ? ?/sec     1.00      4.2±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.08      4.2±0.04ms        ? ?/sec     1.00      3.9±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.09      3.3±0.02ms        ? ?/sec     1.00      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.08      6.1±0.04ms        ? ?/sec     1.00      5.6±0.04ms        ? ?/sec
physical_plan_tpch_q6                         1.05      2.0±0.02ms        ? ?/sec     1.00  1944.8±12.65µs        ? ?/sec
physical_plan_tpch_q7                         1.11      8.6±0.09ms        ? ?/sec     1.00      7.8±0.06ms        ? ?/sec
physical_plan_tpch_q8                         1.12     12.1±0.10ms        ? ?/sec     1.00     10.8±0.09ms        ? ?/sec
physical_plan_tpch_q9                         1.13      9.2±0.07ms        ? ?/sec     1.00      8.1±0.11ms        ? ?/sec
physical_select_all_from_1000                 1.13    684.2±1.45ms        ? ?/sec     1.00    606.5±1.42ms        ? ?/sec
physical_select_one_from_700                  1.00      4.2±0.03ms        ? ?/sec     1.01      4.2±0.03ms        ? ?/sec

Are there any user-facing changes?

Faster planning
Ability to rewrite LogicalPlan in place using TreeNode API

Planned Follow Ons:

Here are follow on tickets that will improve things even more

Add API to OptimizerRule to rewrite in place (and avoid yet another copy)
Add an example of how to use the "in place tree write" API
Supporting rewriting Exprs in place in addition to LogicalPlan (implement TreeNodeMutator -- specifically ExprRewriter should be a big win)
API to support in place rewrites of UserDefinedLogicalPlan
Change other uses of TreeNodeRewriter to TreeNodeMutator (e.g. ExprRewriter, etc)

alamb · 2024-03-29T02:12:55Z

Update here is that I think this PR now has the basic code to do in-place updates of TreeNodes in general and LogicalPlans in particular 🎉

TODOs:

Run performance benchmarks to show this helps
Complete edge case implementations (DDL, user defined nodes), polish PR description

Follow ons:
(moved up)

alamb · 2024-03-29T14:55:17Z

datafusion/sqllogictest/test_files/subquery.slt

------Inner Join:  Filter: t1.t1_id = t2.t2_id AND t1.t1_name = outer_ref(t0.t0_name)
--------TableScan: t1
--------TableScan: t2
+LeftSemi Join: t0.t0_name = __correlated_sq_2.t1_name


this actually seems like an improvement to me -- as the subquery expressions are now properly visited by the rewriter

alamb · 2024-03-30T20:35:41Z

Updated description with performance results (about a 10% improvement for this PR alone). I think we can squeeze a bunch more out of this approach with the additional changes to rewrite in place (will file follow on tickets).

Proceeding to get this PR ready for review

alamb · 2024-03-31T11:11:42Z

datafusion/optimizer/src/eliminate_filter.rs

@@ -88,7 +88,7 @@ mod tests {

    use crate::test::*;

-    fn assert_optimized_plan_equal(plan: &LogicalPlan, expected: &str) -> Result<()> {
+    fn assert_optimized_plan_equal(plan: LogicalPlan, expected: &str) -> Result<()> {


A large number of test changes below are due to taking an owned LogicalPlan (to avoid having to copy it)

alamb · 2024-03-31T11:12:26Z

datafusion/optimizer/src/eliminate_limit.rs


        let formatted_plan = format!("{optimized_plan:?}");
        assert_eq!(formatted_plan, expected);
-        assert_eq!(plan.schema(), optimized_plan.schema());


The test calls Optimizer::optimze now, which already checks the schemas are the same after each rule, so there is no need to check that again.

alamb · 2024-03-31T11:14:40Z

datafusion/sqllogictest/test_files/subquery.slt

@@ -651,8 +651,8 @@ explain SELECT t1_id, t1_name FROM t1 WHERE t1_id in (SELECT t2_id FROM t2 where
 logical_plan
 Filter: t1.t1_id IN (<subquery>)
 --Subquery:
----Limit: skip=0, fetch=10
------Projection: t2.t2_id
+----Projection: t2.t2_id


The limit is now pushed under the subquery

alamb · 2024-04-01T13:05:13Z

datafusion-examples/examples/rewrite_expr.rs

@@ -59,7 +59,7 @@ pub fn main() -> Result<()> {

    // then run the optimizer with our custom rule
    let optimizer = Optimizer::with_rules(vec![Arc::new(MyOptimizerRule {})]);
-    let optimized_plan = optimizer.optimize(&analyzed_plan, &config, observe)?;
+    let optimized_plan = optimizer.optimize(analyzed_plan, &config, observe)?;


This is the only API change exposed to users -- Optimizer::optimize takes an owned LogicalPlan rather than a &LogicalPlan. This is to avoid the need to copy over and over again

alamb · 2024-04-01T13:07:52Z

datafusion/common/src/tree_node.rs

@@ -174,6 +182,70 @@ pub trait TreeNode: Sized {
        })
    }

+    /// Implements the [visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern) for


Here is the new TreeNode API to rewrite the nodes in place

I also tried Transformed<&mut Node>, but this got caught up in the borrow checker. Returning Transformed<()> seemed to work and made using this API feasible

@peter-toth and @berkaysynnada I would love any feedback on this API that you may have

alamb · 2024-04-01T13:09:50Z

datafusion/common/src/tree_node.rs

    fn apply_children<F: FnMut(&Self) -> Result<TreeNodeRecursion>>(
        &self,
        f: &mut F,
    ) -> Result<TreeNodeRecursion>;

-    /// Apply transform `F` to the node's children. Note that the transform `F`
-    /// might have a direction (pre-order or post-order).
+    /// Rewrite the node's children in place using `F`.


I chose to leave mutate_children unimplemented for Expr and the other TreeNodes initially to reduce the size of this PR (and thus the review burden). I will file a follow on ticket to track doing so

alamb · 2024-04-01T13:12:27Z

datafusion/common/src/tree_node.rs

+    ///
+    /// * If `f` returns Ok, sets `self.transformed` to `true` if either self or
+    /// the result of `f` were transformed.
+    pub fn and_then<F>(self, f: F) -> Result<Transformed<()>>


Introducing this function allowed the implementation of mutate and mutate children to be more concise. I couldn't figure out how to implement it in general for two Transformed as there are two different payloads

alamb · 2024-04-01T13:13:01Z

datafusion/expr/src/logical_plan/mutate.rs

+use std::sync::{Arc, OnceLock};
+
+impl LogicalPlan {
+    /// applies `f` to each expression of this node, potentially rewriting it in


This file contains the implementation to allow rewriting the LogicalPlan in place. rewrite_exprs is needed initially to rewrite subquery references. I also expect we will be able to use this to implement more performant optimizer passes in the future (e.g. ExprSimplifier)

alamb · 2024-04-01T13:18:59Z

datafusion/optimizer/src/optimizer.rs

-            .collect();
-
-        let exprs = plan.expressions();
-        plan.with_new_exprs(exprs, new_inputs).map(Some)


this used to copy all expressions and the entire LogicalPlan at least once

alamb · 2024-04-01T13:19:37Z

datafusion/optimizer/src/optimizer.rs

-                    let optimize_self_opt = self.optimize_node(rule, plan, config)?;
-                    let optimize_inputs_opt = match &optimize_self_opt {
-                        Some(optimized_plan) => {
-                            self.optimize_inputs(rule, optimized_plan, config)?


each of these calls also copies the plan

alamb · 2024-04-01T13:20:34Z

datafusion/optimizer/src/optimizer.rs

        assert_eq!(
            "Optimizer rule 'get table_scan rule' failed\ncaused by\nget table_scan rule\ncaused by\n\
            Internal error: Failed due to a difference in schemas, \
-            original schema: DFSchema { fields: [\


this test actually shows the output was reversed -- the original schema was [] not the table schema (which is what is changed by the optimizer)

alamb · 2024-04-01T13:21:33Z

datafusion/optimizer/src/push_down_filter.rs

-                &OptimizerContext::new(),
-            )?
-            .unwrap_or_else(|| plan.clone());
+        let optimized_plan =


This now uses the Optimizer directly rather than calling optimize_recursively (which the Optimizer does internally)

alamb · 2024-04-01T13:22:17Z

datafusion/optimizer/src/test/mod.rs

+    fn observe(_plan: &LogicalPlan, _rule: &dyn OptimizerRule) {}
+
+    // in tests we are applying only one rule once
+    let opt_context = OptimizerContext::new().with_max_passes(1);


it turns out if you apply this rule multiple times, it changes the output. I explicitly made the test only run it a single time to match the previous behavior

Use TreeNode API in Optimizer

jayzhan211

🚀

jayzhan211 · 2024-04-01T14:15:50Z

datafusion/expr/src/logical_plan/mutate.rs

+        .clone();
+
+    // take the old value out of the Arc
+    std::mem::swap(node, &mut new_node);


I think we can use std::mem::take here.

There is no default impl for Arc<LogicalPlan> so I am not sure how to use the take API here 🤔

I think we can have a default implementation with empty relation and empty schema, like the placeholder here

I agree it would be nice -- maybe like adding a Default impl for LogicalPlan (that is empty relation?)

jayzhan211 · 2024-04-01T14:21:06Z

datafusion/expr/src/logical_plan/mutate.rs

+/// Applies `f` to rewrite the existing node, while avoiding `clone`'ing as much
+/// as possiblw.
+///
+/// TODO eventually remove `Arc<LogicalPlan>` from `LogicalPlan` and have it own


Do you mean we can have LogicalPlan eventually? Not even with Box<LogicalPlan>? I think we may need Box for recursive plan

Yeah, sorry I meant use Box eventually

jayzhan211 · 2024-04-01T14:26:14Z

datafusion/expr/src/logical_plan/mutate.rs

+    // we make a temporary Expr to rewrite and then put it back
+
+    let mut swap_column = Column::new_unqualified("TEMP_unnest_column");
+    std::mem::swap(column, &mut swap_column);


I think we can have mem::take here too with a default implementation for Column

I tried using take and it seems like there is no default impl for Column 🤔

error[E0277]: the trait bound `datafusion_common::Column: std::default::Default` is not satisfied --> datafusion/expr/src/logical_plan/mutate.rs:288:41 | 288 | let mut swap_column= std::mem::take(column); | -------------- ^^^^^^ the trait `std::default::Default` is not implemented for `datafusion_common::Column` | | | required by a bound introduced by this call

yes, I would prefer "" as default column if it makes sense

I see -- the reason I left a TEMP... name here was so that on error it was obvious that the Column was partially rewritten.

Given the comments below, however, it seems like we may not even need this API

jayzhan211 · 2024-04-01T14:35:40Z

datafusion/common/src/tree_node.rs

+                // rewrite children recursively with mutator
+                self.mutate_children(|c| c.mutate(mutator))?
+                    .try_transform_node_with(
+                        |_: ()| mutator.f_up(self),


It seems |_| is also fine. what is () for?

The () is the type of the Node in Transform<()> which is somewhat confusing

peter-toth · 2024-04-02T09:21:36Z

I wonder if it would make sense to split this PR into 2 parts as it seem to mix 2 very different ideas.

It introduces an in place mutation TreeNode API. First of all, I really like that API and I too think it can be faster than the current consume and procude style ones (TreeNode::rewrite(), TreeNode::transform_down_up()). But I think the only reason why it is faster is because only smaller references (e.g. &mut LogicalPlans) are stored on the stack during a transformation and not the original structures themselves (LogicalPlans). Otherwise the 2 APIs are very similar, there should be nothing that you can do with a &mut LogicalPlans but you can't do with an owned LogicalPlan.
It uses a trick in rewrite_arc to avoid copying Arcs. But this trick shouldn't require introducing the new API, it should simply work with changing the existing LogicalPlan::map_children().

So I don't think we need a new API for #9768. Don't get me wrong, I favour the new API but I would rather move that to a completely separate PR and slowly transform the code to using that in the future.

As for the new LogicalPlan methods (rewrite_inputs(), rewrite_subqueries(), rewrite_exprs(), ...) in this PR, the problem is that they use the new Transformed::and_then() method to iterate over sibling nodes in the tree (e.g. https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR188-R189 or https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR229-R230). But that's not correct as the new and_then() doesn't respect TreeNodeRecursion::Stop behaviour. We already have TransformedIterator::map_until_stop_and_collect() and TreeNode::try_transform_node() to chain transformations in such cases (see examples in Expr::map_children()). BTW, I'm trying to simplyfy those try_transform_node chained calls in #9876.

…node

alamb · 2024-04-02T10:59:28Z

Thank you for the feedback @peter-toth

I wonder if it would make sense to split this PR into 2 parts as it seem to mix 2 very different ideas.

I can definitely split it into two PRs. My rationale for one PR was that a PR for TreeNode::mutate API would be too abstract.

It introduces an in place mutation TreeNode API. First of all, I really like that API and I too think it can be faster than the current consume and procude style ones (TreeNode::rewrite(), TreeNode::transform_down_up()). But I think the only reason why it is faster is because only smaller references (e.g. &mut LogicalPlans) are stored on the stack during a transformation and not the original structures themselves (LogicalPlans).

I agree

Otherwise the 2 APIs are very similar, there should be nothing that you can do with a &mut LogicalPlans but you can't do with an owned LogicalPlan.

The only difference I see in the APIs is that using &mut LogicalPlan leaves a partially rewritten node on error, whereas an owned LogicalPlan consumes the node on error.

It uses a trick in rewrite_arc to avoid copying Arcs. But this trick shouldn't require introducing the new API, it should simply work with changing the existing LogicalPlan::map_children().

Changing map_children to rewrite the Arc'd inputs is a very interesting idea -- I think we could do that in a separate PR. I'll give it a try

So I don't think we need a new API for #9768. Don't get me wrong, I favour the new API but I would rather move that to a completely separate PR and slowly transform the code to using that in the future.

As for the new LogicalPlan methods (rewrite_inputs(), rewrite_subqueries(), rewrite_exprs(), ...) in this PR, the problem is that they use the new Transformed::and_then() method to iterate over sibling nodes in the tree (e.g. https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR188-R189 or https://github.com/apache/arrow-datafusion/pull/9780/files#diff-b2431d53b487c80c797aecd88859229fb902f74bf888a598878386055b717bbcR229-R230). But that's not correct as the new and_then() doesn't respect TreeNodeRecursion::Stop behaviour. We already have TransformedIterator::map_until_stop_and_collect() and TreeNode::try_transform_node() to chain transformations in such cases (see examples in Expr::map_children()). BTW, I'm trying to simplyfy those try_transform_node chained calls in #9876.

This is also an excellent call -- I actually tried initially to use TransformedIterator::map_until_stop_and_collect() but I couldn't make it work with the generics (it needed a function that took (Self) as I recall.

So how about this for a plan:

I'll make a PR that changes LogicalPlan::map_children to avoid copies if possible (which should be beneficial in many places in DataFusion)
I'll create a PR that changes the Optimizer to use a TreeNodeRewriter (rather than TreeNodeMutator) which will avoid even more copies after 1 is complete.

We can then compare the performance of the TreeNodeRewriter to this PR with TreeNodeMutator and decide and see if the additional TreeNodeMutator API is worth the extra API surface / migration.

peter-toth · 2024-04-02T11:31:30Z

So how about this for a plan:

I'll make a PR that changes LogicalPlan::map_children to avoid copies if possible (which should be beneficial in many places in DataFusion)

I'll create a PR that changes the Optimizer to use a TreeNodeRewriter (rather than TreeNodeMutator) which will avoid even more copies after 1 is complete.

We can then compare the performance of the TreeNodeRewriter to this PR with TreeNodeMutator and decide and see if the additional TreeNodeMutator API is worth the extra API surface / migration.

Sounds very good to me, I'm really curious how much improvement the changes can bring.

peter-toth · 2024-04-02T12:12:49Z

datafusion/expr/src/logical_plan/mutate.rs

+    /// applies `f` to LogicalPlans in any subquery expressions
+    ///
+    /// If Err is returned, the plan may be left in a partially modified state
+    fn rewrite_subqueries<F>(&mut self, mut f: F) -> Result<Transformed<()>>


I'm a bit concerned about this part. I mean the TreeNode APIs are currently inconsistent about handling "inner" trees. I'm refferring to LogicalPlan for example. It overrides TreeNode::apply() and TreeNode::visit() to apply f on subquery's logical plans too.

But first of all, none of the other trees do that,

secodly, LogicalPlan doesn't override rewrite() or transform...() so transformations are applied only the "current" tree and not on "whole" logical plan.

I'm not sure that the current state is right or wrong as I can imagine both usescases in which

a transformations needs to be applied on the current tree only

a transformations need to change the whole tree (subplans trees included)

Now, this code seems to apply f on subquery logical plans too. If we keep it that way mutate() will be inconsistent with other APIs. But I'm also not sure that why it applies f on the root node only? Shouldn't it call subquery.subquery.mutate(f) to start a traversal on the subquery plan tree?

Is it needed because of the Optimizer? Does it optimize the whole logical plan at once? If so, I think we should introduce new APIs on LogicalPlan that is different to the existing TreeNode APIs. In Spark for example we have TreeNode.transform() and QueryPlan.transformWithSubqueries() for the different purposes.

Shall I open a PR that adds LogicalPlan::rewriteWithSubqueries() and LogicalPlan::transform...WithSubqueries()?

Now, this code seems to apply f on subquery logical plans too. If we keep it that way mutate() will be inconsistent with other APIs

I actually found I needed to handle recursing into subqueries to get some optimizer tests tests passing. Interestingly, recursing into subqueries also seems to have improved planning in some cases like #9780 (comment)

Is it needed because of the Optimizer? Does it optimize the whole logical plan at once? If so, I think we should introduce new APIs on LogicalPlan that is different to the existing TreeNode APIs. In Spark for example we have TreeNode.transform() and QueryPlan.transformWithSubqueries() for the different purposes.

OptimizerRules can currently specify if they want to handle recursion themselves or not: https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html#method.apply_order

I'm not sure that the current state is right or wrong as I can imagine both usescases in which

a transformations needs to be applied on the current tree only

a transformations need to change the whole tree (subplans trees included)

I agree I can see usecases for both. I think the optimizer rules that handle subqueries specially currently do their own walk of the tree. For example https://github.com/apache/arrow-datafusion/blob/cc1db8a2043c73bda7adec309b42c08d88defab8/datafusion/optimizer/src/decorrelate_predicate_subquery.rs doesn't specify apply_order and walks down the tree

Shall I open a PR that adds LogicalPlan::rewriteWithSubqueries() and LogicalPlan::transform...WithSubqueries()?

It seems like a good idea to me. If you you are going to do this, however, I would recommend finding some manual recursion / optimizer pass (maybe decorrelate_predicate_subquery.rs?) that you can use for a usecase of the new API.

Thank you again for all your help

I've opened a PR here: #9913.
I think the new LogicalPlan methods like LogicalPlan::transform_expressions() can be useful for 2. in #9780 (comment)

Update here I have created two PRs as discussed

Refactor Optimizer to use owned plans and TreeNode API (10% faster planning) #9948

Avoid copying (so much) for LogicalPlan::map_children #9946

Unfortunately, currently individually they either don't improve performance or actually slow things down.

I am going to test what happens when I run them both together

When I combined both #9948 and #9946 tpch planning time improves by 12% (basically the same as this PR)

Thus I conclude we do not need the tree mutator API quite yet. Let's continue the discussions on those other PRs

Details

group both_tests main optimizer_tree_node2 ----- ---------- ---- -------------------- logical_aggregate_with_join 1.01 1203.3±30.57µs ? ?/sec 1.00 1197.3±17.89µs ? ?/sec 1.00 1194.4±22.14µs ? ?/sec logical_plan_tpcds_all 1.00 157.6±2.32ms ? ?/sec 1.00 157.3±3.68ms ? ?/sec 1.00 157.6±3.67ms ? ?/sec logical_plan_tpch_all 1.01 16.9±0.34ms ? ?/sec 1.00 16.7±0.22ms ? ?/sec 1.04 17.5±0.37ms ? ?/sec logical_select_all_from_1000 1.01 19.6±0.17ms ? ?/sec 1.00 19.4±0.17ms ? ?/sec 1.00 19.4±0.17ms ? ?/sec logical_select_one_from_700 1.01 790.8±12.36µs ? ?/sec 1.00 784.0±12.23µs ? ?/sec 1.00 786.1±12.87µs ? ?/sec logical_trivial_join_high_numbered_columns 1.00 740.8±12.72µs ? ?/sec 1.00 737.2±28.09µs ? ?/sec 1.00 739.5±12.25µs ? ?/sec logical_trivial_join_low_numbered_columns 1.00 719.9±14.19µs ? ?/sec 1.02 731.2±11.31µs ? ?/sec 1.00 721.4±19.28µs ? ?/sec physical_plan_tpcds_all 1.00 1666.1±16.04ms ? ?/sec 1.12 1872.6±16.17ms ? ?/sec 1.20 2.0±0.02s ? ?/sec physical_plan_tpch_all 1.00 110.2±1.86ms ? ?/sec 1.10 121.3±1.90ms ? ?/sec 1.21 133.3±2.21ms ? ?/sec physical_plan_tpch_q1 1.00 6.2±0.11ms ? ?/sec 1.19 7.4±0.12ms ? ?/sec 1.23 7.6±0.07ms ? ?/sec physical_plan_tpch_q10 1.00 5.1±0.07ms ? ?/sec 1.11 5.6±0.11ms ? ?/sec 1.20 6.1±0.10ms ? ?/sec physical_plan_tpch_q11 1.00 4.5±0.09ms ? ?/sec 1.10 5.0±0.07ms ? ?/sec 1.20 5.4±0.10ms ? ?/sec physical_plan_tpch_q12 1.00 3.6±0.06ms ? ?/sec 1.11 4.0±0.07ms ? ?/sec 1.21 4.4±0.11ms ? ?/sec physical_plan_tpch_q13 1.00 2.4±0.04ms ? ?/sec 1.10 2.7±0.06ms ? ?/sec 1.18 2.9±0.08ms ? ?/sec physical_plan_tpch_q14 1.00 3.1±0.05ms ? ?/sec 1.09 3.4±0.06ms ? ?/sec 1.19 3.7±0.09ms ? ?/sec physical_plan_tpch_q16 1.00 4.5±0.06ms ? ?/sec 1.10 4.9±0.08ms ? ?/sec 1.24 5.6±0.10ms ? ?/sec physical_plan_tpch_q17 1.00 4.3±0.07ms ? ?/sec 1.09 4.7±0.05ms ? ?/sec 1.21 5.2±0.10ms ? ?/sec physical_plan_tpch_q18 1.00 4.6±0.09ms ? ?/sec 1.10 5.1±0.09ms ? ?/sec 1.22 5.6±0.10ms ? ?/sec physical_plan_tpch_q19 1.00 9.0±0.11ms ? ?/sec 1.06 9.5±0.11ms ? ?/sec 1.23 11.1±0.14ms ? ?/sec physical_plan_tpch_q2 1.00 9.6±0.12ms ? ?/sec 1.11 10.6±0.18ms ? ?/sec 1.23 11.7±0.14ms ? ?/sec physical_plan_tpch_q20 1.00 5.5±0.08ms ? ?/sec 1.11 6.1±0.08ms ? ?/sec 1.24 6.9±0.14ms ? ?/sec physical_plan_tpch_q21 1.00 7.6±0.11ms ? ?/sec 1.12 8.5±0.14ms ? ?/sec 1.22 9.3±0.13ms ? ?/sec physical_plan_tpch_q22 1.00 4.1±0.10ms ? ?/sec 1.10 4.5±0.10ms ? ?/sec 1.23 5.0±0.12ms ? ?/sec physical_plan_tpch_q3 1.00 3.6±0.07ms ? ?/sec 1.08 3.9±0.06ms ? ?/sec 1.19 4.3±0.11ms ? ?/sec physical_plan_tpch_q4 1.00 2.7±0.06ms ? ?/sec 1.11 3.0±0.05ms ? ?/sec 1.21 3.2±0.11ms ? ?/sec physical_plan_tpch_q5 1.00 5.3±0.08ms ? ?/sec 1.08 5.7±0.12ms ? ?/sec 1.17 6.2±0.08ms ? ?/sec physical_plan_tpch_q6 1.00 1905.1±30.83µs ? ?/sec 1.06 2.0±0.03ms ? ?/sec 1.15 2.2±0.05ms ? ?/sec physical_plan_tpch_q7 1.00 7.0±0.11ms ? ?/sec 1.08 7.6±0.12ms ? ?/sec 1.19 8.4±0.14ms ? ?/sec physical_plan_tpch_q8 1.00 8.9±0.14ms ? ?/sec 1.10 9.7±0.18ms ? ?/sec 1.19 10.6±0.16ms ? ?/sec physical_plan_tpch_q9 1.00 6.7±0.11ms ? ?/sec 1.10 7.3±0.10ms ? ?/sec 1.21 8.0±0.10ms ? ?/sec physical_select_all_from_1000 1.00 114.1±0.77ms ? ?/sec 1.13 129.2±1.14ms ? ?/sec 1.12 128.3±0.79ms ? ?/sec physical_select_one_from_700 1.00 3.9±0.08ms ? ?/sec 1.03 4.1±0.05ms ? ?/sec 1.04 4.1±0.04ms ? ?/sec

Very nice improvement! Thanks @alamb!

alamb · 2024-04-02T13:14:07Z

Converting to draft as we work on this API so someone doesn't merge it accidentally

berkaysynnada · 2024-04-02T13:57:46Z

I will review this PR and the conversation here tomorrow. Thanks @alamb, the results are so promising 🚀

alamb · 2024-04-04T16:03:47Z

Per #9780 (comment) I think we can accomplish the same thing without a mutator at this time, so closing this PR

alamb marked this pull request as draft March 24, 2024 13:50

github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Mar 24, 2024

This was referenced Mar 24, 2024

Avoid clone for LogicalPlan during optimizer passes #9768

Closed

WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses #9708

Closed

[EPIC] Stop copying LogicalPlan during OptimizerPasses #9637

Closed

github-actions bot added logical-expr Logical plan and expressions and removed logical-expr Logical plan and expressions labels Mar 25, 2024

alamb force-pushed the alamb/optimizer_tree_node branch from da634a2 to 233d8e9 Compare March 29, 2024 01:58

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 29, 2024

alamb changed the title ~~WIP: Rewrite optimizer to use TreeNode API, implement in-place rewrites for LogicalPlan~~ Introduce TreeNodeMutator for rewriting TreeNodes in place, change optimizer to rewrite LogicalPlan in place Mar 29, 2024

alamb commented Mar 29, 2024

View reviewed changes

alamb commented Mar 31, 2024

View reviewed changes

github-actions bot added the core Core DataFusion crate label Mar 31, 2024

alamb commented Mar 31, 2024

View reviewed changes

alamb changed the title ~~Introduce TreeNodeMutator for rewriting TreeNodes in place, change optimizer to rewrite LogicalPlan in place~~ Introduce TreeNodeMutator for rewriting TreeNodes in place, change optimizer to rewrite LogicalPlan in place - 15% faster planning time Apr 1, 2024

alamb added the api change Changes the API exposed to users of the crate label Apr 1, 2024

alamb commented Apr 1, 2024

View reviewed changes

Add TreeNodeMutator API

1bedfec

Use TreeNode API in Optimizer

alamb force-pushed the alamb/optimizer_tree_node branch from 1011559 to 1bedfec Compare April 1, 2024 13:23

alamb marked this pull request as ready for review April 1, 2024 13:25

jayzhan211 approved these changes Apr 1, 2024

View reviewed changes

alamb self-assigned this Apr 1, 2024

Merge remote-tracking branch 'apache/main' into alamb/optimizer_tree_…

7974355

…node

peter-toth reviewed Apr 2, 2024

View reviewed changes

alamb marked this pull request as draft April 2, 2024 13:13

This was referenced Apr 4, 2024

Avoid copying (so much) for LogicalPlan::map_children #9946

Closed

Refactor Optimizer to use owned plans and TreeNode API (10% faster planning) #9948

Merged

alamb closed this Apr 4, 2024

DEMO: Introduce TreeNodeMutator for rewriting TreeNodes in place, change optimizer to rewrite LogicalPlan in place - 10% faster planning time #9780

DEMO: Introduce TreeNodeMutator for rewriting TreeNodes in place, change optimizer to rewrite LogicalPlan in place - 10% faster planning time #9780

Conversation

alamb commented Mar 24, 2024 • edited by Dandandan Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Planned Follow Ons:

alamb commented Mar 29, 2024 • edited Loading

alamb Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

alamb commented Mar 30, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb Apr 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jayzhan211 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jayzhan211 Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peter-toth commented Apr 2, 2024

alamb commented Apr 2, 2024

peter-toth commented Apr 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb commented Apr 2, 2024

berkaysynnada commented Apr 2, 2024

alamb commented Apr 4, 2024

DEMO: Introduce `TreeNodeMutator` for rewriting `TreeNode`s in place, change optimizer to rewrite `LogicalPlan` in place - 10% faster planning time #9780

DEMO: Introduce `TreeNodeMutator` for rewriting `TreeNode`s in place, change optimizer to rewrite `LogicalPlan` in place - 10% faster planning time #9780

alamb commented Mar 24, 2024 •

edited by Dandandan

Loading

alamb commented Mar 29, 2024 •

edited

Loading

alamb Mar 29, 2024 •

edited

Loading

alamb Apr 1, 2024 •

edited

Loading

jayzhan211 Apr 2, 2024 •

edited

Loading