Deprecate SessionContext::create_logical_plan (#4617) #4679

tustvold · 2022-12-20T12:58:17Z

Which issue does this PR close?

Part of #4617

Rationale for this change

This is part of moving query execution off SessionContext and onto DataFrame, which ensures that a query is executed against the same SessionState that planned it. In many cases it also results in significantly less code.

This will also drastically reduce the churn from #4607, as large portions of that PR were related to create_physical_plan becoming async.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

tustvold · 2022-12-20T12:59:11Z

datafusion/core/src/dataframe.rs

+    /// Note: This method should not be used outside testing, as it loses the snapshot
+    /// of the [`SessionState`] attached to this [`DataFrame`] and consequently subsequent
+    /// operations may take place against a different state
+    #[deprecated(note = "Use DataFrame::to_optimized_plan")]


I opted to deprecate this method as it was a touch confusing: it actually performed a potentially mutating optimization pass without explicitly calling that out.

tustvold · 2022-12-20T13:01:04Z

datafusion/core/src/execution/context.rs

+
+        // create a query planner
+        let plan = {
+            // TODO: Move catalog off SessionState onto SessionContext


This somewhat relates to #4607 but also serves to avoid issues due to the interior mutability of CatalogList

tustvold · 2022-12-20T13:03:32Z

benchmarks/src/bin/nyctaxi.rs

-    let plan = ctx.create_logical_plan(sql)?;
-    let plan = ctx.optimize(&plan)?;
+    let dataframe = ctx.sql(sql).await?;
    if debug {
-        println!("Optimized logical plan:\n{:?}", plan);
+        println!("Optimized logical plan:\n{:?}", dataframe.logical_plan());
    }
-    let physical_plan = ctx.create_physical_plan(&plan).await?;
-    let task_ctx = ctx.task_ctx();
-    let result = collect(physical_plan, task_ctx).await?;


As ctx is interior mutable, this would effectively plan, optimize, and execute against potentially 3 different states 😱
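
To illustrate the hazard, here is a toy model using only the standard library. None of these types are DataFusion's: `Context`, `State` and `Frame` are hypothetical stand-ins. The sketch shows how re-reading shared, interior-mutable state at each step can observe a different value every time, while a snapshot taken once stays consistent.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for session state; not DataFusion's SessionState.
#[derive(Clone)]
struct State {
    version: u32,
}

// Hypothetical context with interior mutability: `&self` methods can change the state.
struct Context {
    state: Arc<RwLock<State>>,
}

impl Context {
    fn new() -> Self {
        Self { state: Arc::new(RwLock::new(State { version: 0 })) }
    }

    // Simulates a concurrent change, such as registering a table.
    fn mutate(&self) {
        self.state.write().unwrap().version += 1;
    }

    // Each call re-reads the shared state, so consecutive steps may disagree.
    fn current_version(&self) -> u32 {
        self.state.read().unwrap().version
    }

    // Takes a snapshot once; later steps use the copy.
    fn frame(&self) -> Frame {
        Frame { snapshot: self.state.read().unwrap().clone() }
    }
}

// Hypothetical frame that carries the state it was planned against.
struct Frame {
    snapshot: State,
}

impl Frame {
    fn version(&self) -> u32 {
        self.snapshot.version
    }
}

// Plan / optimize / execute against the live context while the state changes in between.
fn versions_seen_via_context(ctx: &Context) -> [u32; 3] {
    let plan = ctx.current_version();
    ctx.mutate();
    let optimize = ctx.current_version();
    ctx.mutate();
    let execute = ctx.current_version();
    [plan, optimize, execute]
}

// Same sequence, but every step reads the frame's snapshot.
fn versions_seen_via_frame(ctx: &Context) -> [u32; 3] {
    let frame = ctx.frame();
    let plan = frame.version();
    ctx.mutate();
    let optimize = frame.version();
    ctx.mutate();
    let execute = frame.version();
    [plan, optimize, execute]
}
```

In this model the context-based sequence sees three different versions, whereas the frame-based sequence sees the same version at every step, which is the property the DataFrame-centred API is aiming for.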

tustvold · 2022-12-20T13:05:14Z

datafusion/core/tests/sql/subqueries.rs

@@ -252,15 +252,8 @@ async fn tpch_q17_correlated() -> Result<()> {
        );"#;

    // assert plan
-    let plan = ctx
-        .create_logical_plan(sql)
-        .map_err(|e| format!("{:?} at {}", e, "error"))


These map_err calls were only adding a context of "error", so I just opted to remove them 😅
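
As a small standalone illustration of why the wrapper carried no information, the sketch below applies the same closure to a plain `Result<u32, String>`. The helper name `with_constant_context` is hypothetical and the error type is a stand-in, not DataFusion's error type.

```rust
// Applies the same pattern as the removed test code to a stand-in error type.
// The "context" is the constant string "error", so every failure gets the same suffix.
fn with_constant_context(result: Result<u32, String>) -> Result<u32, String> {
    result.map_err(|e| format!("{:?} at {}", e, "error"))
}
```

The mapped error is just the debug form of the original followed by " at error", so dropping the call loses nothing a reader could act on.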

tustvold · 2022-12-20T13:06:12Z

datafusion/core/src/dataframe.rs

@@ -714,6 +737,12 @@ impl DataFrame {
        }
    }

+    /// Convert a prepare logical plan into its inner logical plan with all params replaced with their corresponding values
+    pub fn with_param_values(self, param_values: Vec<ScalarValue>) -> Result<Self> {
+        let plan = self.plan.with_param_values(param_values)?;


I plan to move this off LogicalPlan and onto LogicalPlanBuilder in a subsequent PR, to be consistent with the rest of the system

tustvold · 2022-12-20T13:52:21Z

datafusion/core/tests/sql/errors.rs

@@ -143,6 +135,7 @@ async fn invalid_qualified_table_references() -> Result<()> {
 }

 #[tokio::test]
+#[allow(deprecated)] // TODO: Remove this test once create_logical_plan removed


The new API prevents this sort of issue

alamb

Thank you @tustvold. I think this is a great change ❤️

I really like how this makes the APIs more consistent and slowly moves us towards a more coherent design.

cc @andygrove @mingmwang @xudong963 @yahoNanJing @liukun4515

I think we should leave this open for at least another 24 hours to give others a chance to review if they want, or to note that they would like more time.

alamb · 2022-12-20T18:42:40Z

datafusion/core/src/dataframe.rs

+        &self.plan
+    }
+
+    /// Return the logical plan represented by this DataFrame without running the optimizers


👍
I wonder if we should call it into_unoptimized_plan for consistency 🤔


alamb · 2022-12-20T18:50:17Z

datafusion/core/tests/sql/avro.rs

@@ -121,11 +121,8 @@ async fn avro_single_nan_schema() {
    .await
    .unwrap();
    let sql = "SELECT mycol FROM single_nan";
-    let plan = ctx.create_logical_plan(sql).unwrap();


🙈 -- this pattern is very old in the DataFusion codebase -- I think it was simply copied around and has never been cleaned up. Thank you 🙏

alamb · 2022-12-20T18:53:55Z

datafusion/core/tests/sql/mod.rs

-        .map_err(|e| format!("{:?} at {}", e, msg))
-        .unwrap();
-
-    assert_eq!(logical_schema.as_ref(), optimized_logical_schema.as_ref());


This change appears to have lost the check that the logical schema and the optimized schema are the same. I think that is a valuable check to have in tests.
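
The kind of check being discussed can be sketched with a toy model. `Field`, `Plan`, `optimize` and `optimize_checked` are hypothetical and deliberately simplified; they are not DataFusion types or the test helper from this PR. The idea is only that an optimizer may rewrite a plan but should leave its output schema unchanged.

```rust
// Hypothetical, simplified column description.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

// Hypothetical, simplified plan: an output schema plus a list of step names.
#[derive(Clone, Debug)]
struct Plan {
    schema: Vec<Field>,
    steps: Vec<String>,
}

// Stand-in optimizer: collapses consecutive duplicate steps and leaves the schema alone.
fn optimize(plan: &Plan) -> Plan {
    let mut optimized = plan.clone();
    optimized.steps.dedup();
    optimized
}

// Runs the optimizer and verifies that the output schema is unchanged.
fn optimize_checked(plan: &Plan) -> Result<Plan, String> {
    let optimized = optimize(plan);
    if optimized.schema != plan.schema {
        return Err(format!(
            "schema changed during optimization: {:?} vs {:?}",
            plan.schema, optimized.schema
        ));
    }
    Ok(optimized)
}
```

Placing the comparison in one shared helper means every test that goes through it gets the schema check without repeating it.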

alamb · 2022-12-20T18:54:53Z

datafusion/core/tests/sql/select.rs

-            num_rows += batch.num_rows();
-        }
-    }
-    assert_eq!(20, num_rows);


I agree that the 20 output row check is covered by assert_batches_eq 👍

xudong963

The title is "Deprecate SessionContext::create_physical_plan", but if I'm not mistaken, what is actually deprecated is SessionContext::create_logical_plan?

tustvold · 2022-12-21T12:16:07Z

Yup, got ahead of myself 😅 That will be next

andygrove

I don't have time to review in detail but LGTM.


tustvold · 2022-12-21T18:20:12Z

Unless there are any objections, I intend to merge this in the next few hours, hopefully before it ends up with merge conflicts again 😅

ursabot · 2022-12-21T20:51:57Z

Benchmark runs are scheduled for baseline = bfef105 and contender = cb096f6. cb096f6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Deprecate SessionContext::create_phsyical_plan (apache#4617)

9c9444b

github-actions bot added the core Core DataFusion crate label Dec 20, 2022

tustvold commented Dec 20, 2022

Fix tests

5c23f09

tustvold commented Dec 20, 2022

tustvold changed the title ~~Deprecate SessionContext::create_phsyical_plan (#4617)~~ Deprecate SessionContext::create_physical_plan (#4617) Dec 20, 2022

tustvold requested a review from alamb December 20, 2022 18:27

alamb approved these changes Dec 20, 2022

xudong963 reviewed Dec 21, 2022

tustvold changed the title ~~Deprecate SessionContext::create_physical_plan (#4617)~~ Deprecate SessionContext::create_logical_plan (#4617) Dec 21, 2022

andygrove approved these changes Dec 21, 2022

tustvold added 3 commits December 21, 2022 17:20

Merge remote-tracking branch 'upstream/master' into deprecate-create-…

832dbac

…physical-plan

Fix merge conflicts

838c9a9

Rename to_* to into_*

f15f1e1

Add schema check to execute_to_batches

64c969d

alamb merged commit cb096f6 into apache:master Dec 21, 2022

This was referenced Dec 23, 2022

Non-deprecated support for planning SQL without DDL #4720

Closed

Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods #4721

Closed

tustvold mentioned this pull request Dec 27, 2022

Deprecate SessionContext physical plan methods (#4617) #4751

Merged
