-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate SessionContext::create_logical_plan (#4617) #4679
Conversation
datafusion/core/src/dataframe.rs
Outdated
/// Note: This method should not be used outside testing, as it loses the snapshot | ||
/// of the [`SessionState`] attached to this [`DataFrame`] and consequently subsequent | ||
/// operations may take place against a different state | ||
#[deprecated(note = "Use DataFrame::to_optimized_plan")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to deprecate this method as it was a touch confusing, having a method that actually performed a potentially mutating optimization pass without explicitly calling it out
|
||
// create a query planner | ||
let plan = { | ||
// TODO: Move catalog off SessionState onto SessionContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This somewhat relates to #4607 but also serves avoids issues due to the interior mutability of CatalogList
let plan = ctx.create_logical_plan(sql)?; | ||
let plan = ctx.optimize(&plan)?; | ||
let dataframe = ctx.sql(sql).await?; | ||
if debug { | ||
println!("Optimized logical plan:\n{:?}", plan); | ||
println!("Optimized logical plan:\n{:?}", dataframe.logical_plan()); | ||
} | ||
let physical_plan = ctx.create_physical_plan(&plan).await?; | ||
let task_ctx = ctx.task_ctx(); | ||
let result = collect(physical_plan, task_ctx).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As ctx is interior mutable this would effectively plan, optimize, and execute potentially against at least 3 different states 😱
@@ -252,15 +252,8 @@ async fn tpch_q17_correlated() -> Result<()> { | |||
);"#; | |||
|
|||
// assert plan | |||
let plan = ctx | |||
.create_logical_plan(sql) | |||
.map_err(|e| format!("{:?} at {}", e, "error")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These map_err were only adding a context of "error"
so I just opted to remove them 😅
@@ -714,6 +737,12 @@ impl DataFrame { | |||
} | |||
} | |||
|
|||
/// Convert a prepare logical plan into its inner logical plan with all params replaced with their corresponding values | |||
pub fn with_param_values(self, param_values: Vec<ScalarValue>) -> Result<Self> { | |||
let plan = self.plan.with_param_values(param_values)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to move this off LogicalPlan
and onto LogicalPlanBuilder
in a subsequent PR, to be consistent with the rest of the system
@@ -143,6 +135,7 @@ async fn invalid_qualified_table_references() -> Result<()> { | |||
} | |||
|
|||
#[tokio::test] | |||
#[allow(deprecated)] // TODO: Remove this test once create_logical_plan removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API prevents this sort of issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tustvold . I think this is a great change ❤️
I really like how this is improving the APIs to make them more consistent and slowly moving us towards a more consistent design.
cc @andygrove @mingmwang @xudong963 @yahoNanJing @liukun4515
I think we should leave this open for at least another 24 hours to give others a chance to review if they want or note they would like more time.
&self.plan | ||
} | ||
|
||
/// Return the logical plan represented by this DataFrame without running the optimizers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I wonder if we should call it into_unoptimized_plan
for consistency 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -121,11 +121,8 @@ async fn avro_single_nan_schema() { | |||
.await | |||
.unwrap(); | |||
let sql = "SELECT mycol FROM single_nan"; | |||
let plan = ctx.create_logical_plan(sql).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙈 -- this pattern is very old in the DataFusion codebase -- I think it was simply copied around and has never been cleaned up. Thank you 🙏
.map_err(|e| format!("{:?} at {}", e, msg)) | ||
.unwrap(); | ||
|
||
assert_eq!(logical_schema.as_ref(), optimized_logical_schema.as_ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change appears to have lost the check logical schema and optimized schema are the same. I think that is a valuable check to have in tests
num_rows += batch.num_rows(); | ||
} | ||
} | ||
assert_eq!(20, num_rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the 20 output row check is covered by assert_batches_eq
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title is "Deprecate SessionContext::create_physical_plan
", but If I'm not mistaken, what is actually discarded is SessionContext::create_logical_plan
?
Yup, got ahead of myself 😅 That will be next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time to review in detail but LGTM.
Unless there any objections I intend to merge this in the next few hours, hopefully before it ends up with merge conflicts again 😅 |
Benchmark runs are scheduled for baseline = bfef105 and contender = cb096f6. cb096f6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #4617
Rationale for this change
This is part of moving query execution off
SessionContext
ontoDataFrame
, ensuring that a query is executed against the sameSessionState
that planned it. It is also in many cases significantly less code.This also will drastically reduce the churn from #4607, as large portions of that PR were related to
create_physical_plan
becoming async.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?