Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate SessionContext::create_logical_plan (#4617) #4679

Merged
merged 6 commits into from
Dec 21, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #4617

Rationale for this change

This is part of moving query execution off SessionContext onto DataFrame, ensuring that a query is executed against the same SessionState that planned it. It is also in many cases significantly less code.

This also will drastically reduce the churn from #4607, as large portions of that PR were related to create_physical_plan becoming async.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Dec 20, 2022
/// Note: This method should not be used outside testing, as it loses the snapshot
/// of the [`SessionState`] attached to this [`DataFrame`] and consequently subsequent
/// operations may take place against a different state
#[deprecated(note = "Use DataFrame::to_optimized_plan")]
Copy link
Contributor Author

@tustvold tustvold Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to deprecate this method as it was a touch confusing, having a method that actually performed a potentially mutating optimization pass without explicitly calling it out


// create a query planner
let plan = {
// TODO: Move catalog off SessionState onto SessionContext
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat relates to #4607 but also serves avoids issues due to the interior mutability of CatalogList

Comment on lines -122 to -129
let plan = ctx.create_logical_plan(sql)?;
let plan = ctx.optimize(&plan)?;
let dataframe = ctx.sql(sql).await?;
if debug {
println!("Optimized logical plan:\n{:?}", plan);
println!("Optimized logical plan:\n{:?}", dataframe.logical_plan());
}
let physical_plan = ctx.create_physical_plan(&plan).await?;
let task_ctx = ctx.task_ctx();
let result = collect(physical_plan, task_ctx).await?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As ctx is interior mutable this would effectively plan, optimize, and execute potentially against at least 3 different states 😱

@@ -252,15 +252,8 @@ async fn tpch_q17_correlated() -> Result<()> {
);"#;

// assert plan
let plan = ctx
.create_logical_plan(sql)
.map_err(|e| format!("{:?} at {}", e, "error"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These map_err were only adding a context of "error" so I just opted to remove them 😅

@@ -714,6 +737,12 @@ impl DataFrame {
}
}

/// Convert a prepare logical plan into its inner logical plan with all params replaced with their corresponding values
pub fn with_param_values(self, param_values: Vec<ScalarValue>) -> Result<Self> {
let plan = self.plan.with_param_values(param_values)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to move this off LogicalPlan and onto LogicalPlanBuilder in a subsequent PR, to be consistent with the rest of the system

@@ -143,6 +135,7 @@ async fn invalid_qualified_table_references() -> Result<()> {
}

#[tokio::test]
#[allow(deprecated)] // TODO: Remove this test once create_logical_plan removed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new API prevents this sort of issue

@tustvold tustvold changed the title Deprecate SessionContext::create_phsyical_plan (#4617) Deprecate SessionContext::create_physical_plan (#4617) Dec 20, 2022
@tustvold tustvold requested a review from alamb December 20, 2022 18:27
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold . I think this is a great change ❤️

I really like how this is improving the APIs to make them more consistent and slowly moving us towards a more consistent design.

cc @andygrove @mingmwang @xudong963 @yahoNanJing @liukun4515

I think we should leave this open for at least another 24 hours to give others a chance to review if they want or note they would like more time.

&self.plan
}

/// Return the logical plan represented by this DataFrame without running the optimizers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
I wonder if we should call it into_unoptimized_plan for consistency 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

datafusion/core/src/dataframe.rs Outdated Show resolved Hide resolved
@@ -121,11 +121,8 @@ async fn avro_single_nan_schema() {
.await
.unwrap();
let sql = "SELECT mycol FROM single_nan";
let plan = ctx.create_logical_plan(sql).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈 -- this pattern is very old in the DataFusion codebase -- I think it was simply copied around and has never been cleaned up. Thank you 🙏

.map_err(|e| format!("{:?} at {}", e, msg))
.unwrap();

assert_eq!(logical_schema.as_ref(), optimized_logical_schema.as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears to have lost the check logical schema and optimized schema are the same. I think that is a valuable check to have in tests

num_rows += batch.num_rows();
}
}
assert_eq!(20, num_rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the 20 output row check is covered by assert_batches_eq 👍

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title is "Deprecate SessionContext::create_physical_plan", but If I'm not mistaken, what is actually discarded is SessionContext::create_logical_plan?

@tustvold tustvold changed the title Deprecate SessionContext::create_physical_plan (#4617) Deprecate SessionContext::create_logical_plan (#4617) Dec 21, 2022
@tustvold
Copy link
Contributor Author

tustvold commented Dec 21, 2022

Yup, got ahead of myself 😅 That will be next

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to review in detail but LGTM.

@tustvold
Copy link
Contributor Author

Unless there any objections I intend to merge this in the next few hours, hopefully before it ends up with merge conflicts again 😅

@alamb alamb merged commit cb096f6 into apache:master Dec 21, 2022
@ursabot
Copy link

ursabot commented Dec 21, 2022

Benchmark runs are scheduled for baseline = bfef105 and contender = cb096f6. cb096f6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants