Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods #4721
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -245,26 +245,31 @@ impl SessionContext { | |
self.state.read().config.clone() | ||
} | ||
|
||
/// Creates a [`DataFrame`] that will execute a SQL query. | ||
/// Creates a [`DataFrame`] that executes a SQL query supported by | ||
/// DataFusion, including DDL (such as `CREATE TABLE`). | ||
/// | ||
/// This method is `async` because queries of type `CREATE EXTERNAL TABLE` | ||
/// might require the schema to be inferred. | ||
/// You can use [`Self::plan_sql`] and | ||
/// [`DataFrame::create_physical_plan`] directly if you need read only | ||
/// query support (no way to create external tables, for example) | ||
/// | ||
/// This method is `async` because queries of type `CREATE | ||
/// EXTERNAL TABLE` might require the schema to be inferred. | ||
pub async fn sql(&self, sql: &str) -> Result<DataFrame> { | ||
Review comment: the core SessionContext::sql API does not change. |
||
let mut statements = DFParser::parse_sql(sql)?; | ||
if statements.len() != 1 { | ||
return Err(DataFusionError::NotImplemented( | ||
"The context currently only supports a single SQL statement".to_string(), | ||
)); | ||
} | ||
|
||
// create a query planner | ||
let plan = { | ||
// TODO: Move catalog off SessionState onto SessionContext | ||
let state = self.state.read(); | ||
let query_planner = SqlToRel::new(&*state); | ||
query_planner.statement_to_plan(statements.pop_front().unwrap())? | ||
}; | ||
let plan = self.plan_sql(sql)?; | ||
self.dataframe(plan).await | ||
} | ||
|
||
/// Creates a [`DataFrame`] that will execute the specified | ||
/// LogicalPlan, including DDL (such as `CREATE TABLE`). | ||
/// Use [`Self::dataframe_without_ddl`] if you do not want | ||
/// to support DDL statements. | ||
/// | ||
/// Any DDL statements are executed during this function (not when | ||
/// the [`DataFrame`] is evaluated) | ||
/// | ||
/// This method is `async` because queries of type `CREATE EXTERNAL TABLE` | ||
/// might require the schema to be inferred by performing I/O. | ||
pub async fn dataframe(&self, plan: LogicalPlan) -> Result<DataFrame> { | ||
match plan { | ||
LogicalPlan::CreateExternalTable(cmd) => { | ||
self.create_external_table(&cmd).await | ||
|
@@ -492,6 +497,15 @@ impl SessionContext { | |
} | ||
} | ||
|
||
/// Creates a [`DataFrame`] that will execute the specified | ||
/// LogicalPlan, but will error if the plans represent DDL such as | ||
/// `CREATE TABLE` | ||
/// | ||
/// Use [`Self::dataframe`] to run plans with DDL | ||
pub fn dataframe_without_ddl(&self, plan: LogicalPlan) -> Result<DataFrame> { | ||
Review comment: This is the API to support #4720.
Review comment: This appears to just be DataFrame::new; do we really need this? |
||
Ok(DataFrame::new(self.state(), plan)) | ||
} | ||
|
||
// return an empty dataframe | ||
fn return_empty_dataframe(&self) -> Result<DataFrame> { | ||
let plan = LogicalPlanBuilder::empty(false).build()?; | ||
|
@@ -559,11 +573,9 @@ impl SessionContext { | |
} | ||
Ok(false) | ||
} | ||
/// Creates a logical plan. | ||
/// | ||
/// This function is intended for internal use and should not be called directly. | ||
#[deprecated(note = "Use SessionContext::sql which snapshots the SessionState")] | ||
pub fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> { | ||
|
||
/// Creates a [`LogicalPlan`] from a SQL query. | ||
pub fn plan_sql(&self, sql: &str) -> Result<LogicalPlan> { | ||
Review comment: Again, this appears to just call DFParser followed by the query planner.
Review comment: It is also problematic for the same reason that create_logical_plan is problematic: it returns a LogicalPlan without any mechanism to optimise or execute it against the same state.
Review comment: Yes, that is exactly what it does. There needs to be some way for users to create a LogicalPlan and get DataFusion to optimize and run it properly (e.g. if the user makes the LogicalPlan directly from their own query language, such as influxrpc or VegaFusion). |
||
let mut statements = DFParser::parse_sql(sql)?; | ||
|
||
if statements.len() != 1 { | ||
|
@@ -1000,32 +1012,24 @@ impl SessionContext { | |
} | ||
|
||
/// Optimizes the logical plan by applying optimizer rules. | ||
pub fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> { | ||
self.state.read().optimize(plan) | ||
#[deprecated( | ||
note = "Use `SessionContext::dataframe_without_ddl` and `DataFrame::into_optimized_plan`" | ||
)] | ||
pub fn optimize(&self, plan: LogicalPlan) -> Result<LogicalPlan> { | ||
self.dataframe_without_ddl(plan)?.into_optimized_plan() | ||
} | ||
|
||
/// Creates a physical plan from a logical plan. | ||
/// Creates a physical [`ExecutionPlan`] from a [`LogicalPlan`]. | ||
#[deprecated( | ||
note = "Use `SessionContext::dataframe_without_ddl` and `DataFrame::create_physical_plan`" | ||
)] | ||
pub async fn create_physical_plan( | ||
&self, | ||
logical_plan: &LogicalPlan, | ||
logical_plan: LogicalPlan, | ||
) -> Result<Arc<dyn ExecutionPlan>> { | ||
let state_cloned = { | ||
let mut state = self.state.write(); | ||
state.execution_props.start_execution(); | ||
|
||
// We need to clone `state` to release the lock that is not `Send`. We could | ||
// make the lock `Send` by using `tokio::sync::Mutex`, but that would require to | ||
// propagate async even to the `LogicalPlan` building methods. | ||
// Cloning `state` here is fine as we then pass it as immutable `&state`, which | ||
// means that we avoid write consistency issues as the cloned version will not | ||
// be written to. As for eventual modifications that would be applied to the | ||
// original state after it has been cloned, they will not be picked up by the | ||
// clone but that is okay, as it is equivalent to postponing the state update | ||
// by keeping the lock until the end of the function scope. | ||
state.clone() | ||
}; | ||
|
||
state_cloned.create_physical_plan(logical_plan).await | ||
self.dataframe_without_ddl(logical_plan)? | ||
.create_physical_plan() | ||
.await | ||
} | ||
|
||
/// Executes a query and writes the results to a partitioned CSV file. | ||
|
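The split between `dataframe` (runs DDL eagerly) and `dataframe_without_ddl` (only wraps the plan with the current state) can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `ToyContext`, `ToyDataFrame`, and `ToyPlan` are simplified stand-ins invented for illustration, not DataFusion types, and the real methods are `async` and return `Result`.

```rust
use std::collections::HashSet;

/// Toy logical plan (illustration only, not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum ToyPlan {
    Empty,
    Scan(String),
    CreateTable(String),
}

/// Toy DataFrame: a plan bundled with a snapshot of the state it was built
/// against, so planning always runs against that same state.
#[derive(Debug)]
struct ToyDataFrame {
    tables: HashSet<String>,
    plan: ToyPlan,
}

impl ToyDataFrame {
    /// DDL cannot be planned here because there is no context to mutate.
    fn create_physical_plan(&self) -> Result<String, String> {
        match &self.plan {
            ToyPlan::Empty => Ok("EmptyExec".to_string()),
            ToyPlan::Scan(t) if self.tables.contains(t) => Ok(format!("ScanExec: {}", t)),
            ToyPlan::Scan(t) => Err(format!("table not found: {}", t)),
            ToyPlan::CreateTable(_) => Err("Unsupported logical plan: CreateTable".to_string()),
        }
    }
}

#[derive(Default)]
struct ToyContext {
    tables: HashSet<String>,
}

impl ToyContext {
    /// DDL-aware path: DDL is executed now, not when the frame is evaluated,
    /// and an empty frame is returned in its place.
    fn dataframe(&mut self, plan: ToyPlan) -> ToyDataFrame {
        let plan = match plan {
            ToyPlan::CreateTable(name) => {
                self.tables.insert(name);
                ToyPlan::Empty
            }
            other => other,
        };
        self.dataframe_without_ddl(plan)
    }

    /// Read-only path: never mutates the context; DDL fails later at planning.
    fn dataframe_without_ddl(&self, plan: ToyPlan) -> ToyDataFrame {
        ToyDataFrame { tables: self.tables.clone(), plan }
    }
}

fn main() {
    let mut ctx = ToyContext::default();
    let ddl = ToyPlan::CreateTable("t".to_string());
    // The read-only path accepts the plan but refuses to plan it physically.
    println!("{:?}", ctx.dataframe_without_ddl(ddl.clone()).create_physical_plan());
    // The DDL-aware path registers the table immediately.
    ctx.dataframe(ddl);
    let df = ctx.dataframe_without_ddl(ToyPlan::Scan("t".to_string()));
    println!("{:?}", df.create_physical_plan());
}
```

In this toy model the read-only path only needs `&self`, which mirrors the intent discussed in the review: callers that must not create tables can stay on an API that cannot change the catalog.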
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1097,15 +1097,15 @@ impl DefaultPhysicalPlanner { | |
// TABLE" -- it must be handled at a higher level (so | ||
// that the appropriate table can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
Review comment: These are not really "internal errors", as they can be triggered by trying to run a SQL query that contains DDL.
Review comment: IMO this is a footgun we should aim to remove; I have a plan.
Review comment: #4721 (comment) are the key use cases. As long as they are possible, easy, and well documented, it will be great. |
||
"Unsupported logical plan: CreateExternalTable".to_string(), | ||
)) | ||
} | ||
LogicalPlan::Prepare(_) => { | ||
// There is no default plan for "PREPARE" -- it must be | ||
// handled at a higher level (so that the appropriate | ||
// statement can be prepared) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: Prepare".to_string(), | ||
)) | ||
} | ||
|
@@ -1114,7 +1114,7 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: CreateCatalogSchema".to_string(), | ||
)) | ||
} | ||
|
@@ -1123,7 +1123,7 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: CreateCatalog".to_string(), | ||
)) | ||
} | ||
|
@@ -1132,7 +1132,7 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: CreateMemoryTable".to_string(), | ||
)) | ||
} | ||
|
@@ -1141,7 +1141,7 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: DropTable".to_string(), | ||
)) | ||
} | ||
|
@@ -1150,7 +1150,7 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: DropView".to_string(), | ||
)) | ||
} | ||
|
@@ -1159,16 +1159,16 @@ impl DefaultPhysicalPlanner { | |
// It must be handled at a higher level (so | ||
// that the schema can be registered with | ||
// the context) | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: CreateView".to_string(), | ||
)) | ||
} | ||
LogicalPlan::SetVariable(_) => { | ||
Err(DataFusionError::Internal( | ||
Err(DataFusionError::Plan( | ||
"Unsupported logical plan: SetVariable must be root of the plan".to_string(), | ||
)) | ||
} | ||
LogicalPlan::Explain(_) => Err(DataFusionError::Internal( | ||
LogicalPlan::Explain(_) => Err(DataFusionError::Plan( | ||
"Unsupported logical plan: Explain must be root of the plan".to_string(), | ||
)), | ||
LogicalPlan::Analyze(a) => { | ||
|
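The reason for switching these arms from `Internal` to `Plan` can be shown with a small self-contained sketch: a user can reach them simply by submitting DDL, so callers benefit from being able to tell a user mistake from an engine bug. The types below (`ToyError`, `ToyPlan`) and the helper `is_user_error` are hypothetical stand-ins for illustration, not DataFusion's actual definitions.

```rust
/// Toy stand-in for an engine error type (not DataFusion's real enum).
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum ToyError {
    /// An engine bug; users should not be able to trigger this.
    Internal(String),
    /// The request cannot be planned, e.g. DDL reached the physical planner.
    Plan(String),
}

/// Toy logical plan with one query node and one DDL node.
#[derive(Debug)]
#[allow(dead_code)]
enum ToyPlan {
    Scan(String),
    CreateExternalTable(String),
}

/// Hypothetical physical planner: it has no context in which to register
/// tables, so DDL is rejected with a user-facing `Plan` error.
fn create_physical_plan(plan: &ToyPlan) -> Result<String, ToyError> {
    match plan {
        ToyPlan::Scan(table) => Ok(format!("ScanExec: {}", table)),
        ToyPlan::CreateExternalTable(_) => Err(ToyError::Plan(
            "Unsupported logical plan: CreateExternalTable".to_string(),
        )),
    }
}

/// Callers can separate user mistakes from engine bugs by variant.
fn is_user_error(err: &ToyError) -> bool {
    matches!(err, ToyError::Plan(_))
}

fn main() {
    let ddl = ToyPlan::CreateExternalTable("t".to_string());
    match create_physical_plan(&ddl) {
        Ok(exec) => println!("planned: {}", exec),
        Err(err) => println!("user error: {}, {:?}", is_user_error(&err), err),
    }
}
```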
Review comment: This is the new pattern to get an optimized plan (rather than calling ctx.optimize directly).