Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate DDL / Catalog manipulation LogicalPlans (refactor) #3349

Closed
Tracked by #5291
alamb opened this issue Sep 3, 2022 · 3 comments
Closed
Tracked by #5291

Consolidate DDL / Catalog manipulation LogicalPlans (refactor) #3349

alamb opened this issue Sep 3, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Sep 3, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I would like to more cleanly separate the DDL (data definition) statements supported by DataFusion from the query execution.

Specifically, DataFusion offers basic support for "catalog" like operations such as registering external tables (CREATE EXTERNAL TABLE) , views (CREATE VIEW) and others. Some systems (like datafusion-cli or the tpch-benchmarks) use this basic support and others such as https://github.com/influxdata/influxdb_iox use DataFusion to query tables and views that are defined elsewhere.

The current support for catalog like operations is implemented as individual LogicalPlan variants such as LogicalPlan::CreateExternalTable

https://github.com/apache/arrow-datafusion/blob/5621e3bbd050eeb79646240ec0a09426badfa162/datafusion/expr/src/logical_plan/plan.rs#L77-L90

This results in two potential issues:

  1. Adding support for a new statement (such as DROP TABLE, as in implement drop view #3267) results in a bunch of boiler plate that may not be needed
  2. It is non-trivial to understand how to configure DataFusion so that it does not support CREATE VIEW or CREATE TABLE if that is not desired ( because, e.g. it would be a security hole - see this test):

Describe the solution you'd like
I would like to consolidate LogicalPlan::Create* and LogicalPlan::Drop* variants into a single LogicalPlan::DDL variant, and make a new DDLStatement enum like

enum LogicalPlan { 
...
  DDL(DDLStatement),
...
}

enum DDLStatement {
  CreateTable(CreateTable),
  DropTable(DroptTable),
  ..
}

Describe alternatives you've considered
We can not make any changes (as this change would result in some API churn)

Additional context
See #3267 (comment)

@alamb alamb added the enhancement New feature or request label Sep 3, 2022
@alamb alamb changed the title Consolidate DDL LogicalPlans Consolidate DDL LogicalPlans (refactor) Sep 3, 2022
@tustvold
Copy link
Contributor

tustvold commented Jan 3, 2023

Another option here would be to remove such DDL from LogicalPlan entirely, and instead only surface DDL on datafusion_sql::Statement.

The interface for QueryPlanner could then be changed to use sql_parser::Query instead of sql_parser::Statement.

I'll have a play and see what I can come up with

@alamb alamb changed the title Consolidate DDL LogicalPlans (refactor) Consolidate DDL / Catalog manipulation LogicalPlans (refactor) Feb 15, 2023
@avantgardnerio
Copy link
Contributor

remove such DDL from LogicalPlan entirely

FWIW, doing so would undo the motivation behind the PR that kicked off this discussion. I think if we have:

LogicalPlan::DDL
LogicalPlan::DML(Insert(DQL), Update(DQL), Delete(DQL))
LogicalPlan::DQL(project, filter, etc)

Then the planner can still do things like optimize DML, replace placeholder tokens in DML/DQL, etc.

It also makes it so there is a consistent path in execution engines:

AST -> LogicalPlan -> PhysicalPlan

vs

AST -> [LogicalPlan, Statement] -> [query plan executor, statement executor]

however, if we merely group them as mentioned in this issue, then we can have the existing optimizer code focus on LogicalPlan::DQL and the rest of the code can keep functioning as it is.

Just my $0.02, interested to hear how this affects others...

@alamb
Copy link
Contributor Author

alamb commented Jan 5, 2024

@alamb alamb closed this as completed Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants