-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LogicalPlan::SubqueryAlias #2172
Conversation
@@ -518,6 +518,18 @@ impl LogicalPlanBuilder { | |||
}))) | |||
} | |||
|
|||
/// Apply an alias | |||
pub fn alias(&self, alias: &str) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way that this same aliasing can be represented is a Projection
node
So if the input has schema with columns a
, b
, c
Then to implement alias("foo")
you could build a LogicalPlanNode
that was Project([a foo.a, b as foo.b, c as foo.c])
There may be some good reason to introduce a new type of LogicalPlanNode too that I don't understand, but I wanted to point out this alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and I see you don't want to do that :) in the description
I wonder if you could use the DFSchema::relation
name for this purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need something to very specifically say that a table is being used as an alias. An easier change might be just to add an additional field to the TableScan
to record the original table name. I think I will put up a separate PR for that approach.
So to explain why I need this. I want to use DataFusion as a SQL parser and planner but I want to execute the query in a different engine. I can provide a TableProvider
so that DataFuision can get the schema for table person
and I get a logical plan. When I go to translate that plan to a physical plan, it refers to a table called peeps
(the alias) and I have no way to know the actual table name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the Spark plan for this use case:
'Project [*]
+- 'SubqueryAlias peeps
+- 'UnresolvedRelation [person], [], false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the approach taken in this PR, as it could handle subqueries rather than just table scans (as suggested by the name).
I think this is a cleaner, more generic approach.
Looks like in DataFusion, the logical Project node already support Alias. It should be similar to SparkSQL's SubqueryAlias.
|
@alamb @Dandandan @mingmwang I have fixed the test failures and renamed |
// take alias into account to support `JOIN table1 as table2` | ||
alias | ||
.as_ref() | ||
.map(|a| a.name.value.as_str()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an improvement to me -- I think the projection pushdown code might benefit from an additional test, but I also think the existing coverage is likely sufficient.
@@ -432,6 +432,34 @@ fn optimize_plan( | |||
alias: alias.clone(), | |||
})) | |||
} | |||
LogicalPlan::SubqueryAlias(SubqueryAlias { input, alias, .. }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth a test for projection pushdown specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Test added.
@@ -3458,7 +3458,8 @@ mod tests { | |||
let expected = "Projection: #person.first_name, #person.id\ | |||
\n Inner Join: Using #person.id = #person2.id\ | |||
\n TableScan: person projection=None\ | |||
\n TableScan: person2 projection=None"; | |||
\n SubqueryAlias: person2\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the appearance of SubqueryAlias
in this query is somewhat strange to me as there is no subquery in this SQL (maybe a better name is RelationAlias
or something)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally named this AliasedRelation
and then changed it to SubqueryAlias
because that is what Spark uses. I don't have a strong preference either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
me neither. Whatever you prefer is fine with me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave it as is for now since we will make use of it for subqueries (soon, hopefully).
I filed follow-on issues for use this approach for projection and union:
Thanks @andygrove |
Which issue does this PR close?
Closes ##2164.
Rationale for this change
I would like the logical plan to better represent aliased relations to help with translation to a different execution engine. Currently, original table names are discarded during SQL planning.
What changes are included in this PR?
Introduces new
SubqueryAlias
node inLogicalPlan
and updates the SQL planner to wrap this aroundTableScan
rather than just use the alias name in theTableScan
.Are there any user-facing changes?
The query plan will look different now.