-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to return LogicalPlan
by value from TableProvider::get_logical_plan
#12113
Add ability to return LogicalPlan
by value from TableProvider::get_logical_plan
#12113
Conversation
1b81b73
to
c1f3eb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@askalt why is the change to inline_table_scan
required?
I wonder if we want to open an issue first, discuss the use case, and then proceed with implementing it. Or maybe we do already have an issue opened, and in that case, can you link it?
I has created the issue where the context is described. Please, see #12120 |
.map(Transformed::yes) | ||
// during the early stage of planning. | ||
LogicalPlan::TableScan(table_scan) if table_scan.filters.is_empty() => { | ||
if let Some(sub_plan) = table_scan.source.get_logical_plan_cloned() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about this change, there seems to be two different things happening:
- Using de-structuring rather than unwrap, which seems useful (but that should be possible even without
get_logical_plan_cloned
- Using an owned
LogicalPlan
.
It seems in this case there is effectively an extra clone that wasn't present before (inside get_logical_plan_cloned
), can you explain why this is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a clone here:
LogicalPlanBuilder::from(sub_plan.clone()) |
Now this clone happens in get_logical_plan_cloned(...)
. De-structuring does not create new clone.
Here is example how it works before/after patch:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=173e55e34ea4d772159a4a3e4a696f86
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That clone is still there at line 64, the extra clone I am talking about is the one that happens because of the invocation at line 61. Or am I reading the code wrongly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, got it! Fixed (was a typo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though you require get_logical_plan_cloned
for some reason, I don't get it why do we need to change the code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to have an example in Datafusion as well. Having a method but not used at all is quite confusing to maintain -- what is it used for and whether it could be deprecated. With the example, we could also evolve with the better idea / interface gradually and could also avoid ending up with plenty of specialized method that is only used in specific downstream project but not generally useful to most of the implementation.
Btw, If you need to mutate plan concurrently, I think what you need is something like fn update_logical_plan()
instead of returning a cloned 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the following logic:
Now TableProvider has a method get_logical_plan
that requires reference returning. It makes impossible to implement TableProvider that can mutate inner plan, because it can't return a shared reference.
But, there are optimizations that don't require reference to the inner logical plan (for example, plan inlining) and they can do with a clone.
So, why should the available API limit the possible implementations? It looks like immutability of the plan is inherent in the design of the TableProvider. If this is the case, then let me know, I will choose another implementation method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some research and it seems to me like this is the only use of get_logical_plan
https://github.com/search?q=repo%3Aapache%2Fdatafusion%20get_logical_plan&type=code
So in other words, if we make this change there will be no more uses of get_logical_plan
(in the current codebase)
What would we think about just changing get_logical_plan
to return Option<LogicalPlan>
? rather than adding a new API? This would require cloning for anyone who already uses this API but I think it would make more sense if the primary usecase of the API is to return a view definition.
We could also potentially make a more complicated API that would avoid cloning always like
fn get_logical_plan(&self) -> Cow<LogicalPlan> { .. }
So that the table provider could determine if it wanted to copy the node or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could change get_logical_plan
. Either Option<LogicalPlan>
or Cow<LogicalPlan>
both looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! The patch is updated using Cow
.
c1f3eb2
to
4065263
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, please wait a review from @alamb
4065263
to
da3ce29
Compare
This patch changes the `get_logical_plan(...)` method signature. Now it returns a `Cow<LogicalPlan>` to allow an implementation to return plan by value.
da3ce29
to
135fdcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @askalt @edmondop and @jayzhan211
// during the early stage of planning. | ||
LogicalPlan::TableScan(table_scan) if table_scan.filters.is_empty() => { | ||
if let Some(sub_plan) = table_scan.source.get_logical_plan() { | ||
let sub_plan = sub_plan.into_owned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
LogicalPlan
by value from TableProvider
LogicalPlan
by value from TableProvider::get_logical_plan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
🚀 |
The (singular) `TableSource::supports_filter_pushdown` was removed upstream. [1] I simply moved the method to the private `SqlTableSource::supports_filter_pushdown` because it is still used by the (plural) `TableSource::supports_filters_pushdown`. Additionally, the return type of `TableSource::get_logical_plan` was changed to Option<Cow<_>> upstream. [2] Refs [1]: apache/datafusion#12239 [2]: apache/datafusion#12113
This patch adds an ability to return a
LogicalPlan
by value from aTableProvider
.Which issue does this PR close?
Closes #12120
Rationale for this change
Let's assume that we have
TableProvider
that supports concurrent modifications of the inner logical plan (for example, mutable VIEW), and it cannot afford to provide a reference.What changes are included in this PR?
get_logical_plan(...)
method signature was changed. Now it returns aCow<LogicalPlan>
.Are there any user-facing changes?
Need to update the
get_logical_plan(...)
implementations to returnCow::Borrowed
instead of the reference.