Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to return LogicalPlan by value from TableProvider::get_logical_plan #12113

Merged

Conversation

askalt
Copy link
Contributor

@askalt askalt commented Aug 22, 2024

This patch adds an ability to return a LogicalPlan by value from a TableProvider.

Which issue does this PR close?

Closes #12120

Rationale for this change

Let's assume that we have TableProvider that supports concurrent modifications of the inner logical plan (for example, mutable VIEW), and it cannot afford to provide a reference.

What changes are included in this PR?

get_logical_plan(...) method signature was changed. Now it returns a Cow<LogicalPlan>.

Are there any user-facing changes?

Need to update the get_logical_plan(...) implementations to return Cow::Borrowed instead of the reference.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules catalog Related to the catalog crate labels Aug 22, 2024
@askalt askalt force-pushed the askalt/add-possibility-to-give-cloned-plan branch from 1b81b73 to c1f3eb2 Compare August 22, 2024 16:05
@askalt askalt marked this pull request as ready for review August 22, 2024 16:06
Copy link
Contributor

@edmondop edmondop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@askalt why is the change to inline_table_scan required?

I wonder if we want to open an issue first, discuss the use case, and then proceed with implementing it. Or maybe we do already have an issue opened, and in that case, can you link it?

@askalt
Copy link
Contributor Author

askalt commented Aug 22, 2024

@askalt why is the change to inline_table_scan required?

I wonder if we want to open an issue first, discuss the use case, and then proceed with implementing it. Or maybe we do already have an issue opened, and in that case, can you link it?

I has created the issue where the context is described. Please, see #12120

.map(Transformed::yes)
// during the early stage of planning.
LogicalPlan::TableScan(table_scan) if table_scan.filters.is_empty() => {
if let Some(sub_plan) = table_scan.source.get_logical_plan_cloned() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this change, there seems to be two different things happening:

  • Using de-structuring rather than unwrap, which seems useful (but that should be possible even without get_logical_plan_cloned
  • Using an owned LogicalPlan.

It seems in this case there is effectively an extra clone that wasn't present before (inside get_logical_plan_cloned), can you explain why this is necessary?

Copy link
Contributor Author

@askalt askalt Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a clone here:

LogicalPlanBuilder::from(sub_plan.clone())

Now this clone happens in get_logical_plan_cloned(...). De-structuring does not create new clone.

Here is example how it works before/after patch:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=173e55e34ea4d772159a4a3e4a696f86

What am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That clone is still there at line 64, the extra clone I am talking about is the one that happens because of the invocation at line 61. Or am I reading the code wrongly?

Copy link
Contributor Author

@askalt askalt Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, got it! Fixed (was a typo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though you require get_logical_plan_cloned for some reason, I don't get it why do we need to change the code here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to have an example in Datafusion as well. Having a method but not used at all is quite confusing to maintain -- what is it used for and whether it could be deprecated. With the example, we could also evolve with the better idea / interface gradually and could also avoid ending up with plenty of specialized method that is only used in specific downstream project but not generally useful to most of the implementation.

Btw, If you need to mutate plan concurrently, I think what you need is something like fn update_logical_plan() instead of returning a cloned 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the following logic:
Now TableProvider has a method get_logical_plan that requires reference returning. It makes impossible to implement TableProvider that can mutate inner plan, because it can't return a shared reference.

But, there are optimizations that don't require reference to the inner logical plan (for example, plan inlining) and they can do with a clone.

So, why should the available API limit the possible implementations? It looks like immutability of the plan is inherent in the design of the TableProvider. If this is the case, then let me know, I will choose another implementation method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some research and it seems to me like this is the only use of get_logical_plan

https://github.com/search?q=repo%3Aapache%2Fdatafusion%20get_logical_plan&type=code

So in other words, if we make this change there will be no more uses of get_logical_plan (in the current codebase)

What would we think about just changing get_logical_plan to return Option<LogicalPlan> ? rather than adding a new API? This would require cloning for anyone who already uses this API but I think it would make more sense if the primary usecase of the API is to return a view definition.

We could also potentially make a more complicated API that would avoid cloning always like

fn get_logical_plan(&self) -> Cow<LogicalPlan> { .. }

So that the table provider could determine if it wanted to copy the node or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could change get_logical_plan. Either Option<LogicalPlan> or Cow<LogicalPlan> both looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! The patch is updated using Cow.

@askalt askalt force-pushed the askalt/add-possibility-to-give-cloned-plan branch from c1f3eb2 to 4065263 Compare August 22, 2024 20:03
Copy link
Contributor

@edmondop edmondop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please wait a review from @alamb

@askalt askalt force-pushed the askalt/add-possibility-to-give-cloned-plan branch from 4065263 to da3ce29 Compare August 25, 2024 09:05
@github-actions github-actions bot added the core Core DataFusion crate label Aug 25, 2024
This patch changes the `get_logical_plan(...)` method signature.
Now it returns a `Cow<LogicalPlan>` to allow an implementation to return
plan by value.
@askalt askalt force-pushed the askalt/add-possibility-to-give-cloned-plan branch from da3ce29 to 135fdcb Compare August 25, 2024 09:33
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @askalt @edmondop and @jayzhan211

// during the early stage of planning.
LogicalPlan::TableScan(table_scan) if table_scan.filters.is_empty() => {
if let Some(sub_plan) = table_scan.source.get_logical_plan() {
let sub_plan = sub_plan.into_owned();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb added the api change Changes the API exposed to users of the crate label Aug 25, 2024
@alamb alamb changed the title Add ability to return LogicalPlan by value from TableProvider Add ability to return LogicalPlan by value from TableProvider::get_logical_plan Aug 25, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Aug 26, 2024

🚀

@alamb alamb merged commit da3f6af into apache:main Aug 26, 2024
26 checks passed
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Sep 10, 2024
The (singular) `TableSource::supports_filter_pushdown` was removed upstream. [1]
I simply moved the method to the private `SqlTableSource::supports_filter_pushdown`
because it is still used by the (plural) `TableSource::supports_filters_pushdown`.

Additionally, the return type of `TableSource::get_logical_plan` was changed to Option<Cow<_>> upstream. [2]

Refs
[1]: apache/datafusion#12239
[2]: apache/datafusion#12113
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate catalog Related to the catalog crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the possibility to work with plan clones directly in inline_table_scan
4 participants