Add CREATE VIEW #2279

Merged
merged 9 commits into from
May 11, 2022

Conversation

matthewmturner
Contributor

Which issue does this PR close?

Closes #1740

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 20, 2022
@matthewmturner
Contributor Author

My idea here is to create a new ViewTable struct (similar concept to how MemTable is used but obviously not in memory) which of course would have TableType::View and we just use that with the existing register_table functionality within ExecutionContext / SchemaProvider. @alamb does that sound good from your end?
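
A minimal sketch of that idea, using toy stand-ins rather than DataFusion's real types: the `TableProvider` trait, `TableType` enum, `MemTable`, `ViewTable`, and `Catalog` below are all simplified, hypothetical versions written only for illustration. The point it shows is that a view can go through the same registration path as any other table and differ only in what it stores and in the table type it reports.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for a table-type enum; only two variants are modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TableType {
    Base,
    View,
}

/// Hypothetical, heavily reduced table provider trait (not the real one).
trait TableProvider {
    fn table_type(&self) -> TableType;
    /// Human-readable description of where the rows come from.
    fn describe(&self) -> String;
}

/// Toy in-memory table that holds its rows directly.
struct MemTable {
    rows: Vec<i64>,
}

impl TableProvider for MemTable {
    fn table_type(&self) -> TableType {
        TableType::Base
    }
    fn describe(&self) -> String {
        format!("{} rows held in memory", self.rows.len())
    }
}

/// A view holds no rows, only the query that defines it.
struct ViewTable {
    definition: String,
}

impl TableProvider for ViewTable {
    fn table_type(&self) -> TableType {
        TableType::View
    }
    fn describe(&self) -> String {
        format!("defined by: {}", self.definition)
    }
}

/// Toy catalog: both kinds of table use the same registration path.
#[derive(Default)]
struct Catalog {
    tables: HashMap<String, Arc<dyn TableProvider>>,
}

impl Catalog {
    fn register_table(&mut self, name: &str, provider: Arc<dyn TableProvider>) {
        self.tables.insert(name.to_string(), provider);
    }

    fn table_type(&self, name: &str) -> Option<TableType> {
        self.tables.get(name).map(|t| t.table_type())
    }
}

fn main() {
    let mut catalog = Catalog::default();
    catalog.register_table("bar", Arc::new(MemTable { rows: vec![1, 2, 3] }));
    catalog.register_table(
        "t",
        Arc::new(ViewTable {
            definition: "select a, y from bar where y > 10000".to_string(),
        }),
    );
    for name in ["bar", "t"] {
        let provider = &catalog.tables[name];
        println!("{}: {:?} ({})", name, provider.table_type(), provider.describe());
    }
}
```

In this toy version the view stores its definition as a string; whether the real struct should keep SQL text, a parsed query, or a logical plan is left open here.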

@alamb
Contributor

alamb commented Apr 21, 2022

Thanks @matthewmturner -- I'll try and give it a look tomorrow

@matthewmturner
Contributor Author

@alamb I actually hadn't even made it to the point of implementing that yet - just wanted to see if conceptually you thought that was the right approach. Wasn't expecting you to review anything yet. If you aren't sure and it would require you looking into it I can just give it a shot and get back to you. I don't want you to have to do unnecessary work.

@alamb
Contributor

alamb commented Apr 22, 2022

Sorry for the late review @matthewmturner

At a high level, I would expect a VIEW to be represented by a query -- maybe as a SQL string / parsed Query or perhaps a LogicalPlan

So let's say we have a query like this

select sum(y) from T where a > 5 group by a;

If T is a table, I would expect the plan to look like

GroupBy (gby a, sum(y))
  Filter(a > 5)
    TableScan T

If T is a view,

create view T as select  a, y from bar where y > 10000

I would expect the plan to look like:

GroupBy (gby a, sum(y))
  Filter(a > 5)
    Project (a, y) <-- the LogicalPlan for the View is pasted in here
      Filter (y > 10000)
      TableScan bar

The question then becomes how do you want to get the LogicalPlan for the view when it is referenced. Storing a SQL string might be the simplest, but I am not sure how that would work with the DataFrame API

But then again, I am not sure a view makes sense in the context of a dataframe -- the user would just clone the DataFrame (which is a wrapper around a LogicalPlan 🤔 )
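
The "pasted in" behaviour described above can be sketched with a toy plan type. Everything here is an assumption made for illustration: the `Plan` enum, `inline_views`, `render`, and the two `example_*` helpers are hypothetical and are not DataFusion's `LogicalPlan` or planner. The sketch only shows the substitution step, where a scan of a view name is replaced by the plan stored for that view.

```rust
use std::collections::HashMap;

/// Toy logical plan; these variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan(String),
    Filter { predicate: String, input: Box<Plan> },
    Project { columns: Vec<String>, input: Box<Plan> },
    Aggregate { group_by: String, agg: String, input: Box<Plan> },
}

/// Replace every scan of a name found in `views` with that view's stored plan.
/// Note: a view that refers to itself would recurse forever in this sketch.
fn inline_views(plan: &Plan, views: &HashMap<String, Plan>) -> Plan {
    match plan {
        Plan::TableScan(name) => match views.get(name) {
            // A view may reference other views, so expand recursively.
            Some(view_plan) => inline_views(view_plan, views),
            None => plan.clone(),
        },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate: predicate.clone(),
            input: Box::new(inline_views(input, views)),
        },
        Plan::Project { columns, input } => Plan::Project {
            columns: columns.clone(),
            input: Box::new(inline_views(input, views)),
        },
        Plan::Aggregate { group_by, agg, input } => Plan::Aggregate {
            group_by: group_by.clone(),
            agg: agg.clone(),
            input: Box::new(inline_views(input, views)),
        },
    }
}

/// Render a plan as an indented tree, two spaces per level.
fn render(plan: &Plan, depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    match plan {
        Plan::TableScan(name) => out.push_str(&format!("{}TableScan {}\n", pad, name)),
        Plan::Filter { predicate, input } => {
            out.push_str(&format!("{}Filter({})\n", pad, predicate));
            render(input, depth + 1, out);
        }
        Plan::Project { columns, input } => {
            out.push_str(&format!("{}Project ({})\n", pad, columns.join(", ")));
            render(input, depth + 1, out);
        }
        Plan::Aggregate { group_by, agg, input } => {
            out.push_str(&format!("{}GroupBy (gby {}, {})\n", pad, group_by, agg));
            render(input, depth + 1, out);
        }
    }
}

/// select sum(y) from T where a > 5 group by a
fn example_query() -> Plan {
    Plan::Aggregate {
        group_by: "a".to_string(),
        agg: "sum(y)".to_string(),
        input: Box::new(Plan::Filter {
            predicate: "a > 5".to_string(),
            input: Box::new(Plan::TableScan("T".to_string())),
        }),
    }
}

/// create view T as select a, y from bar where y > 10000
fn example_views() -> HashMap<String, Plan> {
    let mut views = HashMap::new();
    views.insert(
        "T".to_string(),
        Plan::Project {
            columns: vec!["a".to_string(), "y".to_string()],
            input: Box::new(Plan::Filter {
                predicate: "y > 10000".to_string(),
                input: Box::new(Plan::TableScan("bar".to_string())),
            }),
        },
    );
    views
}

fn main() {
    let expanded = inline_views(&example_query(), &example_views());
    let mut text = String::new();
    render(&expanded, 0, &mut text);
    print!("{}", text);
}
```

Running it prints the five-level tree from the view case above, with the view's `Project` and `Filter` nodes sitting where the scan of `T` used to be.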

@xudong963 xudong963 added the enhancement New feature or request label Apr 23, 2022
@matthewmturner
Contributor Author

I apologize I have been quite busy lately and haven't been able to continue my efforts on this. I'm really hoping to get this in for the 8.0 release; I will try to work on this over the weekend.

@alamb
Contributor

alamb commented May 5, 2022

I apologize I have been quite busy lately ...

No worries at all - I totally understand!

@matthewmturner
Contributor Author

@alamb would you mind checking this out to see if it's heading in the right direction? I still have to resolve conflicts and update Ballista, but I wanted to make sure this was at least getting close before proceeding.

@matthewmturner
Contributor Author

matthewmturner commented May 10, 2022

@andygrove FYI I'm hoping to get this in for the 8.0 release. I'll be continuing my work on it tonight.

Contributor

@alamb alamb left a comment

I think this is looking very much on the right track @matthewmturner 👍

Comment on lines 36 to 37
/// An implementation of `TableProvider` that uses the object store
/// or file system listing capability to get the list of files.
Contributor

The implementation uses another LogicalPlan

filters: &[Expr],
limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>> {
self.context.create_physical_plan(&self.logical_plan).await
Contributor

👍


assert_batches_eq!(expected, &results);

let results = session_ctx
Contributor

I think more interesting tests might be to create a view and then run queries against the view (also adding things like filters on top of the view).

For example

select column1, count(column2) from xyz where column3 > 2 group by column1

@andygrove
Member

I just took a quick look through this and LGTM ❤️

/// To create ExecutionPlan
context: SessionContext,
/// LogicalPlan of the view
logical_plan: LogicalPlan,
Contributor

It might also be useful to store the original SQL, to preserve formatting when doing things like describe view.

Contributor Author

Indeed, that would be nice. Do you think we could do it as a follow-on PR?

Contributor

I think so

Contributor

Hi, I'm new to DataFusion. Maybe I can help with the follow-up work?

Contributor

👋 welcome @Veeupup . That would be great. I will try and write up a ticket later today that describes the work in some more detail in case that is helpful

Thank you!

Contributor

#2529

@Dandandan not sure if you had specific ideas on what command you wanted to show view definitions.

@matthewmturner matthewmturner marked this pull request as ready for review May 10, 2022 23:42
@matthewmturner
Contributor Author

Ah - I believe the TPC-H tests use a published version of Ballista, so I don't think we can run Q15 until our next release. Let me know if I'm misunderstanding; I didn't have time to dig too far into it.

@andygrove
Member

Ah - I believe the TPC-H tests use a published version of Ballista, so I don't think we can run Q15 until our next release. Let me know if I'm misunderstanding; I didn't have time to dig too far into it.

q15 was definitely a stretch goal for this PR 😄 I will take a look

@andygrove
Member

RAT is failing due to an empty file @ datafusion/core/src/physical_plan/view.rs

@matthewmturner
Contributor Author

@andygrove thanks - I removed it.

@matthewmturner
Contributor Author

@andygrove @alamb hopefully this is good now.

Was I able to complete it in time for the 8.0 release?

Member

@andygrove andygrove left a comment

LGTM. Thanks @matthewmturner

@matthewmturner
Contributor Author

@Igosuki FYI

@andygrove andygrove merged commit 19d937a into apache:master May 11, 2022
@alamb alamb mentioned this pull request Aug 9, 2022
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Support CREATE/DROP/ALTER VIEW (and associated DataFrame functions)
6 participants