Implement TableProvider for DataFrameImpl #1699
Merged
18 commits, all by cpcloud:
73bbc50 Add TableProvider impl for DataFrameImpl
5c70f81 Add physical plan in
a2614e1 Clean up plan construction and names construction
3ee3736 Remove duplicate comments
f174257 Remove unused parameter
b4f298e Add test
638affc Remove duplicate limit comment
c3a3ca7 Use cloned instead of individual clone
b5400ae Reduce the amount of code to get a schema
3b5fd3b Add comments to test
847f7ab Fix plan comparison
75ee10b Compare only the results of execution
463bf6a Remove println
4c44de8 Refer to df_impl instead of table in test
64b6fff Fix the register_table test to use the correct result set for comparison
4cc7ac5 Consolidate group/agg exprs
4572472 Format
7c456a9 Remove outdated comment
Perhaps we can add `TableProvider` as one of the trait bounds for the `DataFrame` trait, so the trait object can be used to register tables without casting to a concrete type.
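A minimal sketch of the supertrait idea, using simplified, hypothetical stand-ins for DataFusion's traits (the real `TableProvider` and `DataFrame` have more methods and different signatures). It shows that once `TableProvider` is a supertrait of `DataFrame`, its methods can be called through a `&dyn DataFrame` without downcasting to `DataFrameImpl`. It does not, by itself, turn an `Arc<dyn DataFrame>` into an `Arc<dyn TableProvider>`.

```rust
use std::sync::Arc;

// Simplified, hypothetical stand-ins for DataFusion's traits; the real
// `TableProvider` and `DataFrame` have more methods and different signatures.
trait TableProvider {
    /// Column names exposed by the table.
    fn schema(&self) -> Vec<String>;
}

/// `TableProvider` as a supertrait: every `DataFrame` must also be a table provider.
trait DataFrame: TableProvider {
    fn name(&self) -> &str;
}

/// Hypothetical concrete frame, standing in for `DataFrameImpl`.
struct DataFrameImpl {
    columns: Vec<String>,
}

impl TableProvider for DataFrameImpl {
    fn schema(&self) -> Vec<String> {
        self.columns.clone()
    }
}

impl DataFrame for DataFrameImpl {
    fn name(&self) -> &str {
        "df_impl"
    }
}

/// Supertrait methods are callable through the trait object; no downcast needed.
fn column_count(df: &dyn DataFrame) -> usize {
    df.schema().len()
}

fn main() {
    let df: Arc<dyn DataFrame> = Arc::new(DataFrameImpl {
        columns: vec!["a".to_string(), "b".to_string()],
    });
    println!("{} has {} columns", df.name(), column_count(&*df));
}
```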
Can you give an example of what you mean?
It won't be possible to register `Arc<dyn DataFrame>`s directly even if `DataFrame` has a trait bound of `TableProvider`, because that would require trait upcasting IIUC, which is still an unstable feature.
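One common way around the missing upcast is a newtype that owns the trait object and implements `TableProvider` by forwarding to the supertrait methods. The sketch below assumes simplified, hypothetical traits, and `DataFrameTable` and `register_table` are illustrative names, not DataFusion APIs. (Trait upcasting coercion was later stabilized in Rust 1.86; at the time of this discussion it was unstable.)

```rust
use std::sync::Arc;

// Simplified, hypothetical stand-ins; not DataFusion's real signatures.
trait TableProvider {
    fn schema(&self) -> Vec<String>;
}

trait DataFrame: TableProvider {}

struct DataFrameImpl {
    columns: Vec<String>,
}

impl TableProvider for DataFrameImpl {
    fn schema(&self) -> Vec<String> {
        self.columns.clone()
    }
}

impl DataFrame for DataFrameImpl {}

/// Hypothetical newtype: owns a `dyn DataFrame` and forwards to its
/// supertrait methods, so the trait object itself never needs upcasting.
struct DataFrameTable(Arc<dyn DataFrame>);

impl TableProvider for DataFrameTable {
    fn schema(&self) -> Vec<String> {
        // Calling a supertrait method through `dyn DataFrame` is allowed.
        self.0.schema()
    }
}

/// Hypothetical stand-in for a registration API that only accepts
/// `Arc<dyn TableProvider>`; returns the number of registered columns.
fn register_table(provider: Arc<dyn TableProvider>) -> usize {
    provider.schema().len()
}

fn main() {
    let df: Arc<dyn DataFrame> = Arc::new(DataFrameImpl {
        columns: vec!["a".to_string()],
    });
    // `register_table(df)` would need trait upcasting; wrapping avoids it.
    let n = register_table(Arc::new(DataFrameTable(df)));
    println!("registered table with {} column(s)", n);
}
```

The cost of this design is one extra pointer indirection per call and a wrapper type that has to forward every `TableProvider` method.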
I think @houqp was suggesting changing the `DataFrame` trait definition to something with `TableProvider` as a trait bound, and then writing `impl TableProvider for DataFrame` (rather than for `DataFrameImpl`), which would mean that any `&dyn DataFrame` could be used as a table provider.
If this is interesting, I am happy to file a ticket describing it.
Interesting. I don't think you can impl a trait for a trait (since the introduction of the `dyn` requirement), but we might be able to do a blanket impl instead. Along with the trait bound, I think this will work to allow any `DataFrame` implementation to be used as a table provider.
The main question for me is: given that the `scan` method returns a physical plan, do I need to add a new `DataFrame` method to get a physical plan? Or is there another way to get that with existing APIs?
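A minimal sketch of the blanket-impl idea, again with simplified, hypothetical traits (`register_table` is an illustrative stand-in, not DataFusion's API, and the physical-plan question is not addressed here). Any concrete `DataFrame` type becomes a `TableProvider` and coerces to `Arc<dyn TableProvider>`, but an already type-erased `Arc<dyn DataFrame>` does not, which is the gap that trait upcasting would fill.

```rust
use std::sync::Arc;

// Simplified, hypothetical stand-ins; not DataFusion's real signatures.
trait TableProvider {
    fn schema(&self) -> Vec<String>;
}

trait DataFrame {
    fn columns(&self) -> Vec<String>;
}

/// Blanket impl: every sized `DataFrame` implementor is a `TableProvider`.
impl<T: DataFrame> TableProvider for T {
    fn schema(&self) -> Vec<String> {
        self.columns()
    }
}

struct DataFrameImpl;

impl DataFrame for DataFrameImpl {
    fn columns(&self) -> Vec<String> {
        vec!["id".to_string()]
    }
}

/// Hypothetical stand-in for a registry that stores `Arc<dyn TableProvider>`.
fn register_table(provider: Arc<dyn TableProvider>) -> Vec<String> {
    provider.schema()
}

fn main() {
    // A concrete type coerces to `Arc<dyn TableProvider>` via the blanket impl.
    let schema = register_table(Arc::new(DataFrameImpl));
    println!("{:?}", schema);

    // An already-erased `Arc<dyn DataFrame>` does not: the blanket impl only
    // covers sized `T`, and `dyn DataFrame` cannot be unsized a second time.
    // let erased: Arc<dyn DataFrame> = Arc::new(DataFrameImpl);
    // register_table(erased); // compile error: mismatched types
}
```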
I think I may have a prototype; I'll push something up here in a bit.
I don't think there's any way to achieve this behavior without `trait_upcasting` 😞, even with a blanket `impl` plus an additional trait bound. I'll put up a draft PR.