Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add library guide for table provider and catalog providers #7287

Merged
merged 13 commits into from
Aug 16, 2023

Conversation

tshauck
Copy link
Contributor

@tshauck tshauck commented Aug 15, 2023

Which issue does this PR close?

Closes # Related to #7014

This won't finish #7014, but adds the skeleton described in that issue as well as content for the custom table provider and catalog. I can also add some info on Scalar UDFs, but am not sure I know enough about the others to write something w/o a summary to expand on.

Hope this is what you'all are looking for, certainly open to feedback.

Finally, I'm getting some unrelated (I think) warnings, are these known?

/Users/thauck/personal/code/github.com/tshauck/arrow-datafusion/docs/temp/user-guide/example-usage.md:24: WARNING: 'myst' cross-reference target not found: '../../../datafusion-examples' [myst.xref_missing]
/Users/thauck/personal/code/github.com/tshauck/arrow-datafusion/docs/temp/user-guide/example-usage.md:211: WARNING: Pygments lexer name 'rust,ignore' is not known
/Users/thauck/personal/code/github.com/tshauck/arrow-datafusion/docs/temp/user-guide/sql/data_types.md:31: WARNING: Could not lex literal_block '❯ select arrow_typeof(interval \'1 month\');\n+-------------------------------------+\n| arrowtypeof(IntervalYearMonth("1")) |\n+-------------------------------------+\n| Interval(YearMonth)                 |\n+-------------------------------------+\n' as "sql". Highlighting skipped.
...

Rationale for this change

It was requested in #7014. I also would've found this useful for exon.

Are these changes tested?

I locally generated the docs, per the README. Ex..

image image

Are there any user-facing changes?

Yes, this would update the docs users could read.

@tshauck tshauck changed the title Add guild for table provider Add library guild for table provider and catalog providers Aug 15, 2023
@tshauck tshauck changed the title Add library guild for table provider and catalog providers Add library guide for table provider and catalog providers Aug 15, 2023
@tshauck
Copy link
Contributor Author

tshauck commented Aug 15, 2023

I think the current failure is related to #7286... I'll rebase and relook when it's merged.

@alamb
Copy link
Contributor

alamb commented Aug 15, 2023

Thank you @tshauck -- I will review this later today

@tshauck tshauck force-pushed the add-guild-for-table-provider branch from 3ff3acd to 090b42e Compare August 15, 2023 18:07
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tshauck -- the library guide is awesome

I left some suggestions, but I also think this PR can be merged as is. Really 👏 thank you for starting this skeleton

cc @MrPowers -- I think this is very much in the spirit of what you and I spoke about earlier this summer for a guide.


# Adding User Defined Functions: Scalar/Window/Aggregate

Coming Soon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as part of merging this PR I can file some subtasks to fill out these other sections.


As mentioned, `scan` returns an execution plan, and in particular a `Result<Arc<dyn ExecutionPlan>>`. The core of this is returning something that can be dynamically dispatched to an `ExecutionPlan`. And as per the general DataFusion idea, we'll need to implement it.

#### Execution Plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the docs/source/library-user-guide/execution-plans.md section should direct here as it has an example of execution plans 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execution-plans.md looks like this now:

image

- `FileFormat` - a trait for reading a file format
- `ListingTableProvider` - a useful trait for implementing a `TableProvider` that lists files in a directory

[ex]: https://github.com/apache/arrow-datafusion/blob/a5e86fae3baadbd99f8fd0df83f45fde22f7b0c6/datafusion-examples/examples/custom_datasource.rs#L214C1-L276
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving links to the docs.rs versions might be more maintainable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, not sure how to do this. I tried https://docs.rs/crate/datafusion/latest/source/, but the examples aren't there from what I can tell.

@tshauck
Copy link
Contributor Author

tshauck commented Aug 15, 2023

Thanks for all the feedback! I'll work through it later this evening and follow up on anything.

@tshauck
Copy link
Contributor Author

tshauck commented Aug 16, 2023

Thanks for the review, I think I got to the feedback. I can try to do the UDF page, but I don't have enough experience with the other areas to provide good info (though I'd like to read the one on using exprs 😄 ).

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a good job and start the guide to user and teach user to use the datafusion in their project as a lib

LGTM

@alamb
Copy link
Contributor

alamb commented Aug 16, 2023

Great work -- thanks again @tshauck -- I will file some additional tickets for the "Coming Soon" sections as well

@alamb alamb merged commit 3da9192 into apache:main Aug 16, 2023
@tshauck tshauck deleted the add-guild-for-table-provider branch August 16, 2023 14:47
@alamb
Copy link
Contributor

alamb commented Aug 16, 2023

Follow on tickets filed: #7014 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants