Improved support for "User Defined Catalogs" #5291
Labels: enhancement (New feature or request)
Comments
This was referenced Feb 15, 2023
I'm actually working on figuring out the catalog API and implementing a catalog for my own project right now. I'd be happy to adapt some of my code into an example.
That would be awesome @jaylmiller -- thank you very much
We now have a catalog example, so I think this work is done.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I think it is currently a bit confusing how to use DataFusion with a custom catalog.
Background
DataFusion is primarily a query engine, rather than a complete database system that also must handle persistence, catalog management, ingest, data lifecycle management, and other things.
Systems like Ballista or GreptimeDB are examples of complete systems that use DataFusion for query but have their own catalog implementations.
However, to function, the query engine needs to read information from the catalog, and DataFusion provides a rich set of APIs for this, such as the following:
The query engine also knows how to plan catalog manipulations, which often need planner support (e.g. for type checking or coercion).
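To make the read path concrete, here is a minimal sketch of how a planner might resolve a `catalog.schema.table` reference through a three-level hierarchy. The traits `CatalogLike` and `SchemaLike`, the `CatalogMap` alias, and the helper functions are hypothetical simplifications that only mirror the general shape of DataFusion's catalog APIs; they are not the real `datafusion::catalog` traits, whose exact signatures differ between releases.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for DataFusion's catalog traits. They only
// mirror the three-level shape (catalog -> schema -> table); they are NOT the
// real datafusion::catalog API.
#[derive(Clone, Debug, PartialEq)]
struct TableInfo {
    columns: Vec<(String, String)>, // (column name, type name)
}

trait SchemaLike {
    fn table(&self, name: &str) -> Option<TableInfo>;
}

trait CatalogLike {
    fn schema(&self, name: &str) -> Option<&dyn SchemaLike>;
}

struct StaticSchema {
    tables: HashMap<String, TableInfo>,
}

impl SchemaLike for StaticSchema {
    fn table(&self, name: &str) -> Option<TableInfo> {
        self.tables.get(name).cloned()
    }
}

struct StaticCatalog {
    schemas: HashMap<String, StaticSchema>,
}

impl CatalogLike for StaticCatalog {
    fn schema(&self, name: &str) -> Option<&dyn SchemaLike> {
        self.schemas.get(name).map(|s| s as &dyn SchemaLike)
    }
}

// Hypothetical top level playing the role of a catalog list: name -> catalog.
type CatalogMap = HashMap<String, Box<dyn CatalogLike>>;

/// Resolve "catalog.schema.table" using only read-only lookups.
fn resolve(catalogs: &CatalogMap, reference: &str) -> Option<TableInfo> {
    let mut parts = reference.splitn(3, '.');
    let (catalog, schema, table) = (parts.next()?, parts.next()?, parts.next()?);
    catalogs.get(catalog)?.schema(schema)?.table(table)
}

/// Build a small example hierarchy: external.public.orders(id Int64).
fn example_catalogs() -> CatalogMap {
    let mut tables = HashMap::new();
    tables.insert(
        "orders".to_string(),
        TableInfo { columns: vec![("id".to_string(), "Int64".to_string())] },
    );
    let mut schemas = HashMap::new();
    schemas.insert("public".to_string(), StaticSchema { tables });
    let mut catalogs: CatalogMap = HashMap::new();
    catalogs.insert("external".to_string(), Box::new(StaticCatalog { schemas }));
    catalogs
}

fn main() {
    let catalogs = example_catalogs();
    println!("{:?}", resolve(&catalogs, "external.public.orders"));
}
```

Because every step is a lookup through `&self`, this path never needs to modify the catalog, which is the property that makes read-only external catalogs possible.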
Making things even more confusing, DataFusion does have a basic, ephemeral, in-memory catalog implementation (https://docs.rs/datafusion/18.0.0/datafusion/catalog/catalog/struct.MemoryCatalogList.html), and the methods on `SessionContext` know how to modify that in-memory catalog.
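The following sketch illustrates that pattern, assuming a hypothetical `MemorySchema` and `Session` (not DataFusion types): the session holds a shared `Arc` handle to an ephemeral in-memory catalog, so running a statement through the session changes catalog state that every other holder of the handle observes.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical ephemeral in-memory schema: tables exist only for the lifetime
// of the process. Table metadata is reduced to a list of column names.
#[derive(Default)]
struct MemorySchema {
    tables: RwLock<HashMap<String, Vec<String>>>,
}

impl MemorySchema {
    /// Register a table, returning the previous definition if one was replaced.
    fn register_table(&self, name: &str, columns: Vec<String>) -> Option<Vec<String>> {
        self.tables.write().unwrap().insert(name.to_string(), columns)
    }

    fn table_exists(&self, name: &str) -> bool {
        self.tables.read().unwrap().contains_key(name)
    }
}

// Hypothetical session type: it holds a shared handle to the catalog, so a
// session method can mutate catalog state seen by other holders of the Arc.
struct Session {
    schema: Arc<MemorySchema>,
}

impl Session {
    /// Stand-in for running a CREATE TABLE statement through the session.
    fn create_table(&self, name: &str, columns: &[&str]) {
        let columns = columns.iter().map(|c| c.to_string()).collect();
        self.schema.register_table(name, columns);
    }
}

fn main() {
    let schema = Arc::new(MemorySchema::default());
    let session = Session { schema: Arc::clone(&schema) };
    session.create_table("t", &["a", "b"]);
    println!("table t exists: {}", schema.table_exists("t"));
}
```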
Challenges
The boundary between the built-in catalog support and plugging in an external catalog is not very clear; see, for example, PR #5277.
Also, as projects like #5130 get under way, it becomes even more important to distinguish between catalog manipulation and simple read-only catalog access.
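One possible way to separate catalog manipulation from read-only access is at the type level. In this sketch (hypothetical trait names, not DataFusion's API), every catalog implements `ReadCatalog`, and only catalogs that support changes also implement `WriteCatalog`, so DDL code cannot even be handed a read-only catalog.

```rust
use std::collections::BTreeSet;

// Hypothetical trait split (not DataFusion's API): every catalog can be read,
// but only catalogs that opt in expose manipulation methods.
trait ReadCatalog {
    fn table_names(&self) -> Vec<String>;
}

trait WriteCatalog: ReadCatalog {
    fn create_table(&mut self, name: &str) -> Result<(), String>;
}

// A catalog owned by an external system: it only implements ReadCatalog.
struct ExternalCatalog {
    names: Vec<String>,
}

impl ReadCatalog for ExternalCatalog {
    fn table_names(&self) -> Vec<String> {
        self.names.clone()
    }
}

// A catalog that also supports manipulation.
#[derive(Default)]
struct LocalCatalog {
    tables: BTreeSet<String>,
}

impl ReadCatalog for LocalCatalog {
    fn table_names(&self) -> Vec<String> {
        self.tables.iter().cloned().collect()
    }
}

impl WriteCatalog for LocalCatalog {
    fn create_table(&mut self, name: &str) -> Result<(), String> {
        if self.tables.insert(name.to_string()) {
            Ok(())
        } else {
            Err(format!("table {name} already exists"))
        }
    }
}

// Query planning only needs read access...
fn has_table(catalog: &dyn ReadCatalog, name: &str) -> bool {
    catalog.table_names().iter().any(|t| t == name)
}

// ...while DDL such as CREATE TABLE requires a catalog that opted in to writes.
// Passing an ExternalCatalog here is rejected at compile time.
fn execute_create_table(catalog: &mut dyn WriteCatalog, name: &str) -> Result<(), String> {
    catalog.create_table(name)
}

fn main() {
    let external = ExternalCatalog { names: vec!["orders".to_string()] };
    let mut local = LocalCatalog::default();
    println!("{}", has_table(&external, "orders"));
    println!("{:?}", execute_create_table(&mut local, "t"));
}
```

The trade-off is that the decision is fixed at compile time; a catalog whose permissions depend on runtime configuration would need a dynamic check instead.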
Another example is the fact that `SessionContext::sql` modifies the in-memory catalog by default: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.sql
Describe the solution you'd like
I would like a clearer interface (or perhaps just documentation) that spells out which catalog manipulations are allowed and which are not, as well as an example that others could follow to implement an external catalog. The interface should make clear what a catalog does and does not support (e.g. does it allow creating new tables or views?).
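As a rough illustration of an interface that states what a catalog supports, a catalog could report its capabilities and the planner could check them before handling a DDL statement. `CatalogCapabilities`, `DescribesCapabilities`, `Ddl`, and `check_ddl` are all hypothetical names used only for this sketch.

```rust
// Hypothetical capability report (not an existing DataFusion type) that a
// catalog could expose so the planner can reject unsupported DDL up front.
#[derive(Clone, Copy)]
struct CatalogCapabilities {
    create_table: bool,
    create_view: bool,
}

trait DescribesCapabilities {
    fn capabilities(&self) -> CatalogCapabilities;
}

// Simplified stand-in for the DDL statements a planner might see.
enum Ddl {
    CreateTable(String),
    CreateView(String),
}

/// Check a DDL statement against the catalog's declared capabilities
/// before planning or executing it.
fn check_ddl(catalog: &dyn DescribesCapabilities, stmt: &Ddl) -> Result<(), String> {
    let caps = catalog.capabilities();
    match stmt {
        Ddl::CreateTable(name) if !caps.create_table => {
            Err(format!("catalog does not support CREATE TABLE ({name})"))
        }
        Ddl::CreateView(name) if !caps.create_view => {
            Err(format!("catalog does not support CREATE VIEW ({name})"))
        }
        _ => Ok(()),
    }
}

// Example: an external catalog that can create tables but not views.
struct TablesOnlyCatalog;

impl DescribesCapabilities for TablesOnlyCatalog {
    fn capabilities(&self) -> CatalogCapabilities {
        CatalogCapabilities { create_table: true, create_view: false }
    }
}

fn main() {
    let catalog = TablesOnlyCatalog;
    println!("{:?}", check_ddl(&catalog, &Ddl::CreateTable("t".to_string())));
    println!("{:?}", check_ddl(&catalog, &Ddl::CreateView("v".to_string())));
}
```

Checking up front gives users a clear error such as "catalog does not support CREATE VIEW" rather than a failure from deep inside the catalog implementation.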
To do this, I suggest:
This project might also help
Additional context
N/A