[Feature Request] Support JDBC catalog #1459
Comments
By design, Delta stores its metadata in storage, alongside the data. Could you explain why you want to move the metadata to a relational database?
Only the table name and location would be stored in the relational database; all other metadata stays on the storage system. Iceberg also supports custom catalogs, such as the JDBC catalog, which store the table name in a custom catalog system.
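To make the proposal concrete, here is a minimal sketch of a name-to-location registry. It is not a Delta Lake or Iceberg API: the `TableRegistry` class, the `delta_tables` table, and the example bucket path are all hypothetical, and SQLite from Python's standard library stands in for whatever JDBC-accessible database would be used in practice.

```python
import sqlite3


class TableRegistry:
    """Hypothetical registry mapping table names to storage locations.

    Only the name and location live in the database; the table's own
    metadata stays in storage next to the data.
    """

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS delta_tables ("
            "name TEXT PRIMARY KEY, location TEXT NOT NULL)"
        )

    def register(self, name, location):
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO delta_tables (name, location) "
                "VALUES (?, ?)",
                (name, location),
            )

    def location_of(self, name):
        row = self.conn.execute(
            "SELECT location FROM delta_tables WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM delta_tables ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]


# In-memory database and made-up location, for illustration only.
registry = TableRegistry(sqlite3.connect(":memory:"))
registry.register("sales.orders", "s3a://example-bucket/delta/orders")
```

A reader would then resolve `registry.location_of("sales.orders")` and open the Delta table at that path with whichever engine they use.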
Adding a new catalog implementation is not part of Delta Lake's mission. We would like to focus on the storage format and integrate with popular catalog systems (such as Hive Metastore and AWS Glue) instead.
Delta would not need to provide a concrete implementation; it could provide just an interface that users can customize.
I think Spark already provides an interface for custom catalog implementations. I'm not very familiar with Iceberg, but I think Iceberg introduced this because the catalog is a fundamental concept in Iceberg and Iceberg is tightly coupled to it. Delta follows a different design principle and is decoupled from the catalog. Adding a catalog interface to Delta would break that principle.
Consider a scenario where multiple Delta tables are written to Dell ECS storage. To manage the table names, something like the Iceberg JDBC catalog can write each table name into a relational database and record the table's location.
A catalog is the solution. For example, can you use Hive Metastore? Hive Metastore itself just uses a relational database.
Using HMS means relying on the Hadoop ecosystem, which is a heavy solution. Storing directly through JDBC is simpler.
Totally agree that HMS is heavier, but it's a de facto standard. In addition, Delta mostly just relies on Spark's catalog APIs. You can implement Spark's catalog APIs yourself and use Delta Lake with that implementation.
Hudi is also developing a JDBC catalog.
Iceberg and Hudi are developing JDBC catalogs because they rely on the catalog for their metadata. As @zsxwing noted, Delta does not require a catalog for its metadata, and there are architectural advantages to this approach. In addition to using the Spark APIs, you can also use Delta Standalone (Scala/Java), Delta Rust (delta-rs), and/or the delta-rs Python bindings to query the metadata.
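As a rough illustration of why no catalog is needed to find a table's metadata, the sketch below reads the latest `metaData` action straight from the JSON commit files in a table's `_delta_log` directory, using only the Python standard library. It is a simplification under stated assumptions: the helper name `latest_metadata` is made up here, the table is assumed to be on a local filesystem, and Parquet checkpoints are ignored, so it only gives the right answer when the relevant JSON commits are still present. For real use, prefer one of the libraries named above.

```python
import json
from pathlib import Path


def latest_metadata(table_path):
    """Return the most recent `metaData` action of a Delta table, or None.

    Hypothetical helper: scans JSON commits only and ignores checkpoints.
    """
    log_dir = Path(table_path) / "_delta_log"
    metadata = None
    # Commit files are named with zero-padded versions, so sorting by
    # name visits them in version order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                # Each line of a commit file is one JSON-encoded action.
                action = json.loads(line)
                if "metaData" in action:
                    metadata = action["metaData"]
    return metadata
```

The returned dictionary carries fields such as `schemaString` and `partitionColumns`, which is the information a catalog-centric format would otherwise look up through its catalog.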
Reference: Iceberg JDBC catalog: https://iceberg.apache.org/docs/latest/jdbc/
Store the metadata directly in a relational database, independent of HMS. The implementation could also be customized, using the JDBC catalog as a base.
@zsxwing