[Feature Request] Support JDBC catalog #1459

melin · 2022-10-26T14:43:15Z

Reference: the Iceberg JDBC catalog: https://iceberg.apache.org/docs/latest/jdbc/

Store metadata directly in a relational database, independent of HMS (Hive Metastore). Users could also build custom catalogs on top of the JDBC catalog.
@zsxwing

zsxwing · 2022-10-27T05:19:51Z

Delta by design stores its metadata on the storage. Could you explain why you want to move the metadata to a relational database?

melin · 2022-10-27T05:46:41Z

> Delta by design stores its metadata on the storage. Could you explain why you want to move the metadata to a relational database?

Only the table name and location would be stored in the relational database; all other metadata stays on the storage system.
Iceberg's JDBC storage code: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java

Iceberg also lets you plug in a custom catalog, such as the JDBC catalog, so that table names are stored in a catalog system of your choice.
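
To make the idea above concrete, here is a minimal sketch of a registry that stores only the table name and location in a relational database. It is an illustration, not Iceberg's actual schema and not anything Delta provides: the `delta_tables` table, its columns, and the helper function names are all hypothetical, and `sqlite3` stands in for whatever database would sit behind JDBC.

```python
import sqlite3


def open_registry(db_path=":memory:"):
    """Open (or create) the registry database. The schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS delta_tables ("
        " namespace TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " location TEXT NOT NULL,"
        " PRIMARY KEY (namespace, name))"
    )
    return conn


def register_table(conn, namespace, name, location):
    """Record (or overwrite) where a table lives; no other metadata is stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO delta_tables (namespace, name, location)"
            " VALUES (?, ?, ?)",
            (namespace, name, location),
        )


def lookup_location(conn, namespace, name):
    """Return the storage location for a table name, or None if unknown."""
    row = conn.execute(
        "SELECT location FROM delta_tables WHERE namespace = ? AND name = ?",
        (namespace, name),
    ).fetchone()
    return row[0] if row else None
```

A caller would resolve a name with `lookup_location(conn, "sales", "orders")` and then open the Delta table at the returned path; the schema, partitioning, and file list would still come from the table's own log on storage.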

zsxwing · 2022-10-27T06:40:38Z

Adding a new catalog implementation is not a mission of Delta Lake. We would like to focus on the storage format and integrate with popular catalog systems (such as Hive Metastore, AWS Glue) instead.

melin · 2022-10-27T06:44:59Z

> Adding a new catalog implementation is not a mission of Delta Lake. We would like to focus on the storage format and integrate with popular catalog systems (such as Hive Metastore, AWS Glue) instead.

Delta would not need to provide a concrete implementation; it could just provide an interface that users can customize.

zsxwing · 2022-10-27T06:55:51Z

> Delta would not need to provide a concrete implementation; it could just provide an interface that users can customize.

I think Spark has already provided an interface for custom catalog implementation.

I'm not super familiar with Iceberg. But I think Iceberg introduced this because catalog is a fundamental concept in Iceberg and Iceberg is heavily coupled with catalog. Delta has a different design principle and it decouples from catalog. Adding a catalog interface to Delta would break our design principle.

melin · 2022-10-27T07:27:55Z

> > Delta would not need to provide a concrete implementation; it could just provide an interface that users can customize.
>
> I think Spark has already provided an interface for custom catalog implementation.
>
> I'm not super familiar with Iceberg. But I think Iceberg introduced this because catalog is a fundamental concept in Iceberg and Iceberg is heavily coupled with catalog. Delta has a different design principle and it decouples from catalog. Adding a catalog interface to Delta would break our design principle.

Consider a scenario where many different Delta tables are written to Dell ECS storage. To manage the table names, the Iceberg JDBC catalog can write each table name into a relational database and record the table's location.
Does Delta have a solution for this?

zsxwing · 2022-10-27T16:26:01Z

> Does Delta have a solution for this?

Catalog is the solution. For example, can you use Hive Metastore? Hive Metastore is just using relational databases.

melin · 2022-10-28T01:45:59Z

> > Does Delta have a solution for this?
>
> Catalog is the solution. For example, can you use Hive Metastore? Hive Metastore is just using relational databases.

HMS depends on the Hadoop ecosystem, which makes it a heavy solution. Storing the catalog directly over JDBC is simpler.

zsxwing · 2022-11-11T06:59:45Z

> HMS depends on the Hadoop ecosystem, which makes it a heavy solution. Storing the catalog directly over JDBC is simpler.

Totally agree that hms is heavier. But it's a de facto standard. In addition, Delta mostly just relies on Spark's catalog APIs. You can implement Spark's APIs and just use Delta Lake with that.

melin · 2022-11-17T11:04:41Z

> > HMS depends on the Hadoop ecosystem, which makes it a heavy solution. Storing the catalog directly over JDBC is simpler.
>
> Totally agree that hms is heavier. But it's a de facto standard. In addition, Delta mostly just relies on Spark's catalog APIs. You can implement Spark's APIs and just use Delta Lake with that.

Hudi is also developing a JDBC catalog.

dennyglee · 2022-11-18T07:40:25Z

While Iceberg and Hudi are developing a JDBC catalog, this is because they rely on the catalog for their metadata. As @zsxwing noted, Delta does not require a catalog for its metadata, and there are architectural advantages to this approach. In addition to using the Spark APIs, you can also use Delta Standalone (Scala/Java), Delta Rust (delta-rs), and/or the delta-rs Python bindings to query the metadata.
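
As a rough illustration of why no catalog is needed for the metadata, the sketch below reads a table's current `metaData` action straight from the JSON commit files under `_delta_log`, using only the Python standard library. It is a simplified sketch, not how the libraries above are implemented: the function name `latest_metadata` is hypothetical, and it deliberately ignores Parquet checkpoints, so it assumes the JSON commits that contain the metadata have not been cleaned up. For real use, one of the libraries above is the safer choice.

```python
import json
from pathlib import Path


def latest_metadata(table_path):
    """Return the most recent `metaData` action found in the JSON commits.

    Simplification: checkpoints are ignored, so this only works while the
    relevant JSON commit files are still present in `_delta_log`.
    """
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named with zero-padded version numbers, so sorting
    # by file name also sorts by table version.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())

    metadata = None
    for commit in commits:
        with commit.open(encoding="utf-8") as f:
            for line in f:  # one JSON action per line
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "metaData" in action:
                    metadata = action["metaData"]  # later versions win
    return metadata
```

Given a table location, `latest_metadata(path)` returns a dict with fields such as `schemaString` and `partitionColumns`, or `None` if no commit carries a `metaData` action. A name-to-location lookup, whether in Hive Metastore or elsewhere, only has to supply the path.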

melin added the enhancement New feature or request label Oct 26, 2022

melin mentioned this issue Oct 27, 2022

Roadmap 2022 H2 (discussion) #1307

Open

melin closed this as completed Nov 17, 2022
