feat: add generic Iceberg catalog adapter creation to Java / Python #5754
We include the support libraries for REST and Glue catalogs but others will need the user to have the libraries in the class path. These are the catalogs that will need additional files:
I'm curious; how does pyiceberg deduce the catalog in the following example:
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "default",
    **{
        "uri": "http://rest:8181/",
        "s3.endpoint": "http://minio:9000/",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    }
)
taxi_dataset = catalog.load_table("default.taxi_dataset").to_arrow()
Maybe we could use the same strategy? (I'm looking into it and will report back.)
Looks like they infer the type from the uri prefix:
if uri.startswith("http"):
    return CatalogType.REST
elif uri.startswith("thrift"):
    return CatalogType.HIVE
elif uri.startswith(("sqlite", "postgresql")):
    return CatalogType.SQL
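For reference, the prefix rule quoted above can be exercised on its own. The following is a minimal, self-contained sketch: the CatalogType enum here is a local stand-in rather than pyiceberg's class, and the function name infer_catalog_type and the None fallback are assumptions made for illustration.

```python
from enum import Enum
from typing import Mapping, Optional


class CatalogType(Enum):
    """Local stand-in for the catalog types named in the quoted snippet."""
    REST = "rest"
    HIVE = "hive"
    SQL = "sql"


def infer_catalog_type(properties: Mapping[str, str]) -> Optional[CatalogType]:
    """Guess the catalog type from the "uri" prefix; return None if no rule matches."""
    uri = properties.get("uri")
    if not isinstance(uri, str):
        return None  # no usable uri, so nothing can be inferred
    if uri.startswith("http"):
        return CatalogType.REST
    if uri.startswith("thrift"):
        return CatalogType.HIVE
    if uri.startswith(("sqlite", "postgresql")):
        return CatalogType.SQL
    return None


inferred = infer_catalog_type({"uri": "http://rest:8181/"})
```

With the properties from the example above, the "http" prefix selects the REST type; a map without a "uri" entry yields None, leaving the decision to the caller.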
Reviewed. Seems like a nice PR, if it works.
Reviewed Python.
I'm going to try and use it and report back.
def adapter(
        properties: Dict[str, str],
        name: Optional[str] = None
) -> IcebergCatalogAdapter:
I think we should compare this directly to https://py.iceberg.apache.org/reference/pyiceberg/catalog/#pyiceberg.catalog.load_catalog, https://github.com/apache/iceberg-python/blob/pyiceberg-0.6.1/pyiceberg/catalog/__init__.py#L185-L200
def load_catalog(name: Optional[str] = None, **properties: Optional[str]) -> Catalog:
Should we call this load_catalog_adapter, load_catalog, or load_adapter instead? Should we match the same sort of signature, i.e. using **properties?
In addition, from a Python standpoint, I think we also need to compare and contrast with the wider pyiceberg API: https://py.iceberg.apache.org/api/
I don't think we necessarily need to dive deep into the details today, but we could potentially add a layer of YAML config in the future, like pyiceberg does with ~/.pyiceberg.yaml (in which case, the config is looked up by name).
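To make the signature question concrete, here is a rough sketch of how a pyiceberg-style wrapper could sit on top of the signature under review. Both functions are stand-ins that only echo their inputs: load_catalog_adapter is one of the names floated above, not an existing function, and neither body reflects the real implementation.

```python
from typing import Dict, Optional, Tuple


def adapter(
    properties: Dict[str, str], name: Optional[str] = None
) -> Tuple[Optional[str], Dict[str, str]]:
    """Stand-in for the adapter() under review; it only echoes its inputs."""
    return name, dict(properties)


def load_catalog_adapter(
    name: Optional[str] = None, **properties: str
) -> Tuple[Optional[str], Dict[str, str]]:
    """Hypothetical pyiceberg-style wrapper: name first, properties as keywords."""
    return adapter(properties, name)


# Dotted keys such as "s3.endpoint" are not valid Python identifiers, so they
# have to be passed by unpacking a dict, as in the pyiceberg example.
result = load_catalog_adapter(
    "default",
    **{"uri": "http://rest:8181/", "s3.endpoint": "http://minio:9000/"},
)
```

One trade-off of the **properties form is that a property literally called "name" would collide with the first parameter, whereas an explicit dict argument has no such restriction.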
Based on the order of args, we are not able to do adapter("name", { "properties": ... }), which seems like the more natural order?
The new call is IcebergCatalogAdapter.adapter(). I think that is pretty clear, but I'm open to a new name.
throw new IllegalArgumentException(String.format("Catalog type or implementation property '%s' is required",
        CatalogProperties.CATALOG_IMPL));
We might be able to infer the catalog type based on URI like pyiceberg does:
We can do this and it will bring us closer to PyIceberg configuration. My approach was not to repeat all the hand-holding that pyiceberg does, but to mirror the Java Iceberg API which is more barebones.
Not sure which approach is better, but I lean toward less code and more user responsibility (esp. where it matches Iceberg Java API).
I'm OK with introducing these kinds of helpers, as long as we can be sure that they are:
- Unambiguous
- Unlikely to break later because of changes to the expected properties
That said, we can merge property-driven catalog support independently and add these kinds of "nice to have" features later; they will not change the interface, as far as I can tell.
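As an illustration of what a helper meeting those two conditions might look like, the sketch below never overrides an explicit "type" or "catalog-impl" entry and raises when exactly one rule does not apply. The function name resolve_catalog_properties and the prefix table are hypothetical; the property keys "type", "catalog-impl", and "uri" are the ones used by the Iceberg Java API.

```python
from typing import Dict, Mapping

CATALOG_TYPE = "type"          # Iceberg catalog type property
CATALOG_IMPL = "catalog-impl"  # Iceberg catalog implementation property
CATALOG_URI = "uri"            # Iceberg catalog URI property

# Hypothetical prefix table; kept deliberately small so each rule is unambiguous.
_URI_PREFIX_TO_TYPE = {"http": "rest", "thrift": "hive"}


def resolve_catalog_properties(properties: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the properties with a catalog type filled in if needed."""
    resolved = dict(properties)
    # Explicit configuration always wins; inference is only a fallback.
    if CATALOG_TYPE in resolved or CATALOG_IMPL in resolved:
        return resolved
    uri = resolved.get(CATALOG_URI, "")
    matches = [
        catalog_type
        for prefix, catalog_type in _URI_PREFIX_TO_TYPE.items()
        if uri.startswith(prefix)
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Catalog type or implementation property '{CATALOG_IMPL}' is required"
        )
    resolved[CATALOG_TYPE] = matches[0]
    return resolved


resolved = resolve_catalog_properties({"uri": "http://rest:8181/"})
```

Because the helper only adds a property and leaves explicit settings untouched, it can be layered on later without changing the property-driven interface.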
…to-create data-instructions from property collection.
@@ -245,3 +246,73 @@ def adapter_aws_glue(
    except Exception as e:
        raise DHError(e, "Failed to build Iceberg Catalog Adapter") from e


def adapter(
Can this be unit tested yet?
Sounds like it is not testable yet.
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#305
Java, connecting to a RESTCatalog using MinIO
Python, connecting to a RESTCatalog using MinIO
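A minimal sketch of the kind of property map such a connection could use, reusing the endpoints and credentials from the pyiceberg example earlier in this conversation. The helper name rest_minio_properties is hypothetical, and whether the adapter needs further properties (for example a region or an explicit io-impl) is an assumption left open here; the call into Deephaven itself is not shown.

```python
from typing import Dict


def rest_minio_properties(
    uri: str, s3_endpoint: str, access_key_id: str, secret_access_key: str
) -> Dict[str, str]:
    """Build a property map for a REST catalog whose data lives in MinIO."""
    return {
        "type": "rest",              # Iceberg catalog type
        "uri": uri,                  # REST catalog endpoint
        "s3.endpoint": s3_endpoint,  # MinIO's S3-compatible endpoint
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
    }


# Values taken from the pyiceberg example in this thread.
properties = rest_minio_properties(
    uri="http://rest:8181/",
    s3_endpoint="http://minio:9000/",
    access_key_id="minioadmin",
    secret_access_key="minioadmin",
)
```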
Java, connecting to AWS Glue
NOTE: credentials set in local environment
Python, connecting to AWS Glue
NOTE: credentials set in local environment