-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Iceberg catalog warehouse property in REST #15762
Add support for Iceberg catalog warehouse property in REST #15762
Conversation
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRestCatalogConfig.java
Outdated
Show resolved
Hide resolved
...eberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoIcebergRestCatalogFactory.java
Outdated
Show resolved
Hide resolved
Hi @danielcweeks, Could you check the CI failure? |
LGTM |
c4cdc9e
to
8fa7519
Compare
I've fixed CI failures and tests should pass now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. CC: @colebow for the doc review.
@@ -95,6 +95,9 @@ Property Name Description | |||
``iceberg.rest-catalog.uri`` REST server API endpoint URI (required). | |||
Example: ``http://iceberg-with-rest:8181`` | |||
|
|||
``iceberg.rest-catalog.warehouse`` Warehouse identifier/location for the catalog (optional). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in JDBC catalog we call it iceberg.jdbc-catalog.default-warehouse-dir
with description "The default warehouse directory to use for JDBC"
the "identifier" is misleading
the config name is unnecessarily different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mosabua do we write "(optional)" for optional properties, or just for required ones (as above), or ?
@Config("iceberg.rest-catalog.warehouse") | ||
@ConfigDescription("The warehouse location/identifier to use with the REST catalog server") | ||
public IcebergRestCatalogConfig setWarehouse(String warehouse) | ||
{ | ||
this.warehouse = Optional.ofNullable(warehouse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's sync this with io.trino.plugin.iceberg.catalog.jdbc.IcebergJdbcCatalogConfig#setDefaultWarehouseDir
(config name, description, property name, field name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comment below, the REST catalog makes this a little more flexible and doesn't necessarily require an actual directory (it could just be a name that the catalog maps to a directory).
@@ -38,6 +39,7 @@ | |||
private final CatalogName catalogName; | |||
private final String trinoVersion; | |||
private final URI serverUri; | |||
private final Optional<String> warehouse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultWarehouseDir
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi this isn't exactly the same thing as with JDBC since the warehouse property isn't required to be a directory (it could just be an identifier that may map to a location/bucket/etc.). The property itself is really just 'warehouse' and we probably don't want to restrict that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is warehouse identifier?
are there any docs for that that we should reference from our docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An identifier could be anything that the REST Server would use to map to the warehouse being addressed. This could be as simple as something like prod
which the server then would map to whatever the canonical location is for prod. For the docs, we could just reference the spec, but there's no strict requirement on format or how a server uses the value.
A good example of why this is useful is if you have something like the JDBCCatalog where you have to explicitly define the location in the config, but if something like Flink is also using the same catalog backend, but unintentionally points to a different location, you have a single catalog with data that's split across multiple locations. This can even result in individual tables having data end up split across multiple locations depending on writes/overwrites.
@@ -71,6 +74,7 @@ public synchronized TrinoCatalog create(ConnectorIdentity identity) | |||
if (icebergCatalog == null) { | |||
ImmutableMap.Builder<String, String> properties = ImmutableMap.builder(); | |||
properties.put(CatalogProperties.URI, serverUri.toString()); | |||
warehouse.ifPresent(location -> properties.put(CatalogProperties.WAREHOUSE_LOCATION, location)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielcweeks I set random characters with iceberg.rest-catalog.warehouse
in IcebergRestQueryRunnerMain
, but I didn't find any change. I expected the REST catalog throws an error or log something because the behind JDBC catalogs uses the different warehouse
catalog property. Can you share how REST catalog behaves in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr this is how the REST catalog currently uses this property: https://github.com/apache/iceberg/blob/4b5ee1a0ce77161b9195d8db4fe0e8350ea93fa8/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L736-L746.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but the link doesn't answer the above question in my opinion. What's the expected behavior in the above random warehouse case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr I tried to explain in this comment, but this can't really be compared to how Hive works with locations and the expectation is not that the value should necessarily dictate anything about the path.
Because Iceberg has multiple implementations for Catalog, all with different behaviors, the warehouse
property is just a standard mechanism by which to indicate intent to the implementation. For simple implementations, like Hadoop or Hive Catalog, that could be just a physical location to store data. However, it is more of an abstraction that can be used by the implementation.
For example, the catalog service at Netflix has multiple catalogs (prod
, test
, shadow
, etc.), but there's just a single service endpoint (say metacat.data.netflix.net
). To address each of those independently, you need to either have different domains for each or expose different ports to the same service. With RESTCatalog implementation, you're just providing a request to address a specific one without having to change the surrounding infrastructure (and something like prod
is a lot easier to reason about than port 337033
).
The REST spec defines this as an option, but does not dictate how it is handled (and ignoring it is also an option, since that is a catalog behavior).
We could write tests, but at that point we'd be writing both the test and the expected behavior, so the test would largely be meaningless.
@nastra FYI i push empty commit called |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
Description
This PR adds support for configuring the
warehouse
catalog property for the REST Catalog which is part of the REST Spec.Additional context and related issues
N/A
Release notes
(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: