Support Iceberg default warehouse location config #9614

Closed
wants to merge 1 commit into from

jackye1995
Member

Support using a configurable warehouse location to determine the default table location.

This is to match the Iceberg side catalog behavior: https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L452-L459.

@losipiuk @findepi

@findepi
Member

findepi commented Oct 30, 2021

In Iceberg, I can define location property on the schema.
@jackye1995 Why do we need this?

cc @losipiuk @phd3 @alexjo2144

@jackye1995 jackye1995 changed the title from "Iceberg: support default warehouse location" to "Support Iceberg default warehouse location config" on Nov 23, 2021
@jackye1995
Member Author

In Iceberg, I can define location property on the schema. Why do we need this?

There are 2 reasons for this config:

  1. Iceberg has the concept of a default warehouse location for a catalog, so that users do not have to specify a table location. This is similar to Hive's hive.metastore.warehouse.dir. But as you know, Iceberg prefers not to rely on Hadoop config, so a dedicated warehouse location config is used instead. Adding this config to Trino brings the Iceberg and Trino catalog concepts closer together.
  2. The Hadoop catalog actually requires such a config. If a user supplies an alternative table location, it should throw an exception. This is because the Hadoop catalog relies on the warehouse/db/table directory layout to represent the catalog structure: it lists files to find all the databases, and all the tables within a database. So this config is necessary for a Hadoop catalog implementation.
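
To illustrate the second point, here is a minimal sketch of discovering databases and tables purely by listing directories under a warehouse root. The class and method names are hypothetical and this is not the actual Iceberg HadoopCatalog code; it uses the local filesystem through java.nio.file and treats every subdirectory as a database or table, which is a simplification.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a warehouse/db/table layout; not Iceberg's implementation.
class WarehouseLayoutSketch
{
    // Returns the sorted names of the immediate subdirectories of dir.
    static List<String> listSubdirectories(Path dir)
    {
        try (Stream<Path> children = Files.list(dir)) {
            return children
                    .filter(Files::isDirectory)
                    .map(child -> child.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Databases are the directories directly under the warehouse root.
    static List<String> listDatabases(Path warehouse)
    {
        return listSubdirectories(warehouse);
    }

    // Tables are the directories directly under a database directory.
    static List<String> listTables(Path warehouse, String database)
    {
        return listSubdirectories(warehouse.resolve(database));
    }

    // Builds a throwaway warehouse in a temp directory for demonstration.
    static Path createDemoWarehouse()
    {
        try {
            Path warehouse = Files.createTempDirectory("warehouse");
            Files.createDirectories(warehouse.resolve("sales").resolve("orders"));
            Files.createDirectories(warehouse.resolve("sales").resolve("customers"));
            Files.createDirectories(warehouse.resolve("hr").resolve("employees"));
            return warehouse;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        Path warehouse = createDemoWarehouse();
        System.out.println(listDatabases(warehouse));
        System.out.println(listTables(warehouse, "sales"));
    }
}
```

A table stored outside the warehouse root would never show up in such a listing, which is why an alternative table location cannot be supported under this layout.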

Member

@findepi findepi left a comment

Add a test covering creation of tables without a location, within a schema that has no location set.
This is what becomes possible with this change.

return this;
}

public String getCatalogWarehouse()
Member

Make the property Optional<String>

Member Author

updated
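
For reference, a minimal sketch of what an Optional<String> property could look like. The class name is hypothetical, and the @Config("iceberg.catalog.warehouse") binding shown in the diff is left out so the snippet stays self-contained; accepting a nullable String in the setter is an assumption, not necessarily what the updated commit does.

```java
import java.util.Optional;

// Hypothetical stand-in for the config class under review.
// The real setter would carry the @Config annotation from the diff.
class WarehouseConfigSketch
{
    // Empty by default: no warehouse location configured.
    private Optional<String> catalogWarehouse = Optional.empty();

    Optional<String> getCatalogWarehouse()
    {
        return catalogWarehouse;
    }

    // Assumption: the setter takes a nullable String and wraps it,
    // so callers of the getter never have to deal with null.
    WarehouseConfigSketch setCatalogWarehouse(String catalogWarehouse)
    {
        this.catalogWarehouse = Optional.ofNullable(catalogWarehouse);
        return this;
    }
}
```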

@@ -51,6 +53,19 @@ public IcebergConfig setCatalogType(CatalogType catalogType)
return this;
}

@Config("iceberg.catalog.warehouse")
Member

not sure about the name. something like "default storage location"?

cc @losipiuk @alexjo2144 @phd3

Member Author

This is to match https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/CatalogProperties.java#L31. Please let me know if another name is preferred; I can update to that.

Member

For schemas and tables we use "location". So probably we should use it also here. iceberg.default-schema-location?

@@ -39,6 +39,7 @@
private final CatalogType catalogType;
private final boolean isUniqueTableLocation;
private final boolean isUsingSystemSecurity;
private final String warehouse;
Member

This should be named more meaningfully. Let's figure out the config property name first.

Member Author

sure, I updated to catalogWarehouse for now

"please either set 'location' when creating the database, or set 'iceberg.catalog.warehouse' " +
"to allow a default location at 'warehousePath/databaseName.db'", database.getDatabaseName()));
}
database = Database.builder(database)
Member

Let's not build a fake Database object.
Instead, let's modify getTableDefaultLocation (which already has related logic).
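
A rough sketch of that idea: resolve the default location directly from the schema location and the optional warehouse, without constructing a substitute Database. The class, method names, and exception type are hypothetical, and appending the table name with a "/" is an assumption about how the default table path is formed; only the "%s/%s.db" pattern comes from the diff.

```java
import java.util.Optional;

// Hypothetical sketch; not the actual getTableDefaultLocation implementation.
class DefaultLocationSketch
{
    // Prefer the schema's own location; otherwise fall back to warehouse/schema.db.
    static String resolveSchemaLocation(Optional<String> schemaLocation, Optional<String> warehouse, String schemaName)
    {
        if (schemaLocation.isPresent()) {
            return schemaLocation.get();
        }
        return warehouse
                .map(path -> String.format("%s/%s.db", path, schemaName))
                .orElseThrow(() -> new IllegalStateException(String.format(
                        "Database '%s' location is not set and no warehouse location is configured",
                        schemaName)));
    }

    // Assumption: the table directory is named after the table, under the schema location.
    static String resolveTableLocation(Optional<String> schemaLocation, Optional<String> warehouse, String schemaName, String tableName)
    {
        return resolveSchemaLocation(schemaLocation, warehouse, schemaName) + "/" + tableName;
    }
}
```

With this shape, the warehouse fallback lives in one place and the metastore's Database object is only read, never rebuilt.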

"to allow a default location at 'warehousePath/databaseName.db'", database.getDatabaseName()));
}
database = Database.builder(database)
.setLocation(Optional.of(format("%s/%s.db", warehouse, schemaTableName.getSchemaName())))
Member

why .db?

Member Author

@jackye1995
Member Author

@findepi a test is actually quite difficult to add: the file metastore has a hard-coded database location, so that code path would never be exercised. I'm not sure if there is any other good way to trigger it.

@jackye1995
Member Author

We can also choose not to have this feature for the Hive catalog and only support it with the Hadoop and Glue catalogs, because based on the current logic of HiveWriteUtils.getTableDefaultLocation, table creation should fail if the provided path does not exist in a non-S3 HDFS environment. So it seems that Trino assumes the database location already exists for this to work with Hive. Any thoughts?

@jackye1995
Member Author

close in favor of #10151

@jackye1995 jackye1995 closed this Dec 2, 2021
@findepi
Member

findepi commented Dec 16, 2021

This is to match the Iceberg side catalog behavior: https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L452-L459.

The link no longer seems to be adequate. @jackye1995 can you link to a file in a commit, instead of a branch?
