Add register_table procedure support for iceberg table #14375

Merged

Contributor

@krvikash krvikash commented Sep 29, 2022

Description

Fixes #13552

Adds a register_table procedure to the Iceberg connector so that a table can be registered from its existing metadata.

  • The Iceberg table is created using the provided table location.
  • The procedure looks for the latest metadata file ending with .metadata.json inside the metadata folder and uses it to create the table.
  • If the provided location is invalid or does not exist, an exception is thrown.
  • If no metadata file exists inside the provided table location, an exception is thrown.
  • If more than one metadata file exists with the latest sequence number, an exception is thrown.
  • schema_name, table_name, and table_location must be non-null and valid; otherwise, an exception is thrown.
  • The user can optionally provide a valid metadata_file_location (see the usage below).
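The metadata file selection rules above can be illustrated with a small standalone sketch. This is not the code from this PR: the class `LatestMetadataFinder` and its methods are hypothetical, and the sketch assumes that the sequence number is the numeric prefix before the first `-` in the file name, as in `00003-409702ba-4735-4645-8f14-09537cc0b2c8.metadata.json`. It works on a plain list of file names, so the check for an invalid or missing location is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the Trino implementation.
class LatestMetadataFinder
{
    private static final String SUFFIX = ".metadata.json";

    // Returns the numeric prefix before the first '-', or -1 if the name
    // does not look like a metadata file (assumed naming convention).
    static int sequenceNumber(String fileName)
    {
        int dash = fileName.indexOf('-');
        if (!fileName.endsWith(SUFFIX) || dash <= 0) {
            return -1;
        }
        try {
            return Integer.parseInt(fileName.substring(0, dash));
        }
        catch (NumberFormatException e) {
            return -1;
        }
    }

    // Picks the single metadata file with the highest sequence number.
    // Throws if there is no metadata file, or if several share the highest number.
    static String findLatest(List<String> fileNames)
    {
        int latest = -1;
        List<String> candidates = new ArrayList<>();
        for (String name : fileNames) {
            int sequence = sequenceNumber(name);
            if (sequence < 0) {
                continue; // not a metadata file, ignore it
            }
            if (sequence > latest) {
                latest = sequence;
                candidates.clear();
            }
            if (sequence == latest) {
                candidates.add(name);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("No metadata file found");
        }
        if (candidates.size() > 1) {
            throw new IllegalArgumentException("More than one latest metadata file found: " + candidates);
        }
        return candidates.get(0);
    }
}
```

A caller would list the file names in the table's metadata folder and pass them to `findLatest`; files that do not end with `.metadata.json` are skipped rather than treated as errors.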

Valid usages:

  • register_table(schema_name => ..., table_name => ..., table_location => ...)
  • register_table(schema_name => ..., table_name => ..., table_location => ..., metadata_file_location => ...)

Sample Queries:
CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44');

CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44', null);

CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44', '00003-409702ba-4735-4645-8f14-09537cc0b2c8.metadata.json');

Test cases are added for success and failure scenarios in the following classes.

  • Flat File -> TestIcebergRegisterTableProcedure
  • AWS Glue, Minio, GCP -> BaseIcebergConnectorSmokeTest
  • HDFS (Spark) -> TestIcebergSparkCompatibility

Configuration:
By default, the register_table procedure is disabled. Enable it by setting iceberg.register-table-procedure.enabled to true in the config.
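As a minimal sketch, assuming the Iceberg catalog is defined in a properties file such as etc/catalog/iceberg.properties (the file name is an assumption and depends on how the catalog is named in a given deployment), the setting would look like this:

```properties
# etc/catalog/iceberg.properties (hypothetical file name; depends on the catalog name)
connector.name=iceberg
# Property name taken from this PR; the procedure stays disabled unless this is true
iceberg.register-table-procedure.enabled=true
```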

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 29, 2022
@krvikash krvikash changed the title from "Iceberg support register table procedure" to "Iceberg support registertable procedure" Sep 29, 2022
@krvikash krvikash changed the title from "Iceberg support registertable procedure" to "Add register_table procedure support for iceberg table" Sep 29, 2022
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from 6e09f17 to e464f17 Compare September 29, 2022 14:34
@krvikash krvikash self-assigned this Sep 29, 2022
@alexjo2144
Member

The approach generally looks good to me. Thanks for putting this together.

@krvikash krvikash marked this pull request as ready for review September 30, 2022 08:50
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch 6 times, most recently from 57a7092 to 9ddb33d Compare October 1, 2022 15:23
import static java.lang.String.format;
import static org.assertj.core.api.Assertions.assertThat;

public class TestIcebergRegisterTableProcedure
Contributor

Should we maybe integrate the tests from this class into BaseIcebergConnectorTest?

@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch 2 times, most recently from 0308ec8 to 034538f Compare October 4, 2022 08:09
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from 70a8a52 to 0f59c9f Compare November 7, 2022 11:59
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from 0f59c9f to a278ee1 Compare November 7, 2022 13:22
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch 2 times, most recently from 55d8d5c to 8058841 Compare November 8, 2022 11:27
Member

@alexjo2144 alexjo2144 left a comment

Just some documentation wording nit picks 👍

@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from 8058841 to cfcfbd0 Compare November 8, 2022 15:41
@findepi findepi merged commit ce68ae9 into trinodb:master Nov 9, 2022
@findepi findepi mentioned this pull request Nov 9, 2022
@github-actions github-actions bot added this to the 403 milestone Nov 9, 2022
@mosabua
Member

mosabua commented Nov 10, 2022

Congratulations @krvikash .. well done on persisting through and working with all the reviewers towards a successful merge. Great collaboration everyone!

@krvikash
Contributor Author

> Congratulations @krvikash .. well done on persisting through and working with all the reviewers towards a successful merge. Great collaboration everyone!

Thanks @mosabua :)

@krvikash krvikash deleted the iceberg-support-register-table-procedure branch April 30, 2023 08:59
Development

Successfully merging this pull request may close these issues.

Add support for creating an Iceberg table from existing table content
7 participants