Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iceberg: use native implementation to obtain snapshot schema #13614

Merged
merged 1 commit into from
Aug 18, 2022

Conversation

xiacongling
Copy link
Contributor

@xiacongling xiacongling commented Aug 11, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

a fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg Connector

How would you describe this change to a non-technical end user or system administrator?

Uss native implementation to obtain snapshot schema when executing Iceberg time travel queries. Time travel queries on a table created with Iceberg<0.12 will fail without this fix.

Related issues, pull requests, and links

related to #12743 #12786
fixes #13613

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

() No release notes entries required.
(x) Release notes entries required with the following suggested text

# Iceberg
* Fix failure when executing time travel queries on a table created with Iceberg<0.12. ({issue}`13613`)

@cla-bot
Copy link

cla-bot bot commented Aug 11, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiacongling would you be able to add a regression product test for this case?

see TestIcebergSparkCompatibility for existing tests

@xiacongling
Copy link
Contributor Author

@findepi sure

@cla-bot
Copy link

cla-bot bot commented Aug 12, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@xiacongling
Copy link
Contributor Author

xiacongling commented Aug 12, 2022

Spend some time with the testing framework and finally get it work. A case is added in SuiteCompatibility for the reason that Iceberg metafile is changing over times, we must make sure tables created by an early version of Trino can be queried normally in newer versions.

Early versions (<= 364) of Trino use iceberg v0.11 which did not support time traveling. The snapshot did not have a field named schema-id in its metadata JSON file. After updated to Trino 392, it will fail to query such tables with the following exception message:

io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20220812_044558_00002_36g47): Cannot invoke "org.apache.iceberg.Schema.asStruct()" because "schema" is null

@xiacongling
Copy link
Contributor Author

@findepi Could you review this PR?

@xiacongling xiacongling force-pushed the time-travel-query-bugfix branch from 55e9583 to 0c85b4d Compare August 12, 2022 11:47
@cla-bot
Copy link

cla-bot bot commented Aug 12, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@findinpath
Copy link
Contributor

@xiacongling please squash the commits and follow the
guideline for the commit messages:

https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

@@ -326,7 +327,7 @@ public IcebergTableHandle getTableHandle(
long snapshotId = endVersion.map(connectorTableVersion -> getSnapshotIdFromVersion(table, connectorTableVersion))
.orElseGet(() -> resolveSnapshotId(table, name.getSnapshotId().get(), isAllowLegacySnapshotSyntax(session)));
tableSnapshotId = Optional.of(snapshotId);
tableSchema = table.schemas().get(table.snapshot(snapshotId).schemaId());
tableSchema = schemaFor(table, snapshotId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here's what that method does:

  public static Schema schemaFor(Table table, long snapshotId) {
    Snapshot snapshot = table.snapshot(snapshotId);
    Preconditions.checkArgument(snapshot != null, "Cannot find snapshot with ID %s", snapshotId);
    Integer schemaId = snapshot.schemaId();

    // schemaId could be null, if snapshot was created before Iceberg added schema id to snapshot
    if (schemaId != null) {
      Schema schema = table.schemas().get(schemaId);
      Preconditions.checkState(schema != null,
          "Cannot find schema with schema id %s", schemaId);
      return schema;
    }

    // TODO: recover the schema by reading previous metadata files
    return table.schema();
  }

@xiacongling xiacongling force-pushed the time-travel-query-bugfix branch from 0c85b4d to 7ba88af Compare August 12, 2022 13:42
@cla-bot
Copy link

cla-bot bot commented Aug 12, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@xiacongling
Copy link
Contributor Author

@findinpath done

@xiacongling xiacongling force-pushed the time-travel-query-bugfix branch from 7ba88af to 66ae1dc Compare August 15, 2022 02:13
@cla-bot
Copy link

cla-bot bot commented Aug 15, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@xiacongling xiacongling force-pushed the time-travel-query-bugfix branch from 66ae1dc to 45387dc Compare August 16, 2022 13:59
@cla-bot
Copy link

cla-bot bot commented Aug 16, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Iceberg snapshot definition in table metadata file does not have a
schema-id until Iceberg version 0.12. Time travel queries on a
table created by early versions of Trino (<=364) will fail without
this fix.
@ebyhr ebyhr force-pushed the time-travel-query-bugfix branch from 45387dc to f67c2f1 Compare August 17, 2022 03:20
@cla-bot
Copy link

cla-bot bot commented Aug 17, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@ebyhr
Copy link
Member

ebyhr commented Aug 17, 2022

I confirmed @xiacongling already submit CLA last week.
cc: @martint

@martint
Copy link
Member

martint commented Aug 18, 2022

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Aug 18, 2022
@cla-bot
Copy link

cla-bot bot commented Aug 18, 2022

The cla-bot has been summoned, and re-checked this pull request!

@ebyhr ebyhr merged commit adcfbe9 into trinodb:master Aug 18, 2022
@ebyhr
Copy link
Member

ebyhr commented Aug 18, 2022

Merged, thanks!

@ebyhr ebyhr mentioned this pull request Aug 18, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Fail to execute Iceberg time travel query
7 participants