Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Iceberg $history table when using REST Catalog #17470

Merged

Conversation

findinpath
Copy link
Contributor

Description

In the current version of Iceberg library, when not all the snapshots are initially loaded (for the REST session catalog), the snapshotsLog does not get reinitialized, causing to expose only the current snapshots in the history() method of the Iceberg table.

Related Iceberg code

https://github.com/apache/iceberg/blob/8865be48bcb20ce07972bf5176955d03a1347d27/core/src/main/java/org/apache/iceberg/TableMetadata.java#L497-L516

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findinpath findinpath force-pushed the findinpath/iceberg-history-table-bugfix branch from 6fe8047 to f28ca0d Compare May 12, 2023 07:11
@@ -77,14 +76,13 @@ public RecordCursor cursor(ConnectorTransactionHandle transactionHandle, Connect

Set<Long> ancestorIds = ImmutableSet.copyOf(SnapshotUtil.currentAncestorIds(icebergTable));
TimeZoneKey timeZoneKey = session.getTimeZoneKey();
for (HistoryEntry historyEntry : icebergTable.history()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current version of Iceberg library, when not all the
snapshots are initially loaded (for the REST session catalog),
the snapshotsLog does not get reinitialized, causing to
expose only the current snapshots in the history() method
of the Iceberg table.

is this a bug in the library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bug in the library?

Yes, please see the discussion https://apache-iceberg.slack.com/archives/C03LG1D563F/p1683871245426989

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath unfortunately the discussion is no longer visible in slack (thread is a bit old).

We recently hit an issue here because we made the assumption that $history table only contained snapshots that were at some point or another the current snapshot of a table.

I wonder what is the difference now between $snapshots and $history? Since I can get pretty much anything from $history using only $snapshots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gustavoatt i don't see the replies anymore on the slack thread and lost context :(

@@ -595,6 +595,32 @@ public void testDropTableWithNonExistentTableLocation()
assertFalse(getQueryRunner().tableExists(getSession(), tableName));
}

@Test
public void testMetadataTables()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a code comment why this is in SmokeTest.

@findepi findepi changed the title Rely on Table.snapshots() method to build the history cursor Fix Iceberg $history table when using REST Catalog May 12, 2023
@findepi
Copy link
Member

findepi commented May 12, 2023

@findinpath consider changing the commit message to something like Fix Iceberg $history table when using REST Catalog

In the current version of Iceberg library, when not all the
snapshots are initially loaded (for the REST session catalog),
the `snapshotsLog` does not get reinitialized, causing to
expose only the current snapshots in the `history()` method
of the Iceberg table.
@findinpath findinpath force-pushed the findinpath/iceberg-history-table-bugfix branch from f28ca0d to ea3b9e0 Compare May 14, 2023 14:31
@findinpath findinpath requested a review from findepi May 14, 2023 15:14
@findepi findepi merged commit 6daeaf3 into trinodb:master May 15, 2023
@github-actions github-actions bot added this to the 418 milestone May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

4 participants