Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify getRawSystemTable in Delta Lake #18157

Merged
merged 1 commit into from
Jul 14, 2023

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Jul 7, 2023

Description

Simplify getRawSystemTable in Delta Lake. After this change, we can avoid a new field for DeltaLakeTableHandle in #17592

Release notes

(x) This is not user-visible or docs only and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Jul 7, 2023
@github-actions github-actions bot added the delta-lake Delta Lake connector label Jul 7, 2023
@ebyhr ebyhr requested review from findepi and findinpath July 7, 2023 06:28
@ebyhr ebyhr force-pushed the ebi/delta-get-raw-system-table branch 2 times, most recently from 1dba326 to bb46e79 Compare July 10, 2023 04:36
@@ -409,6 +409,9 @@ private void testCorruptedTableLocation(String tableName, Path tableLocation, bo
assertQueryReturnsEmptyResult("SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = CURRENT_SCHEMA AND table_name LIKE 'bad\\_person\\_%' ESCAPE '\\'");
assertQueryReturnsEmptyResult("SELECT column_name, data_type FROM system.jdbc.columns WHERE table_cat = CURRENT_CATALOG AND table_schem = CURRENT_SCHEMA AND table_name LIKE 'bad\\_person\\_%' ESCAPE '\\'");

// $history system table returns an empty result for corrupted table
assertQueryReturnsEmptyResult("SELECT * FROM \"" + tableName + "$history\"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When retrieving the commit entries io.trino.plugin.deltalake.DeltaLakeMetadata#getCommitInfoEntries (or even more low-level io.trino.plugin.deltalake.transactionlog.checkpoint.TransactionLogTail#loadNewTail(io.trino.filesystem.TrinoFileSystem, java.lang.String, java.util.Optional<java.lang.Long>) ) we should check whether we have at least one commit entry.

If the table doesn't have any commit entry, something is wrong with it and should be signaled in this kind of query.

If the table has at least one commit entry, but they contain corrupted information (e.g. : protocol entry not found in the transaction log), it would still be a bit controversial to return the history entries of the "corrupted table" successfully.

I find value in doing the extra work by getting the table handle and eventually signaling whether there is an issue.

@ebyhr ebyhr force-pushed the ebi/delta-get-raw-system-table branch from bb46e79 to 4347376 Compare July 11, 2023 02:21
Comment on lines +3037 to +3047
String tableLocation = table.get().location();
TableSnapshot tableSnapshot = getSnapshot(new SchemaTableName(systemTableName.getSchemaName(), tableName), tableLocation, session);
try {
transactionLogAccess.getMetadataEntry(tableSnapshot, session);
}
SchemaTableName systemTableName = new SchemaTableName(tableName.getSchemaName(), DeltaLakeTableName.tableNameWithType(name, tableType.get()));
catch (TrinoException e) {
if (e.getErrorCode().equals(DELTA_LAKE_INVALID_SCHEMA.toErrorCode())) {
return Optional.empty();
}
throw e;
}
Copy link
Contributor

@findinpath findinpath Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the purpose of this code because I have context to the PR.
For maintenance purposes I'd appreciate if these checks would be astracted

Suggested change
String tableLocation = table.get().location();
TableSnapshot tableSnapshot = getSnapshot(new SchemaTableName(systemTableName.getSchemaName(), tableName), tableLocation, session);
try {
transactionLogAccess.getMetadataEntry(tableSnapshot, session);
}
SchemaTableName systemTableName = new SchemaTableName(tableName.getSchemaName(), DeltaLakeTableName.tableNameWithType(name, tableType.get()));
catch (TrinoException e) {
if (e.getErrorCode().equals(DELTA_LAKE_INVALID_SCHEMA.toErrorCode())) {
return Optional.empty();
}
throw e;
}
String tableLocation = table.get().location();
if (isTableCorrupted(new SchemaTableName(systemTableName.getSchemaName(), tableName), tableLocation, session)) {
return Optional.empty();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where isTableCorrupted method comes from. Can you elaborate on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I didn't express myself clearly.
I intended to suggest extracting the code to a method to encapsulate the logic related to seeing whether the table is or not corrupt.

@ebyhr ebyhr merged commit ba18e5b into trinodb:master Jul 14, 2023
@ebyhr ebyhr deleted the ebi/delta-get-raw-system-table branch July 14, 2023 05:27
@github-actions github-actions bot added this to the 423 milestone Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector
Development

Successfully merging this pull request may close these issues.

3 participants