Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Delta's DROP TABLE support for corrupted tables #16651

Merged
merged 4 commits into from
Mar 24, 2023

Conversation

findepi
Copy link
Member

@findepi findepi commented Mar 21, 2023

Previously, a corrupted table handle was represented with a
DeltaLakeTableHandle missing a MetadataEntry. When drop support for
corrupted tables was implemented, a connector could have only one table
handle class. This commit improves the distinction by introducing a
dedicated table handle class to represent corrupted tables. As a
necessity, this class is explicitly handled in multiple
DeltaLakeMetadata methods. This sets viable example to follow for
implementing drop support for corrupted Iceberg tables as a follow-up.

Note: testGetInsertLayoutTableNotFound test was removed, instead of
being updated, since ConnectorMetadata.getInsertLayout cannot be
reached for a corrupted table, as getting column list will fail earlier.

assertQuerySucceeds("ALTER TABLE " + tableName + " RENAME TO bad_person_some_new_name");
assertQuerySucceeds("ALTER TABLE bad_person_some_new_name RENAME TO " + tableName);
assertQueryFails("ALTER TABLE " + tableName + " ADD COLUMN foo int", "Metadata not found in transaction log for tpch.bad_person");
// TODO (https://github.com/trinodb/trino/issues/16248) ADD field
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that the target of #16248 is only Iceberg connector. Could you replace the link with #16662?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i linked to the PR that adds syntax.
Once we have syntax supported we should have an uncommented test line here, even before the functionality is supported in Delta

assertQueryFails("TRUNCATE TABLE " + tableName, "This connector does not support truncating tables");
assertQueryFails("COMMENT ON TABLE " + tableName + " IS NULL", "Protocol entry not found in transaction log for table tpch.bad_person");
assertQueryFails("COMMENT ON COLUMN " + tableName + ".foo IS NULL", "Metadata not found in transaction log for tpch.bad_person");
assertQueryFails("CALL system.vacuum(CURRENT_SCHEMA, '" + tableName + "', '7d')", "\\QFailure when vacuuming tpch.bad_person with retention 7d: java.lang.NullPointerException: Cannot invoke \"java.util.List.stream()\" because \"recentVersions\" is null");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: NullPointerException here (before the refactor)

@@ -140,21 +140,20 @@ public void testCorruptedTableLocation()
assertQueryFails("ANALYZE " + tableName, "Metadata not found in transaction log for tpch.bad_person");
assertQueryFails("ALTER TABLE " + tableName + " EXECUTE optimize", "Metadata not found in transaction log for tpch.bad_person");
assertQueryFails("ALTER TABLE " + tableName + " EXECUTE vacuum", "Metadata not found in transaction log for tpch.bad_person");
assertQuerySucceeds("ALTER TABLE " + tableName + " RENAME TO bad_person_some_new_name");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: before the change we allowed rename. I think it was incidental

false);

assertThatThrownBy(() -> deltaLakeMetadataFactory.create(SESSION.getIdentity())
.getInsertLayout(SESSION, missingTableHandle))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: testGetInsertLayoutTableNotFound test was removed, instead of
being updated, since ConnectorMetadata.getInsertLayout cannot be
reached for a corrupted table, as getting column list will fail earlier.

@findepi findepi force-pushed the findepi/drop-delta-corrupted branch from bb00968 to 9afbe9f Compare March 21, 2023 16:26
@github-actions github-actions bot added the delta-lake Delta Lake connector label Mar 21, 2023
@findepi
Copy link
Member Author

findepi commented Mar 21, 2023

CI #16658

@findepi
Copy link
Member Author

findepi commented Mar 21, 2023

CI ... and some bigquery failures which i don't understand. cc @ebyhr @hashhar

@ebyhr
Copy link
Member

ebyhr commented Mar 21, 2023

The BigQuery CI failures are handled in #16603

assertQuerySucceeds("ALTER TABLE " + tableName + " RENAME TO bad_person_some_new_name");
assertQuerySucceeds("ALTER TABLE bad_person_some_new_name RENAME TO " + tableName);
assertQueryFails("ALTER TABLE " + tableName + " ADD COLUMN foo int", "Metadata not found in transaction log for tpch.bad_person");
// TODO (https://github.com/trinodb/trino/issues/16248) ADD field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that the target of #16248 is only Iceberg connector. Could you replace the link with #16662?

@findepi
Copy link
Member Author

findepi commented Mar 22, 2023

Conflict with #16466, rebasing

Comment on lines 619 to 621
if (tableHandle instanceof CorruptedDeltaLakeTableHandle corruptedTableHandle) {
throw corruptedTableHandle.createException();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can extract throwIfCourrupted(ConnectorTableHandle handle) method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wanted to have throw in the calling code so that code flow analysis works, but i don't think there is much gain from that.

will change

Previously, a corrupted table handle was represented with a
`DeltaLakeTableHandle` missing a `MetadataEntry`.  When drop support for
corrupted tables was implemented, a connector could have only one table
handle class.  This commit improves the distinction by introducing a
dedicated table handle class to represent corrupted tables. As a
necessity, this class is explicitly handled in multiple
`DeltaLakeMetadata` methods. This sets viable example to follow for
implementing drop support for corrupted Iceberg tables as a follow-up.

Note: `testGetInsertLayoutTableNotFound` test was removed, instead of
being updated, since `ConnectorMetadata.getInsertLayout` cannot be
reached for a corrupted table, as getting column list will fail earlier.
@findepi
Copy link
Member Author

findepi commented Mar 23, 2023

CI #13995 and #16694 (fix, not issue)

@findepi findepi force-pushed the findepi/drop-delta-corrupted branch from 6045f41 to c23ec5e Compare March 23, 2023 12:18
assertQueryReturnsEmptyResult("SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = CURRENT_SCHEMA AND table_name LIKE 'bad\\_perso_' ESCAPE '\\'");
assertQueryReturnsEmptyResult("SELECT column_name, data_type FROM system.jdbc.columns WHERE table_cat = CURRENT_CATALOG AND table_schem = CURRENT_SCHEMA AND table_name LIKE 'bad\\_perso_' ESCAPE '\\'");

// DROP TABLE should succeed so that users can remove their corrupted table
getQueryRunner().execute("DROP TABLE " + tableName);
assertFalse(getQueryRunner().tableExists(getSession(), tableName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also verify that the table location does not have any files after DROP TABLE execute?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline, in this test we already deleted all the files

       // break the table by deleting all its files including transaction log
        deleteRecursively(tableLocation, ALLOW_INSECURE);

i like the idea to improve the test. let's delete only transaction log and then add assertion that data files got removed.
since the PR is otherwise ready to go let's handle this as a follow-up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up PR #16764

@findepi
Copy link
Member Author

findepi commented Mar 24, 2023

CI #7535 (will address with #16707)

@findepi findepi merged commit d2534ab into master Mar 24, 2023
@findepi findepi deleted the findepi/drop-delta-corrupted branch March 24, 2023 10:14
@github-actions github-actions bot added this to the 411 milestone Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector
Development

Successfully merging this pull request may close these issues.

4 participants