
Add support for storing metadata to metastore in Delta Lake #21463

Merged: 6 commits into trinodb:master from ebi/delta-metastore-cache on Aug 5, 2024

Conversation

@ebyhr (Member) commented Apr 9, 2024

Description

This PR adds a new delta.metastore.store-table-metadata config property (true by default) and, when the property is enabled, stores the last transaction number and schema string as table parameters in the metastore.

Conditions under which we update the metastore:

  • Add these table parameters during CREATE (OR REPLACE) TABLE (AS) statements. This is a synchronous operation because we have to create or replace the metastore table anyway.
  • Replace the metastore table asynchronously in getTableHandle if the transaction number doesn't exist in the metastore or the number is older than the latest version in the _delta_log directory
  • Replace the metastore table asynchronously in other DDL or DML statements

The cached information is used when listing table comments or columns.
Test results with 100 tables, each having 7 transaction log entries:

SELECT * FROM system.metadata.table_comments WHERE schema_name = 'xxx';
3:09 → 1:44 (55% of the original time)

SELECT * FROM information_schema.columns WHERE table_schema = 'xxx';
3:30 → 1:42 (48% of the original time)

We could improve this further by using a Glue-specific API call (AWSGlue#getTables). This can be handled in a follow-up.
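For reference, toggling this behavior in a Delta Lake catalog could look like the following. This is a minimal sketch based on the property name given in this description; the other properties are shown only for context.

# etc/catalog/delta.properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
# Store the last transaction version and schema string as table parameters (true by default)
delta.metastore.store-table-metadata=false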

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Add support for caching table metadata to metastore. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 9, 2024
@github-actions github-actions bot added docs delta-lake Delta Lake connector labels Apr 9, 2024
@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch from 29de71c to 374a418 Compare April 9, 2024 07:34
@findinpath (Contributor)
Please rebase to deal with code conflicts.

@findinpath findinpath (Contributor) left a comment
Looks very promising!!!

@@ -42,4 +42,6 @@ public interface DeltaLakeMetastore
void dropTable(SchemaTableName schemaTableName, String tableLocation, boolean deleteData);

void renameTable(SchemaTableName from, SchemaTableName to);

Contributor:
We need compatibility tests with Delta Lake OSS where Spark writes to the table and Trino is constrained to update the metadata caching properties on any read/write operation.

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 2 times, most recently from 8785fd6 to 0fc669a Compare April 12, 2024 09:59
@ebyhr
Copy link
Member Author

ebyhr commented Apr 12, 2024

Addressed comments partially. I will add more tests to TestDeltaLakeMetastoreAccessOperations.

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 2 times, most recently from 3cb75c9 to 0d7fdfd Compare April 15, 2024 02:36
@ebyhr ebyhr marked this pull request as ready for review April 15, 2024 11:23
@findinpath findinpath requested review from alexjo2144 and pajaks April 16, 2024 20:05
}
tableComment.ifPresent(comment -> builder.put(TABLE_COMMENT, comment));
builder.put(TRINO_LAST_TRANSACTION_VERSION, Long.toString(version));
builder.put(TRINO_METADATA_SCHEMA_STRING, schemaString);
Contributor:
Have you taken into consideration base64(compress(schema))?

Member:
Good question.
Regardless of whether we compress or not, we should have a check on the length.

boolean canPersistComment = (comment == null || comment.length() <= GLUE_TABLE_PARAMETER_LENGTH_LIMIT);
boolean canPersistColumnInfo = glueColumns.isPresent();
boolean canPersistMetadata = canPersistComment && canPersistColumnInfo;
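As a rough illustration of the base64(compress(schema)) idea discussed above, a minimal sketch could look like this. The limit constant and the helper name are assumptions for illustration only; this is not the code that was merged.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.zip.GZIPOutputStream;

final class SchemaStringCodec
{
    // Assumed limit, for illustration only; Glue caps the total size of table parameters
    private static final int GLUE_TABLE_PARAMETER_LENGTH_LIMIT = 512_000;

    private SchemaStringCodec() {}

    // Returns base64(gzip(schemaString)) if the result fits under the limit, otherwise empty
    static Optional<String> encodeIfFits(String schemaString)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(schemaString.getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String encoded = Base64.getEncoder().encodeToString(bytes.toByteArray());
        return encoded.length() <= GLUE_TABLE_PARAMETER_LENGTH_LIMIT
                ? Optional.of(encoded)
                : Optional.empty();
    }
}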

docs/src/main/sphinx/connector/delta-lake.md (outdated review comment, resolved)
@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 3 times, most recently from 303cf68 to 050edfa Compare April 26, 2024 02:44
@@ -626,6 +650,9 @@ public LocatedTableHandle getTableHandle(
return null;
}
verifySupportedColumnMapping(getColumnMappingMode(metadataEntry, protocolEntry));
if (cacheTableMetadata && endVersion.isEmpty() && !isCachedVersionSameAsLastTransaction(metastoreTable.get(), tableSnapshot)) {
Contributor:
When the same table is referenced multiple times in a query, is this check supposed to avoid calling the metastore table replace repeatedly for the same table?

I don't follow where we ensure that the call is made only once.

I'm thinking that this should be ensured with the help of a ConcurrentMap like in io.trino.plugin.deltalake.DeltaLakeMetadata#getSnapshot(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName, java.lang.String, java.util.Optional<java.lang.Long>)

Member:
I don't follow where we ensure that the call is made only once.

Good point. I don't think we do.

We should maintain a sort of queue of pending updates, where scheduling a new update logically removes the previous entry.

The ordering of updates doesn't matter (we can deliver them out of order, as long as we deduplicate), and we want to deduplicate, so we only need a Map<SchemaTableName, /* update info */>.

Moreover, the update should be propagated to the metastore after the transaction is committed.

Thus

  • DeltaLakeMetadata should collect the updates to apply (there shouldn't be any duplicates here, as DeltaLakeMetadata should not observe two different versions of one table, unless AS OF is used)
  • on commit, it should move the collected updates to an update manager singleton to perform the actual update
  • the update manager should deduplicate updates coming from different transactions and perform the actual updates, roughly as sketched below
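A minimal sketch of such a deduplicating update manager, assuming a hypothetical TableUpdateInfo payload; this illustrates the idea only and is not the code that was ultimately merged.

import io.trino.spi.connector.SchemaTableName;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

final class MetastoreUpdateManager
{
    // Latest pending update per table; scheduling a new update logically replaces the previous one
    private final Map<SchemaTableName, TableUpdateInfo> pendingUpdates = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    MetastoreUpdateManager(ExecutorService executor)
    {
        this.executor = executor;
    }

    // Called after a transaction commits; ordering across updates does not matter
    void enqueue(SchemaTableName table, TableUpdateInfo update)
    {
        pendingUpdates.put(table, update);
        executor.execute(() -> {
            // Take ownership of whatever update is currently the latest for this table, if any is still pending
            TableUpdateInfo pending = pendingUpdates.remove(table);
            if (pending != null) {
                replaceMetastoreTable(table, pending);
            }
        });
    }

    private void replaceMetastoreTable(SchemaTableName table, TableUpdateInfo update)
    {
        // Perform the actual metastore table replacement (omitted in this sketch)
    }

    // Hypothetical payload: the last transaction version and schema string to store
    record TableUpdateInfo(long version, String schemaString) {}
}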

@findepi findepi (Member) left a comment
(skimmed)


TrinoFileSystem fileSystem = fileSystemFactory.create(session);

Stream<RelationCommentMetadata> tables = streamTables(session, schemaName)
Member:
This looks like an efficient operation, but it's not a good API, at least for Glue. Getting table names is as expensive as getting all table information (because that's what happens under the covers).
We should add something like HiveMetastore.streamTables that would return full table objects in an iterative manner. I happen to have something like that implemented, so I can share the code later. For now, we can just use listTables, without extracting a new streamTables method.
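Purely to illustrate the shape of the suggested API; the interface name and the Table model here are hypothetical stand-ins, not the actual HiveMetastore interface.

import java.util.Map;
import java.util.stream.Stream;

// Stand-in for the metastore's full table model
record Table(String databaseName, String tableName, Map<String, String> parameters) {}

interface StreamingMetastore
{
    // Returns full table objects for a schema lazily (for example, page by page from Glue),
    // instead of listing names first and then fetching each table with a separate call
    Stream<Table> streamTables(String databaseName);
}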

Comment on lines 514 to 516
.add(GET_TABLE)
.add(REPLACE_TABLE)
.build());
Member:
And without the cache, we do only GET_TABLE, right?

Would it be possible to have "do cache metadata" as a boolean variable in the test,
so that the expected invocations can be expressed as a conditional expression based on that flag? This would make the test tell the story more directly.
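A sketch of what this could look like, assuming a hypothetical assertMetastoreInvocations helper and the method constants already used in the diff above:

// Hypothetical test fragment: the expected metastore calls become a conditional
// expression on the "cache table metadata" flag, so the test tells the story directly.
// ImmutableMultiset is com.google.common.collect.ImmutableMultiset.
boolean cacheTableMetadata = true; // toggled per test variant
assertMetastoreInvocations(
        "SELECT * FROM test_table",
        cacheTableMetadata
                ? ImmutableMultiset.of(GET_TABLE, REPLACE_TABLE)
                : ImmutableMultiset.of(GET_TABLE));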

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch from 050edfa to 7114f75 Compare May 8, 2024 04:31
@ebyhr (Member, Author) commented May 8, 2024

Just rebased on master to resolve conflicts. (No change)

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 2 times, most recently from 810d795 to 57d174d Compare May 9, 2024 08:33
@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch from 65c0f8e to e1e35b9 Compare July 23, 2024 10:51
@ebyhr (Member, Author) commented Jul 23, 2024

Rebased on master to resolve logical conflicts.

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 3 times, most recently from 71390ac to 69b1e5a Compare July 25, 2024 09:11
@ebyhr ebyhr requested a review from wendigo July 25, 2024 09:13
@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch from 69b1e5a to ecaa8e7 Compare July 26, 2024 01:32
@ebyhr (Member, Author) commented Jul 26, 2024

Rebased on master to resolve conflicts.

@@ -53,6 +53,11 @@ public void commit(ConnectorTransactionHandle transaction)
{
MemoizedMetadata deltaLakeMetadata = transactions.remove(transaction);
checkArgument(deltaLakeMetadata != null, "no such transaction: %s", transaction);
deltaLakeMetadata.optionalGet().ifPresent(metadata -> {
try (ThreadContextClassLoader ignored = new ThreadContextClassLoader(getClass().getClassLoader())) {
metadata.commit();
Contributor:
Unrelated question: why is ThreadContextClassLoader required for metadata.commit()?

Contributor:
Probably to ensure that the metadata.commit() call is executed in the context of the Delta Lake plugin classloader.
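For context, the pattern swaps the thread context classloader for the duration of the call and restores it on close; conceptually it behaves like this minimal sketch (not the actual Trino SPI class):

// Minimal sketch of the try-with-resources classloader swap pattern
final class ThreadContextClassLoaderSketch
        implements AutoCloseable
{
    private final ClassLoader original;

    ThreadContextClassLoaderSketch(ClassLoader pluginClassLoader)
    {
        this.original = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(pluginClassLoader);
    }

    @Override
    public void close()
    {
        // Restore whatever classloader the calling thread had before the call
        Thread.currentThread().setContextClassLoader(original);
    }
}

Wrapping metadata.commit() in such a block means that any classes resolved via the thread context classloader during the commit come from the Delta Lake plugin classloader rather than the engine's.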

@ebyhr (Member, Author) commented Jul 30, 2024

@wendigo Please take another look.

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 2 times, most recently from 8295cc1 to 537164b Compare August 1, 2024 11:52
@wendigo wendigo (Contributor) left a comment
LGTM % fixups

@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch 2 times, most recently from 3afd97a to e010a3d Compare August 5, 2024 01:22
@ebyhr ebyhr force-pushed the ebi/delta-metastore-cache branch from e010a3d to 33f8454 Compare August 5, 2024 02:35
@ebyhr ebyhr merged commit 27f86ca into trinodb:master Aug 5, 2024
15 of 37 checks passed
@ebyhr ebyhr deleted the ebi/delta-metastore-cache branch August 5, 2024 02:41
@github-actions github-actions bot added this to the 454 milestone Aug 5, 2024
Labels: cla-signed, delta-lake (Delta Lake connector), docs, hive (Hive connector)
7 participants