Create Table Concurrent query Failure handling in Delta Lake #24250

Open
wants to merge 1 commit into
base: master

Contributor

@vinay-kl vinay-kl commented Nov 25, 2024

Description

Handle failures of concurrent CREATE TABLE [AS SELECT] queries.

Additional context and related issues

Fixes #24153

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Delta Lake
* Fix deletion of the table base path when a concurrent `CREATE TABLE` query fails. ({issue}`24153`)

@github-actions github-actions bot added the delta-lake Delta Lake connector label Nov 25, 2024
@vinay-kl vinay-kl self-assigned this Nov 25, 2024
@vinay-kl vinay-kl changed the title trino/hive: Create Table Concurrent query Failure handling trino/delta-lake: Create Table Concurrent query Failure handling Nov 25, 2024
@vinay-kl vinay-kl force-pushed the databricks-create-table-concurrent-fix branch from 078d491 to c7628b3 Compare November 25, 2024 17:12
@@ -1263,7 +1263,8 @@ public void createTable(ConnectorSession session, ConnectorTableMetadata tableMe
statisticsAccess.deleteExtendedStatistics(session, schemaTableName, location);
}
else {
setRollback(() -> deleteRecursivelyIfExists(fileSystem, deltaLogDirectory));
// deleteRecursivelyIfNothingExists ensures current CREATE TABLE doesn't delete directory if there's a conflict
Contributor

Why aren't we checking in the catch clause whether we're dealing with a TransactionConflictException instead?
By doing this, we'd likely know whether we're in a concurrency situation.

Contributor Author

@findinpath AFAIK the rollback runs in a different thread, so the exception context would need to be passed along as well. Also, the rollback is initialised in the beginCreateTable and createTable calls, well before finishCreateTable runs.
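To make the trade-off concrete, here is a minimal sketch of what "passing the context on" could look like: the rollback is registered early, but it reads a flag that a later step sets when it detects a conflict. The class and method names (`DeferredRollbackSketch`, `begin`, `markConflict`, `rollback`) are hypothetical and are not part of the connector; this only illustrates the idea, not how the PR implements it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration, not Trino code
final class DeferredRollbackSketch
{
    private final AtomicReference<Runnable> rollbackAction = new AtomicReference<>();
    private final AtomicBoolean conflictDetected = new AtomicBoolean();

    // Called early (analogous to the begin/create step): register the cleanup,
    // but let it decide at run time whether deleting is safe
    void begin(Runnable deleteTableDirectory)
    {
        rollbackAction.set(() -> {
            if (!conflictDetected.get()) {
                deleteTableDirectory.run();
            }
        });
    }

    // Called later (analogous to the finish step) when a conflict is detected
    void markConflict()
    {
        conflictDetected.set(true);
    }

    // May run on another thread; the atomic fields make the flag visible there.
    // The action runs at most once.
    void rollback()
    {
        Runnable action = rollbackAction.getAndSet(null);
        if (action != null) {
            action.run();
        }
    }
}
```

With this shape, a query that lost the race calls `markConflict()` before failing, so its rollback leaves the winner's directory alone.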

Contributor

We only find out that another process already created the table when we try to write the first transaction log file, right?

Contributor Author

yes @findinpath,


this is where the operation fails due to a conflict.
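For readers following this thread, the conflict detection discussed above can be sketched on a local file system: the first commit file is created with an atomic "create only if absent" write, and the writer that loses the race gets an exception. This is only an illustration using `java.nio.file`; the connector goes through its own file system abstraction and exception types, and the class name, method name and JSON payload below are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Trino code
final class FirstCommitSketch
{
    private FirstCommitSketch() {}

    // Returns true if this writer created the first commit file,
    // false if another writer had already created it
    static boolean tryWriteFirstCommit(Path tableDirectory, String queryId)
            throws IOException
    {
        Path logDirectory = tableDirectory.resolve("_delta_log");
        Files.createDirectories(logDirectory);
        Path firstCommit = logDirectory.resolve("00000000000000000000.json");
        // Placeholder payload, not a real Delta Lake commit entry
        byte[] payload = ("{\"queryId\":\"" + queryId + "\"}").getBytes(StandardCharsets.UTF_8);
        try {
            // CREATE_NEW fails if the file already exists, so only one writer can win
            Files.write(firstCommit, payload, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        }
        catch (FileAlreadyExistsException e) {
            // Another query committed first; the directory now belongs to that query
            return false;
        }
    }
}
```

The point for the rollback logic is that a `false` result means the directory holds another query's table, so it must not be deleted by the loser's cleanup.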

@ebyhr ebyhr changed the title trino/delta-lake: Create Table Concurrent query Failure handling Create Table Concurrent query Failure handling in Delta Lake Nov 26, 2024
@vinay-kl vinay-kl force-pushed the databricks-create-table-concurrent-fix branch from c7628b3 to 32fae9e Compare November 29, 2024 11:32
@cla-bot cla-bot bot added the cla-signed label Nov 29, 2024
@vinay-kl vinay-kl marked this pull request as ready for review December 4, 2024 04:03
@vinay-kl vinay-kl force-pushed the databricks-create-table-concurrent-fix branch 2 times, most recently from 63cc0ed to a1ed23d Compare December 5, 2024 10:07
.build())
.forEach(MoreFutures::getDone);

// Verify table exists and has one row
Member

We should check that the catalog doesn't have any files from failed attempts.

Contributor Author

you mean the failed table creations?

Member

yes

Contributor Author

Added a test case to ensure the table contains files only from the successful query execution.

@vinay-kl vinay-kl force-pushed the databricks-create-table-concurrent-fix branch 3 times, most recently from 9ca102a to 7a9b13d Compare December 18, 2024 06:01
@vinay-kl vinay-kl force-pushed the databricks-create-table-concurrent-fix branch from 7a9b13d to c747a0b Compare December 18, 2024 08:02
@raunaqmorarka raunaqmorarka removed their request for review December 18, 2024 08:06
{
try {
fileSystem.deleteDirectory(path);
Location deltaLogDirectory = Location.of(getTransactionLogDir(tablePath.path()));
Member

This code throws an exception that is only logged:

2024-12-18T06:58:49.952-0600	ERROR	transaction-finishing-0	io.trino.metadata.CatalogMetadata	Connector threw exception on abort
java.lang.IllegalArgumentException: No scheme for file system location: var/folders/ns/drnmv_551mjcn3rd0ccw1tzm0000gn/T/catalog-dir5122020563593486373/tpch/test_concurrent_ctas_vnae2mtwc9/_delta_log

This is probably what causes the CI failures.
It is also visible in the logs for testConcurrentCreateTableAsSelectSameTable.
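One likely reading of the logged message is that the scheme gets dropped because only the path component of the table location is used to build the `_delta_log` location. The sketch below reproduces that effect with plain `java.net.URI` rather than the Trino `Location` class, so the class and method names are hypothetical and the exact behaviour of the connector may differ.

```java
import java.net.URI;

// Hypothetical illustration using java.net.URI, not the Trino Location class
final class LocationSchemeSketch
{
    private LocationSchemeSketch() {}

    // Builds the log directory from the path component only: the scheme is lost
    static URI logDirectoryFromPathOnly(URI tableLocation)
    {
        return URI.create(tableLocation.getPath() + "/_delta_log");
    }

    // Builds the log directory from the full location: the scheme is kept
    static URI logDirectoryKeepingScheme(URI tableLocation)
    {
        String base = tableLocation.toString();
        String separator = base.endsWith("/") ? "" : "/";
        return URI.create(base + separator + "_delta_log");
    }
}
```

Deriving the child location from the full table location, instead of from its path string, avoids the "No scheme" failure in this sketch.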

return null;
}).collect(toImmutableList())).forEach(MoreFutures::getDone);

// Verify table exists and has one row
Member

update comment

long pathMatchingSuccessfulQueryCount = (long) computeScalar("with a as (select distinct(\"$path\") as path from " + tableName + "), " +
"b as (select element_at(operation_parameters, 'queryId') as queryId from \"" + tableName + "$history\") " +
"select count(1) from a,b where a.path like '%' || b.queryId || '%'");
assertThat(pathCount).isEqualTo(pathMatchingSuccessfulQueryCount);
Member

This assertion does not check whether files exist: $history and $path return only files from queries that succeeded.
You can use getTableFiles, like here:

assertEventually(new Duration(5, SECONDS), () -> assertThat(getTableFiles(tableName)).isEmpty());
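As a rough sketch of a storage-level check for the concurrent test, the files on disk could be listed directly and compared against the query that succeeded. The helper below uses `java.nio.file` on a local directory and assumes that data file names contain the ID of the query that wrote them (the SQL above relies on the same assumption). `LeftoverFilesSketch` and `filesNotFromQuery` are hypothetical names, not existing test utilities.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not an existing Trino test helper
final class LeftoverFilesSketch
{
    private LeftoverFilesSketch() {}

    // Returns regular files under the table directory, outside _delta_log,
    // whose file name does not contain the ID of the successful query
    static List<Path> filesNotFromQuery(Path tableDirectory, String successfulQueryId)
            throws IOException
    {
        Path logDirectory = tableDirectory.resolve("_delta_log");
        try (Stream<Path> paths = Files.walk(tableDirectory)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(path -> !path.startsWith(logDirectory))
                    .filter(path -> !path.getFileName().toString().contains(successfulQueryId))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

An empty result means that no data files from failed attempts were left behind, which is what the review comment asks the test to verify.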

Labels
cla-signed delta-lake Delta Lake connector
4 participants