Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure "nessie.commit.id table" property is set when updating the table #19524

Merged
merged 1 commit into from
Dec 15, 2023

Conversation

ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented Oct 25, 2023

Description

Spark sets the table property NESSIE_COMMIT_ID_PROPERTY in NessieTableOperations#loadTableMetadata. Then NessieIcebergClient.commitTable uses this property.

In Trino, this property is never set but used in NessieIcebergClient.commitTable as it is a common code. Hence, the commit id is old and doesn't allow new commits.

Use the common code (available From Iceberg 1.4.0) NessieUtil.updateTableMetadataWithNessieSpecificProperties in Trino, which handles setting the property like "nessie.commit.id".

Additional context and related issues

Fixes #17813

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 25, 2023
@github-actions github-actions bot added tests:hive iceberg Iceberg connector labels Oct 25, 2023
@ajantha-bhat ajantha-bhat marked this pull request as draft October 25, 2023 11:49
@ajantha-bhat ajantha-bhat marked this pull request as ready for review October 30, 2023 04:48
onSpark().executeQuery("CREATE TABLE " + sparkTableName + " (a INT, b INT, c INT) USING ICEBERG");
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES (1, 2, 3)");

onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES (4, 5, 6)");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to fail for Nessie and there was no test case to cover this case.

@ajantha-bhat
Copy link
Member Author

cc: @findepi, @ebyhr, @findinpath

@ajantha-bhat
Copy link
Member Author

also cc: @dimas-b, @nastra

@ebyhr ebyhr removed their request for review October 31, 2023 13:03
Copy link

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, @ajantha-bhat !

@mosabua mosabua removed their request for review November 6, 2023 20:09
@ajantha-bhat
Copy link
Member Author

@findinpath and @findepi: Can I please get a review on this? This is a small PR.

String sparkTableName = sparkTableName(baseTableName);
String trinoTableName = trinoTableName(baseTableName);

onSpark().executeQuery("CREATE TABLE " + sparkTableName + " (a INT, b INT, c INT) USING ICEBERG");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can stick to one column - there's no need to use multiple columns to test this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

String trinoTableName = trinoTableName(baseTableName);

onSpark().executeQuery("CREATE TABLE " + sparkTableName + " (a INT, b INT, c INT) USING ICEBERG");
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES (1, 2, 3)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to ensure that the nessie.commit.id gets updated ?

    public static String getNessieCommitId(String tableName)
    {
        String propertiesTableName = "\"" + baseTableName + "$properties\"";
        String trinoPropertiesTableName = trinoTableName(propertiesTableName);

        return (String) onTrino()
                .executeQuery("SELECT value FROM " + trinoPropertiesTableName + " WHERE key = 'nessie.commit.id'")
                .getOnlyValue();
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is common for other catalogs (groups) too. So, I didn't want to have a catalog (group) specific checks.

@ajantha-bhat ajantha-bhat changed the title Fix Trino cannot write to Nessie managed Iceberg table after spark Fix Trino cannot write to Iceberg table after spark with Nessie catalog Nov 16, 2023
Comment on lines 997 to 999
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES (1)");

onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES (2)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES (1)");
onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES (2)");
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES 1");
onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES 2");

Spark sets the table property NESSIE_COMMIT_ID_PROPERTY in
NessieTableOperations#loadTableMetadata.
Then NessieIcebergClient.commitTable uses this property.

In Trino, this property is never set but used in
NessieIcebergClient.commitTable as it is a common code.
Hence, the commit id is old and doesn't allow new commits.

Use the common code (available From Iceberg 1.4.0)
NessieUtil.updateTableMetadataWithNessieSpecificProperties in Trino,
which handles setting the property like "nessie.commit.id".
@ajantha-bhat ajantha-bhat changed the title Fix Trino cannot write to Iceberg table after spark with Nessie catalog Ensure "nessie.commit.id table" property is set when updating the table Nov 21, 2023
@ajantha-bhat
Copy link
Member Author

The failed test io.trino.tests.product.hive.TestHiveTransactionalTable.testReadFullAcidPartitioned is unrelated to the change.

@electrum electrum merged commit face6d8 into trinodb:master Dec 15, 2023
47 of 49 checks passed
@github-actions github-actions bot added this to the 436 milestone Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

iceberg connector cannot perform write operations when use Nessie catalog
5 participants