Fix Delta Lake insertion issue by setting UTC timezone for timestamp with timezone partition column #16878

Closed
albericgenius wants to merge 3 commits from the deltalake branch

Conversation

albericgenius
Contributor

…with time zone type

Description

Fix #16822

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

cla-bot added the cla-signed label Apr 4, 2023
github-actions added the delta-lake (Delta Lake connector), hive (Hive connector), and tests:hive labels Apr 4, 2023
Member

@ebyhr ebyhr left a comment

Please add both normal test and product test.

@albericgenius
Contributor Author

Please add both normal test and product test.

Sure, still working on this.

albericgenius force-pushed the deltalake branch 2 times, most recently from 4cc4d47 to 3456fc5 April 6, 2023 11:12
albericgenius changed the title from "Fix fail to insert for delta lake, when partition columnis timestamp …" to "Fix Delta Lake insertion issue by setting UTC timezone for timestamp with timezone partition column" Apr 8, 2023
@albericgenius
Contributor Author

@ebyhr and @findinpath, any comments are welcome.

@albericgenius
Contributor Author

@ebyhr and @findinpath I guess this is close to getting merged :)

Member

@ebyhr ebyhr left a comment

Please squash commits into one.

Comment on lines +186 to +187
if (position.getErrorIndex() == -1 && timestamp.length() == position.getIndex()) {
    if (accessor.isSupported(NANO_OF_SECOND)) {
Member

These conditions are hard for me to understand. Could you add a code comment with an example?
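
One way to read the two quoted conditions, sketched with plain `java.time`: a successful parse leaves the error index at -1, a parse that consumed the whole string leaves the index equal to the string length, and `isSupported(NANO_OF_SECOND)` tells whether a fractional second was actually present. The formatter, the class name `ParsePositionSketch`, and the sample inputs are assumptions made for illustration; this is not the PR's code.

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.TemporalAccessor;

import static java.time.temporal.ChronoField.NANO_OF_SECOND;

class ParsePositionSketch
{
    // Hypothetical formatter: second precision plus an optional fraction of 1 to 9 digits
    private static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .optionalStart()
            .appendFraction(NANO_OF_SECOND, 1, 9, true)
            .optionalEnd()
            .toFormatter();

    static String classify(String timestamp)
    {
        ParsePosition position = new ParsePosition(0);
        // parseUnresolved reports bad input through the error index instead of throwing
        TemporalAccessor accessor = FORMATTER.parseUnresolved(timestamp, position);
        if (position.getErrorIndex() == -1 && timestamp.length() == position.getIndex()) {
            // No error and no trailing characters: the whole string matched the format
            return accessor.isSupported(NANO_OF_SECOND) ? "with fraction" : "without fraction";
        }
        return "rejected";
    }

    public static void main(String[] args)
    {
        System.out.println(classify("2023-04-05 10:00:00.666"));
        System.out.println(classify("2023-04-05 10:00:00"));
        System.out.println(classify("2023-04-05 10:00:00 UTC"));
        System.out.println(classify("not a timestamp"));
    }
}
```

The third input parses cleanly up to the seconds but leaves characters unconsumed, which is the case the length comparison guards against.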


public final class DeltaLakeWriteUtils
{
    private DeltaLakeWriteUtils() {}
Member

What's the difference from HiveWriteUtils? Please leave a code comment.

String tableName = "test_dl_partitioned_insert_timestampTZ" + randomNameSuffix();
onTrino().executeQuery("" +
        "CREATE TABLE delta.default." + tableName +
        " (c1 INT, c2 TIMESTAMP WITH TIME ZONE)" +
Member

Please add a test for a table partitioned by a nested type that contains timestamp with time zone.

Comment on lines +146 to +147
onTrino().executeQuery("INSERT INTO delta.default." + tableName + " VALUES (1, TIMESTAMP '2023-04-05 10:00:00.666+01:00')");
onTrino().executeQuery("INSERT INTO delta.default." + tableName + " VALUES (2, TIMESTAMP '2023-04-06 10:00:00+01:00')");
Member

nit: No need to split into two INSERT statements.

        row(
                2,
                Timestamp.valueOf("2023-04-06 09:00:00")));
assertThat(onTrino().executeQuery("SELECT c1, CAST(c2 AS TIMESTAMP) FROM delta.default." + tableName + " ORDER BY c1 ASC"))
Member

Remove redundant ORDER BY
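
For readers checking the expected rows in the quoted test: the inserted literals carry a `+01:00` offset, while the assertion expects `09:00:00`. A minimal sketch of that arithmetic in plain `java.time`, assuming the value is stored and read back in UTC as the PR title suggests. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

class UtcNormalizationSketch
{
    // Re-express an offset timestamp as the same instant in UTC, then drop the offset
    static LocalDateTime toUtcLocal(String isoOffsetTimestamp)
    {
        return OffsetDateTime.parse(isoOffsetTimestamp)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    public static void main(String[] args)
    {
        // Same wall-clock values as the two INSERT statements, written in ISO-8601 form
        System.out.println(toUtcLocal("2023-04-05T10:00:00.666+01:00"));
        System.out.println(toUtcLocal("2023-04-06T10:00:00+01:00"));
    }
}
```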

@@ -130,6 +131,37 @@ public void testPartitionedInsertCompatibility()
        }
    }

    @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
    public void testPartitionedInsertTimestampTZCompatibility()
Member

Suggested change
- public void testPartitionedInsertTimestampTZCompatibility()
+ public void testPartitionedInsertTimestampWithTimeZoneCompatibility()

        return partitionValues.build();
    }

    public static Object getField(DateTimeZone localZone, Type type, Block block, int position)
Member

Are all types in this method required in the Delta Lake connector?

Comment on lines +189 to +193
else {
    int picosOfMilli = block.getInt(position, SIZE_OF_LONG);
    Instant instant = Instant.ofEpochMilli(millisUtc).plusNanos(picosOfMilli * 1000);
    return new TimestampTZ(instant.getEpochSecond(), instant.getNano(), UTC_KEY.getZoneId());
}
Member

Is this condition used in the Delta Lake connector?
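
For reference, a minimal sketch of combining epoch milliseconds with a sub-millisecond picosecond part into a `java.time.Instant`. It assumes `picosOfMilli` holds picoseconds within the millisecond (0 to 999,999,999), in which case the conversion to nanoseconds is a division by 1,000 rather than the `* 1000` in the quoted lines; whether the quoted code is wrong depends on what the block actually stores. The class and method names are hypothetical.

```java
import java.time.Instant;

class PicosToInstantSketch
{
    private static final int PICOSECONDS_PER_NANOSECOND = 1_000;

    // epochMillis: milliseconds since the epoch, in UTC
    // picosOfMilli: assumed to be picoseconds within that millisecond (0..999_999_999)
    static Instant toInstant(long epochMillis, int picosOfMilli)
    {
        // Integer division truncates any precision finer than one nanosecond
        return Instant.ofEpochMilli(epochMillis)
                .plusNanos(picosOfMilli / PICOSECONDS_PER_NANOSECOND);
    }

    public static void main(String[] args)
    {
        long millis = Instant.parse("2023-04-05T09:00:00.666Z").toEpochMilli();
        System.out.println(toInstant(millis, 123_456_000));
    }
}
```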

@ebyhr
Member

ebyhr commented Jul 13, 2023

@albericgenius Are you still working on this?

@ebyhr
Member

ebyhr commented Jul 20, 2023

Superseded by #18353

@ebyhr ebyhr closed this Jul 20, 2023
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector
Development

Successfully merging this pull request may close these issues.

Delta Lake connector can partition on timestamp with time zone type, but can't insert rows
3 participants