Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support writing timestamp with time zone type on partitioned column in Delta #18353

Merged
merged 1 commit into from
Jul 21, 2023

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Jul 20, 2023

Description

Fix #16822

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Add support for writing `timestamp with time zone` type on partitioned columns. ({issue}`16822`)

@@ -334,6 +338,10 @@ public static Object getField(DateTimeZone localZone, Type type, Block block, in
if (type instanceof TimestampType timestampType) {
return getHiveTimestamp(localZone, timestampType, block, position);
}
if (type instanceof TimestampWithTimeZoneType) {
checkArgument(type.equals(TIMESTAMP_TZ_MILLIS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good functionally from my perspective.

Code wise, I'm concerned about mixing Delta Lake specific code within HiveWriteUtils.

Do you see anything against having DeltaLakeWriteUtils with a slight handling for timestamp tz and call HiveWriterUtils.getField ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We call here from io.trino.plugin.deltalake.AbstractDeltaLakePageSink#createPartitionValues.
For example i am not convinced this is the appropriate check for Delta

if (!CharMatcher.inRange((char) 0x20, (char) 0x7E).matchesAllOf(valueString)) {
throw new TrinoException(HIVE_INVALID_PARTITION_VALUE,
"Hive partition keys can only contain printable ASCII characters (0x20 - 0x7E). Invalid value: " +
base16().withSeparator(" ", 2).encode(valueString.getBytes(UTF_8)));
}

in delta context the value gets used to

  • make partition-like path
  • write to addEntry json (right?)

so the logic could indeed be Delta-specific.

AFAIAC, this can go follow-up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will send a follow-up PR.

@ebyhr ebyhr requested a review from findepi July 20, 2023 14:13
@@ -334,6 +338,10 @@ public static Object getField(DateTimeZone localZone, Type type, Block block, in
if (type instanceof TimestampType timestampType) {
return getHiveTimestamp(localZone, timestampType, block, position);
}
if (type instanceof TimestampWithTimeZoneType) {
checkArgument(type.equals(TIMESTAMP_TZ_MILLIS));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We call here from io.trino.plugin.deltalake.AbstractDeltaLakePageSink#createPartitionValues.
For example i am not convinced this is the appropriate check for Delta

if (!CharMatcher.inRange((char) 0x20, (char) 0x7E).matchesAllOf(valueString)) {
throw new TrinoException(HIVE_INVALID_PARTITION_VALUE,
"Hive partition keys can only contain printable ASCII characters (0x20 - 0x7E). Invalid value: " +
base16().withSeparator(" ", 2).encode(valueString.getBytes(UTF_8)));
}

in delta context the value gets used to

  • make partition-like path
  • write to addEntry json (right?)

so the logic could indeed be Delta-specific.

AFAIAC, this can go follow-up

"(null, null, null, null, 4.0, null, null)");

assertThat((String) computeScalar("SELECT \"$path\" FROM " + tableName + " WHERE id = 1"))
.contains("/part=__HIVE_DEFAULT_PARTITION__/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "HIVE_DEFAULT_PARTITION" also the Delta's convention for paths for null part key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's verified in a product test:

assertThat(onTrino().executeQuery("SELECT \"$path\" FROM delta.default." + tableName + " WHERE a_number IS NULL").column(1))
.hasSize(2)
.allMatch(path -> ((String) path).contains("/a_number=__HIVE_DEFAULT_PARTITION__/"));

@ebyhr ebyhr merged commit d7fcc65 into trinodb:master Jul 21, 2023
@ebyhr ebyhr deleted the ebi/delta-timestamptz-partition branch July 21, 2023 10:37
@github-actions github-actions bot added this to the 423 milestone Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector
Development

Successfully merging this pull request may close these issues.

Delta Lake connector can partition on timestamp with time zone type, but can't insert rows
3 participants