Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect timestamp coercion when coercing complex column #17900

Conversation

Praveen2112
Copy link
Member

@Praveen2112 Praveen2112 commented Jun 14, 2023

Description

Previously MILLISECONDS precision was leaked when corecion applied on complex column even if coercion was not applied on timestamp column.

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Respect timestamp precision when coercing complex column

Use == for enum comparison
Get type directly from HiveColumnHandle instead of generating them
from HiveType.
@cla-bot cla-bot bot added the cla-signed label Jun 14, 2023
@github-actions github-actions bot added hive Hive connector tests:hive labels Jun 14, 2023
@Praveen2112 Praveen2112 force-pushed the praveen/timestamp_on_coerced_complex_column branch 3 times, most recently from 93da480 to 0631344 Compare June 21, 2023 14:13
@Praveen2112 Praveen2112 force-pushed the praveen/timestamp_on_coerced_complex_column branch from 0631344 to 7cc5d79 Compare June 22, 2023 02:32
@Praveen2112 Praveen2112 changed the title Fix timestamp coercion when coercing complex column Respect timestamp coercion when coercing complex column Jun 22, 2023
@Praveen2112 Praveen2112 force-pushed the praveen/timestamp_on_coerced_complex_column branch from 7cc5d79 to fd5118e Compare June 22, 2023 05:52
@Praveen2112 Praveen2112 requested a review from phd3 June 22, 2023 05:56
"""
INSERT INTO %s
SELECT
(CAST(ROW (timestamp_value, -1) AS ROW(keep TIMESTAMP(9), si2i SMALLINT))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point of si2i field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a non TIMESTAMP column which we are coercing so as to reproduce this issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But does the comment help here - in the next set of PRs we would be adding additional column like timestamp which will be coerced.

@@ -108,6 +129,21 @@ private static HiveTableDefinition.HiveTableDefinitionBuilder tableDefinitionBui
"STORED AS " + fileFormat);
}

private static HiveTableDefinition.HiveTableDefinitionBuilder tableDefinitionForTimestampCoercionBuilder(String fileFormat, Optional<String> recommendTableName, Optional<String> rowFormat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe recommendTableName -> tableNamePrefix?

" timestamp_map_to_map MAP<SMALLINT, STRUCT<keep: TIMESTAMP, si2i: SMALLINT>>" +
") " +
"PARTITIONED BY (id BIGINT) " +
rowFormat.map(s -> format("ROW FORMAT %s ", s)).orElse("") +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "ROW FORMAT " + s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have a unified representation across this class.

Comment on lines +398 to +400
onHive().executeQuery(format("ALTER TABLE %s CHANGE COLUMN timestamp_row_to_row timestamp_row_to_row struct<keep:timestamp, si2i:int>", tableName));
onHive().executeQuery(format("ALTER TABLE %s CHANGE COLUMN timestamp_list_to_list timestamp_list_to_list array<struct<keep:timestamp, si2i:int>>", tableName));
onHive().executeQuery(format("ALTER TABLE %s CHANGE COLUMN timestamp_map_to_map timestamp_map_to_map map<int,struct<keep:timestamp, si2i:int>>", tableName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: uppercase data types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking on handling it as a follow-up once the changes related to coercion are implemented. WDYT ?

Previously MILLISECONDS precision was leaked when corecion applied on complex column even if coercion was not applied on timestamp column.
@Praveen2112 Praveen2112 force-pushed the praveen/timestamp_on_coerced_complex_column branch from fd5118e to a538773 Compare June 22, 2023 14:42
@Praveen2112
Copy link
Member Author

@krvikash AC

@Praveen2112 Praveen2112 merged commit 97153aa into trinodb:master Jun 28, 2023
@github-actions github-actions bot added this to the 421 milestone Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

4 participants