Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for variable precision timestamps in Hive connector (read path) #4953

Closed
wants to merge 5 commits into from

Conversation

aalbu
Copy link
Member

@aalbu aalbu commented Aug 24, 2020

Hive represents timestamps with nanosecond precision, but Presto currently maps them to timestamp(3) columns. This PR introduces the ability to specify the precision of timestamp columns in Presto. It can be done through a catalog or session property. This is not yet supported for write operations - they must use the default precision (3).

relates to: #3977

@cla-bot cla-bot bot added the cla-signed label Aug 24, 2020
@aalbu aalbu force-pushed the hive-timestamp-nanos branch 3 times, most recently from bda3d8d to 3e1c765 Compare August 24, 2020 14:25
@findepi findepi requested a review from martint August 24, 2020 21:34

private long shortTimestamp(Timestamp value, Type type)
{
//noinspection unchecked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the annotation version

@aalbu aalbu force-pushed the hive-timestamp-nanos branch 7 times, most recently from 46ab785 to a910574 Compare September 9, 2020 22:43
@aalbu aalbu requested a review from findepi September 10, 2020 01:37
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still at skimming)

Comment on lines +51 to +52
int nanosOfSecond = (int) round(decodedTimestamp.getNanosOfSecond(), 9 - type.getPrecision());
return micros + nanosOfSecond / NANOSECONDS_PER_MICROSECOND;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still struggling to accept rounding as the correct behavior... cc @martint

@aalbu do you have test cases covering round up/down behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some in the product test.

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great!

@findepi
Copy link
Member

findepi commented Sep 15, 2020

@aalbu you have merge conflict so you need to rebase in order for CI to run next time

please rebase and apply comments separately (eg fixups) for easier re-rev

@aalbu aalbu force-pushed the hive-timestamp-nanos branch 2 times, most recently from cdd8016 to 8357ae5 Compare September 17, 2020 04:40
@aalbu aalbu force-pushed the hive-timestamp-nanos branch from 8357ae5 to 1f1df50 Compare September 17, 2020 20:50
@aalbu aalbu force-pushed the hive-timestamp-nanos branch from 1f1df50 to 6d3e022 Compare September 18, 2020 01:33
@aalbu aalbu force-pushed the hive-timestamp-nanos branch 2 times, most recently from c3f5917 to 6a212d3 Compare September 18, 2020 16:32
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%

@@ -254,20 +254,23 @@ else if (type instanceof DateType && columnStatistics.getDateStatistics() != nul
return createDomain(type, hasNullValue, columnStatistics.getDateStatistics(), value -> (long) value);
}
else if ((type.equals(TIMESTAMP_MILLIS) || type.equals(TIMESTAMP_MICROS)) && columnStatistics.getTimestampStatistics() != null) {
// ORC timestamp statistics are truncated to millisecond precision, regardless of the precision of the actual data column.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a regression test covering this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestHiveStorageFormats has a test that verifies the behavior is correct when doing predicate push-down.

@aalbu aalbu force-pushed the hive-timestamp-nanos branch 2 times, most recently from b745c68 to 6a24566 Compare September 23, 2020 01:02
@aalbu aalbu force-pushed the hive-timestamp-nanos branch from 6a24566 to 413c1b8 Compare September 23, 2020 02:28
@findepi
Copy link
Member

findepi commented Sep 23, 2020

Squashed and merged as 86fcd1f, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants