
Using the Kudu connector to query values of UNIXTIME_MICROS type, the time zone doesn't take effect #6744

Open
lupengfei-johnson opened this issue Jan 28, 2021 · 7 comments
Labels: bug, syntax-needs-review

Comments

@lupengfei-johnson

In presto-server-0.228, with time zone = Asia/Shanghai, the result is:
2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:38.000
(10 rows)

In trino-server-351, with time zone = Asia/Shanghai, the result is:
2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:38.000
(10 rows)

The data in Kudu is UTC time. In trino-server-351, the result is still UTC time even though I set time zone = Asia/Shanghai.
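The 8-hour gap between the two result sets matches the offset of Asia/Shanghai (UTC+8). A minimal Python sketch (not the connector code) of the two interpretations of the same 64-bit microsecond value:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A Kudu UNIXTIME_MICROS value: microseconds since the Unix epoch (UTC).
micros = 1611804761000000  # 2021-01-28 03:32:41 UTC

instant = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)

# trino-server-351 behaviour: the UTC wall-clock reading is returned as-is.
print(instant.strftime("%Y-%m-%d %H:%M:%S"))   # 2021-01-28 03:32:41

# presto-server-0.228 behaviour: the instant is rendered in the session zone.
shanghai = instant.astimezone(ZoneInfo("Asia/Shanghai"))
print(shanghai.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-01-28 11:32:41
```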

@Praveen2112
Member

Hi! In the Kudu connector, UNIXTIME_MICROS is mapped to the TIMESTAMP type in Trino.

TIMESTAMP does not represent a specific point in time, but rather a reading of a wall clock + calendar. Selecting values from a TIMESTAMP column should return the same result set no matter what the client's session time zone is.

So I would expect the result to remain the same irrespective of the time zone set in the client.

Ref : #37

@findepi
Member

findepi commented Jan 28, 2021

@lupengfei-johnson if Kudu's UNIXTIME_MICROS is supposed to represent a point in time (as the name suggests, but better to double-check than to assume), we should map it to timestamp with time zone.

@martint
Member

martint commented Jan 28, 2021

Unfortunately, this is all Kudu documentation has to say about it:

unixtime_micros (64-bit microseconds since the Unix epoch)

It doesn't explain the semantics of the type, just its encoding. In Impala, they've mapped it to TIMESTAMP, but I've seen some discussions where people expressed similar concerns, so it may make more sense to map it to TIMESTAMP WITH TIME ZONE.

@lupengfei-johnson
Author

@findepi I expect Kudu's UNIXTIME_MICROS values to be mapped to timestamps in the Asia/Shanghai time zone, because our SQL scripts are written against Asia/Shanghai time, e.g. where event_time = timestamp '2021-01-20 00:00:00'. These scripts work well in presto-server-0.228 but no longer work in trino-server-351. I plan to resolve this by adding a kudu.timezone configuration property in etc/catalog/kudu.properties and converting Kudu's UNIXTIME_MICROS to the time at kudu.timezone when KuduRecordCursor reads data from Kudu.
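The proposed workaround could be sketched roughly as follows (kudu.timezone is the hypothetical config property described above, not an existing Trino option, and read_unixtime_micros is an illustrative name, not a real connector method):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical catalog property, e.g. kudu.timezone=Asia/Shanghai
KUDU_TIMEZONE = ZoneInfo("Asia/Shanghai")

def read_unixtime_micros(micros: int) -> datetime:
    """Sketch of what a patched KuduRecordCursor would do: interpret the
    stored microseconds as UTC, then shift the wall clock into the
    configured zone before handing it to the engine as a plain TIMESTAMP."""
    instant = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    return instant.astimezone(KUDU_TIMEZONE).replace(tzinfo=None)

print(read_unixtime_micros(1611804761000000))  # 2021-01-28 11:32:41
```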

@findepi
Member

findepi commented Apr 15, 2021

If the value represents a point in time, the connector should return it as such.
From what I see above, it seems Kudu's UNIXTIME_MICROS is a point in time, but I am certainly not a Kudu expert.

cc @lzeiming

@lzeiming
Contributor

lzeiming commented Apr 16, 2021

We also found that the timestamp values from our Kudu cluster were not compatible in query results when we upgraded from 332 to 347.
We have to keep our system backward compatible, so we modified the Kudu connector: we invoke convertUTCToLocal with our system's time zone when reading from Kudu, the same way the Hive connector reads ORC files, and convertLocalToUTC when writing into Kudu.
I think we could add a config property like orcLegacyTimeZone in HiveConfig.
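The read/write round trip described above can be sketched like this (the function names mirror the convertUTCToLocal/convertLocalToUTC methods mentioned in the comment; the actual Trino code differs):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SYSTEM_ZONE = ZoneInfo("Asia/Shanghai")  # analogous to orcLegacyTimeZone
UTC = ZoneInfo("UTC")

def convert_utc_to_local(ts: datetime) -> datetime:
    """On read: treat the stored wall clock as UTC, re-read it in the system zone."""
    return ts.replace(tzinfo=UTC).astimezone(SYSTEM_ZONE).replace(tzinfo=None)

def convert_local_to_utc(ts: datetime) -> datetime:
    """On write: treat the wall clock as system-zone local time, store it as UTC."""
    return ts.replace(tzinfo=SYSTEM_ZONE).astimezone(UTC).replace(tzinfo=None)

stored = datetime(2021, 1, 28, 3, 32, 41)     # wall clock as stored in Kudu (UTC)
local = convert_utc_to_local(stored)          # 2021-01-28 11:32:41
assert convert_local_to_utc(local) == stored  # the round trip is lossless
```

Applying the conversion symmetrically on both read and write keeps stored data unchanged, which is why the upgrade stays compatible for systems that always go through the modified connector.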

@martint
Member

martint commented May 12, 2021

Based on the discussion here and elsewhere, it seems to me that we should map this type to timestamp with time zone instead of timestamp.
