timestamp column with Parquet format do not compatible with Hive #11867

Closed
maxwellzdm opened this issue Nov 7, 2018 · 6 comments

maxwellzdm commented Nov 7, 2018

Steps to reproduce:

  1. Set up Presto with the Hive connector.

  2. Create a partitioned table in Parquet format:
    create table tmp.presto_test_parquet_v2 (event_time timestamp, event_time_str varchar, dt varchar) with (partitioned_by = array['dt'], format = 'parquet');

  3. Insert a row with the current timestamp:
    insert into tmp.presto_test_parquet_v2 values (localtimestamp, cast(localtimestamp as varchar), '0');

  4. Select from Presto:

presto:xxx> select * from tmp.presto_test_parquet_v2;
       event_time        |     event_time_str      | dt
-------------------------+-------------------------+----
 2018-11-07 14:34:37.453 | 2018-11-07 14:34:37.453 | 0
(1 row)

But when I select the same row from Hive, the timestamp column comes back shifted by 8 hours (wrong time zone), while the string column is unchanged:

hive> select * from tmp.presto_test_parquet_v2;
OK
presto_test_parquet_v2.event_time presto_test_parquet_v2.event_time_str presto_test_parquet_v2.dt
2018-11-07 22:34:37.453 2018-11-07 14:34:37.453 0
Time taken: 0.107 seconds, Fetched: 1 row(s)

Presto version: 0.212
Hive version: 1.1.0-cdh5.13.3
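
The 8-hour gap between the two outputs can be reproduced with plain date arithmetic. The following is only a minimal sketch of one possible reading, not a description of what Presto or Hive actually do internally: it assumes the stored value is treated as a UTC instant and that the reading side renders it in a UTC+8 zone. The UTC+8 offset and the helper name render_in_zone are hypothetical; the issue does not state which time zones are configured.

```python
from datetime import datetime, timedelta, timezone

# Wall-clock value shown by Presto (copied from the query output in this report).
written = datetime(2018, 11, 7, 14, 34, 37, 453000)

# Assumption: the reader interprets the stored value as a UTC instant.
as_utc_instant = written.replace(tzinfo=timezone.utc)

# Assumption: the reading side runs in a UTC+8 zone (hypothetical, not stated in the issue).
reader_zone = timezone(timedelta(hours=8))


def render_in_zone(instant, zone):
    """Render an aware datetime as a wall-clock string in `zone`, with millisecond precision."""
    local = instant.astimezone(zone)
    return local.strftime("%Y-%m-%d %H:%M:%S.") + f"{local.microsecond // 1000:03d}"


hive_view = render_in_zone(as_utc_instant, reader_zone)
print(hive_view)  # 2018-11-07 22:34:37.453
```

Under those assumptions the result matches the value Hive printed, which is consistent with a time zone conversion being applied on one side only. The varchar column is unaffected because no conversion applies to strings.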

findepi (Contributor) commented Nov 7, 2018

This is a known issue. See #7122 and the roadmap linked there.
You may want to try Presto 0.208e and run set session legacy_timestamp = false; there. 0.208e contains experimental fixes for timestamp semantics in the Hive connector.
Please let me know how it works for you.

maxwellzdm (Author) commented

Sorry, I couldn't find any official release of 0.208e. Do you mean I need to try 0.208e of the Starburst Distribution of Presto?

findepi (Contributor) commented Nov 8, 2018

Apologies for being too brief. Yes, Starburst Presto 0.208e is what I meant.
The Hive connector changes I mentioned are in https://github.com/starburstdata/presto/blob/epic/timestampp/6-candidate/presto-hive/src/main/java/com/facebook/presto/hive/TimestampRewriter.java

@maxwellzdm please note that I consider this patch still experimental (which is why you need to enable it explicitly with set session legacy_timestamp = false;). It may or may not work for you. I would love to hear your feedback.

maxwellzdm (Author) commented

@findepi I've tried

presto:xxxx> set session legacy_timestamp=false;
SET SESSION

But it still does not seem to work.
I also found that the output of show session does not contain a legacy_timestamp flag.

maxwellzdm (Author) commented

By the way, this issue only occurs when I insert into a Hive table in Parquet format.
For Hive tables in text format, everything works fine.


stale bot commented Nov 7, 2020

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

stale bot added the stale label Nov 7, 2020
stale bot closed this as completed Nov 15, 2020