Use and record writer time zone in ORC files #212
Can you share the rationale for not using UTC when writing? For timestamp values, using a zone other than UTC may mean that some values cannot be represented, right?
Could you give an example? I don't think there are any gaps in time, in terms of missing milliseconds from the relative epoch.
We should switch in the future, but if we did it now, older versions of Presto would return the wrong answer for these files (if …
The time zone is for the epoch, not the timestamp itself. The only potential (already existing) problem is if the epoch (…
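To illustrate why the zone only moves the epoch and cannot create gaps, here is a simplified model. It assumes the ORC specification's timestamp encoding, where seconds are stored relative to 2015-01-01 00:00:00 and that base is interpreted in the writer time zone. Nanoseconds are ignored, and the class and method names are hypothetical, not Presto's code. Encoding and decoding with the same zone subtract and then add the same constant, so every instant round-trips exactly. Decoding with a different zone shifts every value by the offset difference at the base instant, which is the wrong-answer risk for a reader that ignores the recorded zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public final class OrcEpochDemo {
    // Assumed from the ORC spec: timestamp seconds are relative to this local
    // date-time, interpreted in the writer's time zone.
    private static final LocalDateTime ORC_BASE = LocalDateTime.of(2015, 1, 1, 0, 0);

    private OrcEpochDemo() {}

    // Unix seconds of the ORC base as seen in the given zone.
    public static long baseEpochSeconds(ZoneId zone) {
        return ORC_BASE.atZone(zone).toEpochSecond();
    }

    // Encode an instant (Unix seconds) relative to the writer-zone base.
    public static long encode(long unixSeconds, ZoneId writerZone) {
        return unixSeconds - baseEpochSeconds(writerZone);
    }

    // Decode by adding back the base in the zone the reader assumes was used.
    public static long decode(long storedSeconds, ZoneId readerZone) {
        return storedSeconds + baseEpochSeconds(readerZone);
    }

    public static void main(String[] args) {
        ZoneId writer = ZoneId.of("America/Los_Angeles");
        long instant = 1_546_300_800L; // 2019-01-01T00:00:00Z
        long stored = encode(instant, writer);
        System.out.println(decode(stored, writer) - instant);           // 0
        System.out.println(decode(stored, ZoneId.of("UTC")) - instant); // -28800
    }
}
```

The shift is a single constant per file (here, the 8-hour PST offset at the 2015 base), which is why recording the writer zone is enough for a reader to undo it.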
One comment, but otherwise looks good
presto-orc/src/main/java/io/prestosql/orc/metadata/OrcMetadataWriter.java
Early versions of the Apache ORC writer made the mistake of recording timestamps relative to an epoch expressed in the writer's time zone. Later versions fixed this by recording the writer time zone in the stripe footer. Hive 3.1 always writes using UTC.

Presto used a global configuration setting for the writer time zone. This was needed to handle old files, but it was never updated to use the time zone from the stripe footer.

On read, Presto now uses the stripe footer value if present; otherwise, it uses the configured value. On write, Presto continues to write timestamps using the configured time zone, but now records that zone in the files it writes.
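The read and write rules above can be sketched as follows. This is a minimal illustration, not Presto's implementation: `WriterTimeZoneRule`, `readZone`, and `zoneToRecord` are hypothetical names, and treating an empty string as "not recorded" is an assumption about how an unset footer field might surface.

```java
import java.time.ZoneId;
import java.util.Optional;

public final class WriterTimeZoneRule {
    private WriterTimeZoneRule() {}

    // Read side: prefer the zone recorded in the stripe footer; fall back to the
    // configured zone for older files that did not record one. An empty string
    // is treated as absent (hypothetical assumption).
    public static ZoneId readZone(Optional<String> stripeWriterTimeZone, ZoneId configuredZone) {
        return stripeWriterTimeZone
                .filter(id -> !id.isEmpty())
                .map(ZoneId::of)
                .orElse(configuredZone);
    }

    // Write side: timestamps are still encoded with the configured zone; its ID
    // is returned so it can be recorded in the stripe footer.
    public static String zoneToRecord(ZoneId configuredZone) {
        return configuredZone.getId();
    }

    public static void main(String[] args) {
        ZoneId configured = ZoneId.of("America/Los_Angeles");
        System.out.println(readZone(Optional.of("UTC"), configured)); // UTC
        System.out.println(readZone(Optional.empty(), configured));   // America/Los_Angeles
    }
}
```

Because the fallback only applies when the footer has no zone, files written by Hive 3.1 (UTC) and by Presto after this change are read correctly regardless of the reader's configuration.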