Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIMESTAMP behaviour does not match sql standard #7122

Closed
losipiuk opened this issue Jan 18, 2017 · 10 comments
Closed

TIMESTAMP behaviour does not match sql standard #7122

losipiuk opened this issue Jan 18, 2017 · 10 comments
Assignees

Comments

@losipiuk
Copy link
Contributor

losipiuk commented Jan 18, 2017

It seems like meaning of TIMESTAMP and TIMESTAMP WITH TIMEZONE datatypes in Presto is totally not what is specified by SQL standard (and what other databases do).

This is my understanding of SQL 2003 standard (4.6.2 Datetimes):

TIMESTAMP WITH TIMEZONE represents absolute point in time. Typically databases store it internally as seconds since epoch in some fixed timezone (usually UTC). When querying TIMESTAMP WITH TIMEZONE data the values are presented to user in session timezone (yet session timezone is used just for presentation purposes).

TIMESTAMP does not represent specific point in time, but rather a reading of a wall clock+calendar. Selecting values from TIMESTAMP column should return same result set no matter what is the client's session timezone.

While Presto semantics is different:

TIMESTAMP seems to do what TIMESTAMP WITH TIMEZONE should.

TIMESTAMP WITH TIMEZONE encodes explicit timezone information to each value stored in table. The sql standard does not define a type like that. But it does not seem very practical. Assuming that values selected from TIMESTAMP WITH TIMEZONE are presented to user in session timezone anyway, the per-row timezone information can be stripped away and all values can be stored in some arbitrary fixed timezone (e.g. UTC).

Please comment on the semantics. It seems wrong. Why the choice - as it is hard to believe that it was not done intentionally.

cc: @dain, @martint, @fiedukow, @cawallin

Edit: roadmap: #10326

@martint
Copy link
Contributor

martint commented Jan 18, 2017

Indeed. We've been talking about this for a while in our team. It was due to misinterpretation of the spec.

We need to fix it, but haven't had the time to figure out what needs to be updated and the impact.

@losipiuk
Copy link
Contributor Author

We are happy to do the work. We just have settle how to handle the backward incompatibility this change implies.
Do we we want to have some transition-phase when old semantics can be enforced with configuration property (I guess it would be tricky and ugly to implement).
Or we can just change the semantics of types and do not care about backward compatibility.

@martint
Copy link
Contributor

martint commented Jan 18, 2017

Ideally, yes, but it depends on how ugly it gets.

@losipiuk
Copy link
Contributor Author

We will look into that and try to figure out some implementation path.

@haozhun
Copy link
Contributor

haozhun commented Jan 18, 2017

@losipiuk I agree with you that Timestamp w/o TZ in Presto is broken. I do NOT agree that Timestamp w TZ should behave like Instant. I believe it should also have an associated time zone. (In other word, I believe Timestamp w TZ is implemented correctly today.) Below is an excerpt of something I wrote early last year that summarizes the current behavior and my understanding.


To summarize how things work today:

  • Timestamp w TZ = DateTime in joda = ZonedDateTime in java8
  • Timestamp w/o TZ = Instant in joda = Instant in java8.
    • In other words, Timestamp w/o TZ represents an instant in time, they just don't have a timezone associated with them.
    • When you print them out for human consumption, you need to resort to user's session time zone.
    • When you turn it into a Timestamp w TZ, you just stick that time zone to it without changing the instant.

The way I understand it

  • Timestamp w TZ = DateTime in joda = ZonedDateTime in java8
  • Timestamp w/o TZ = LocalDateTime in joda = LocalDateTime in java8.
    • In other words, Timestamp w/o TZ represents a specific Year/Month/Day/Hour/Minute/Second. But it doesn't represent a specific instant. (Side note: this concept does not need expensive representation. It can still be represented as millis/nanos since epoch, observing only chronology rules and no tz rules)
    • When you print it out, it should be printed out as is.
    • If you want to turn it into a Timestamp w TZ, you need to resort to the user's session time zone for the missing piece of information (and this is not always possible because Timestamp w TZ has gaps).

Here is the reason I believe the first understanding is inconsistent. I can only think of one possible interpretation for the other 3 concepts:

  • Time w TZ = no single class correspondence in joda = OffsetTime in java8
    • Offset and Zone is different in that only constant offset exists for OffsetX. Political zone is not allowed.
    • ZonedTime doesn't make sense and doesn't exist. ZonedDateTime and OffsetDateTime both exists.
  • Time w/o TZ = LocalTime in joda = LocalTime in java8
  • Date = LocalDate in joda = LocalDate in java8

Note here the inconsistency between interpretation of Timestamp w/o TZ and Time w/o TZ if we adopt the first interpretation of Timestamp w/o TZ (Instant vs LocalTime). Whereas under the second interpretation, it will be consistent (LocalDateTime vs LocalTime).

I went to SQL spec for the definitive answer:

  • Abbreviations
    • SV is the source value, TV is the target value
    • UTC is the UTC component of SV or TV (if and only if the source or target has time zone)
    • TZ is the timezone displacement of SV or TV (if and only if the source or target has time zone)
    • STZD is the SQL-session default time zone displacement
  • To convert Timestamp w/o TZ to Timestamp w/ TZ: TV.UTC = SV - STZD; TV.TZ = STZD
  • To convert Timestamp w/ TZ to Timestamp w/o TZ: TV = SV.UTC + SV.TZ

I believe these two rules proves that SQL spec agrees with my interpretation. Let's consider cast from Timestamp w/o TZ to Timestamp w/ TZ

  • Cast from Timestamp w/o TZ
    • Given, SV = 3600000 millis (1 hour)
  • Cast to Timestamp w TZ in MPK
    • STZD = America/Los_Angeles
    • TV.UTC = 3600000 - (-28800000) = 32400000, TV.TZ=America/Los_Angeles
    • When written out in human format, TV is: 1970-01-01 01:00:00 America/Los_Angeles, TV.UTC is 1970-01-01 09:00:00
  • Cast to Timestamp w TZ in Shanghai
    • STZD = Asia/Shanghai
    • TV.UTC = 3600000 - 28800000 = -25200000, TV.TZ=Asia/Shanghai
    • When written out in human format, TV is: 1970-01-01 01:00:00 Asia/Shanghai, TV.UTC is 1969-12-31 17:00:00

Under first interpretation, these two cast should have yield results that are equal. Under second interpretation, they would produce different result. The rule in SQL spec produces two different results.

Lastly, a side note from me. Both interpretation can produce results that is dependent on user session time zone:

  • Under current Presto interpretation of Timestamp w/o TZ:
    • Same storage representation. Let's say 60000 milliseconds.
    • Prints out as 1969-12-31 16:01:00 in MPK, and 1970-01-01 08:01:00 in China. They are different.
    • When cast to Timestamp w TZ, become 1969-12-31 16:01:00 America/Los_Angeles in MPK, and 1970-01-01 08:01:00 Asia/Shanghai in China. They are equal.
  • Under SQL spec interpretation of Timestamp w/o TZ:
    • Same storage representation. Let's say 60000 milliseconds.
    • Prints out as 1970-01-01 00:01:00 in MPK, and 1970-01-01 00:01:00 in China. They are the same.
    • When cast to Timestamp w TZ, become 1970-01-01 00:01:00 America/Los_Angeles in MPK, and 1970-01-01 00:01:00 Asia/Shanghai in China. They are different.

Under the SQL spec, cast from timestamp w/o TZ to timestamp w/ TZ can produce different results based on user time zone. As a result, I guess this cast probably should NOT have been implicit.

@haozhun
Copy link
Contributor

haozhun commented Jan 18, 2017

I'm assigning this back to @losipiuk because we still need to clarify the semantics. I just talked to @martint and @dain. They agree with my understanding.

@haozhun
Copy link
Contributor

haozhun commented Jan 18, 2017

@losipiuk, @dain, and I reached agreement:

  • Timestamp with Timezone in Presto is implemented properly today (like DateTime in joda, ZonedDateTime in jdk8).
  • Timestamp in Presto is like Instant in joda/jdk8 today. It should be like LocalDateTime in joda/jdk8.
  • Extracting hour from 2016-01-01 12:00:00 <TZ> should return 12 no matter what <TZ> is put in template.
  • As part of fixing Timestamp in Presto, we should remove implicit coercion from Timestamp to Timestamp with Timezone because the result value is environment dependent.

@fiedukow
Copy link
Member

fiedukow commented Jan 24, 2017

I'm currently working on that.

@dain
Copy link
Contributor

dain commented Jan 24, 2017

As part of this project, we will need to identify any backward incompatible behavior and document it

fiedukow added a commit to Teradata/presto that referenced this issue Feb 15, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one beeing aligned with ANSI SQL. See: prestodb#7122
fiedukow added a commit to Teradata/presto that referenced this issue Feb 15, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one beeing aligned with ANSI SQL. See: prestodb#7122
fiedukow added a commit to Teradata/presto that referenced this issue Feb 16, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one beeing aligned with ANSI SQL. See: prestodb#7122
fiedukow added a commit to Teradata/presto that referenced this issue Feb 20, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
fiedukow added a commit to Teradata/presto that referenced this issue Feb 21, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
fiedukow added a commit to Teradata/presto that referenced this issue Feb 21, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit to losipiuk/prestodb that referenced this issue Nov 13, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit to starburstdata/facebook-presto that referenced this issue Nov 23, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit to losipiuk/prestodb that referenced this issue Nov 27, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit to starburstdata/facebook-presto that referenced this issue Dec 5, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit to losipiuk/prestodb that referenced this issue Dec 22, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
losipiuk pushed a commit that referenced this issue Dec 22, 2017
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: #7122
tongboh pushed a commit to checkr/presto that referenced this issue Apr 9, 2018
This is flag for switching between legacy TIME/TIMESTAMP semantic
and new one being aligned with ANSI SQL. See: prestodb#7122
@findepi findepi closed this as completed Jan 23, 2019
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this issue Mar 23, 2022
Summary:
Pull Request resolved: #1230

From discussions in [github issue](prestodb/presto#7122 (comment)), it says "Extracting hour from 2016-01-01 12:00:00 <TZ> should return 12 no matter what <TZ> is put in template.".  This diff implement that behavior.

Reviewed By: kagamiori

Differential Revision: D34944701

fbshipit-source-id: 4f9bad2de01432dce0ae158f84462a994ff5ddcc
shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this issue Aug 18, 2022
…incubator#1230)

Summary:
Pull Request resolved: facebookincubator#1230

From discussions in [github issue](prestodb/presto#7122 (comment)), it says "Extracting hour from 2016-01-01 12:00:00 <TZ> should return 12 no matter what <TZ> is put in template.".  This diff implement that behavior.

Reviewed By: kagamiori

Differential Revision: D34944701

fbshipit-source-id: 4f9bad2de01432dce0ae158f84462a994ff5ddcc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants