Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIME/TIMESTAMP W/O TIME ZONE semantics fix (#7122) #7378

Closed
wants to merge 44 commits into from

Conversation

fiedukow
Copy link
Member

@fiedukow fiedukow commented Feb 15, 2017

This is first peek at #7122 solution, full documentation will be provided.
Commits are kept small to show all changes, but as always squashes are welcome.

First three commits are kinda separate from PR, if you'll insist I can separate that, but I would say it has to be fixed anyway.

@fiedukow fiedukow requested a review from findepi February 15, 2017 15:10
@fiedukow fiedukow changed the title TIME/TIMESTAMP W/O TIME ZONE semantic fix TIME/TIMESTAMP W/O TIME ZONE semantic fix (#7122) Feb 15, 2017
@electrum
Copy link
Contributor

electrum commented Feb 15, 2017 via email

ISOChronology chronology = getChronology(session.getTimeZoneKey());
return utcMillis - chronology.getZone().getOffset(utcMillis);
// date is encoded as milliseconds at midnight in UTC
// convert to midnight if the session timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/if/in

Copy link
Contributor

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might need to reduce verbosity... first cmt digested.

public SqlTimeWithTimeZone(long timeWithTimeZone)
{
millisUtc = unpackMillisUtc(timeWithTimeZone);
if (millisUtc > MAX_MILLIS_TIME) {
throw new RuntimeException("millisUtc must be time in millisecond that past since midnight UTC.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • s/past/passed/ ?
  • I wouldn't use trailing dot in the msg
  • consider including the millisUtc value in msg

@@ -16,30 +16,64 @@
import com.fasterxml.jackson.annotation.JsonValue;

import java.time.Instant;
import java.time.LocalTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cmt msg

  • s/Instance.atZone/Instant.ofEpochMilli(..).atZone
  • s/will always use TZ/\0 offset/
  • s/chaning/changing
  • s/missunderstanding/misunderstanding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmt msg cont:

  • remove "ID" from the msg, i.e.: "Fix TZ issue in TIME WITH TZ for TZ ID changing offset over time"

import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.TimeZone;

import static com.facebook.presto.spi.type.DateTimeEncoding.unpackMillisUtc;
import static com.facebook.presto.spi.type.DateTimeEncoding.unpackZoneKey;

/**
* This type represent instant of time in some day and timezone, where timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

represents

public final class SqlTimeWithTimeZone
{
private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss.SSS VV");

/**
* As we dont have negative timestamps and max value is midnight in timezone that is ~(UTC -12:00),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"midnight" might be interpreted as time 00:00 where 23:59:59.(9) should probably be understood.

Isn't max value in ~ UTC +12 (~easternmost time zone)?

s/dont/don't

Copy link
Member Author

@fiedukow fiedukow Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With UTC +12, 23:00:00 millisUTC equals 1000*60*60*(23-12) = x
With UTC -12, 23:00:00 millisUTC equals 1000*60*60*(23+12) = y
y > x, or am I wrong at some point?

Also, the part about negative millisUtc was wrong. I removed that.

private static final long MAX_MILLIS_TIME = 2 * 24 * 60 * 60 * 1000;

/**
* Milliseconds relative to midnight UTC of the given time (millisUtc < MAX_MILLIS_TIME).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is MIN_MILLIS_TIME ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~(-MAX_MILLIS_TIME), do you think we should double sanity checks here and check that as well? I thought that MAX is enough sanity as representing dates before 1970-01-01 in timestamp is rare and I didn't seen this constraint broken in Presto codebase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8bd8ff87b4e3db7423bfd92f2ed65f504beafd60

private final long millisUtc;
private final TimeZoneKey timeZoneKey;

/**
* FIXME This is used only to determine last of TZ shifts for TIME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither ;)

private final long millisUtc;
private final TimeZoneKey timeZoneKey;

/**
* FIXME This is used only to determine last of TZ shifts for TIME
* This is workaround to avoid chaning semantic of millisUTC being part of SPI
Copy link
Contributor

@findepi findepi Feb 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing

* to +5:45, without usage of this timestamp, we would use TZ shift valid
* on 1970-01-01, which is invalid today.
*/
private final long timezoneOffsetReferenceTimestampUtc = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SqlTimeWithTimeZone objects are instantiated? Does keeping this on a field give any more stable results than consulting System.currentTimeMillis whenever current time zone offset is to be queried?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really was only thinking about making this more static (like having singleton for all Presto that will just provide that value). Just to avoid potential change in some weird scenarios.

Think about that, if we would call System.currentTimeMillis in toString, we would have DTO with toString (potentially) morphing over time.

I'm strongly against making this calculation later and more "dynamic", I would rather talk about making it earlier and more "static".

// FIXME This is hack that we need to use as the timezone interpretation depends on date (not only on time)
// Additionally we cannot just grab current TZ offset from timezoneOffsetReferenceTimestampUtc and use it
// as our formatting library will fail to display expected TZ symbol.
long currentMillisOfDay =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"currentMillisOfDayInZone" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what zone? This is UTC, right?


Instant utcInstantInCurrentDay = Instant.ofEpochMilli(timeMillisUtcInCurrentDay);
ZonedDateTime zonedDateTime = utcInstantInCurrentDay.atZone(ZoneId.of(timeZoneKey.getId()));
return zonedDateTime.format(formatter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, looks correct! Do we have a test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots and lots of test, sadly mostly indirect :(
I think we could add some testing to that, especially checking if this is aligned with TIME WITH TIME ZONE to VARCHAR cast, which is expected but not guaranteed by the code.

I'll add that to TODO for later or escalating to further tickets - depending on circumstances.

long currentMillisOfDay =
LocalTime.from(
Instant.ofEpochMilli(timezoneOffsetReferenceTimestampUtc)
.atZone(ZoneId.of(TimeZoneKey.UTC_KEY.getId())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZoneOffset.UTC could be used here?

LocalTime.from(
Instant.ofEpochMilli(timezoneOffsetReferenceTimestampUtc)
.atZone(ZoneId.of(TimeZoneKey.UTC_KEY.getId())))
.toNanoOfDay() / 1_000_000L; // nanos to millis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you do here in those several lines, seems equivalent to "timezoneOffsetReferenceTimestampUtc % (86400 * 1000)". See "Java Time-Scale" @ https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html

Or, as you wrote in DTF.currentTime: UTC_CHRONOLOGY.millisOfDay().get(timezoneOffsetReferenceTimestampUtc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timezoneOffsetReferenceTimestampUtc % (86400 * 1000) - I know that is correct, but this code is hard to understand without weird magic constants.
As for other proposition this would require to to introduce Joda to SPI and as far as we understand we avoid introducing libraries to SPI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure, Joda dep is not cool (especially that I'd like to see Joda go away in some time:), so what about:

/*
 * Equivalent to using Joda: UTC_CHRONOLOGY.millisOfDay().get(timezoneOffsetReferenceTimestampUtc)
 */
long millisInADay = 86400 * 1000; // or TimeUnit.DAYS.toMillis(1) if you wish
currentMillisOfDay = timezoneOffsetReferenceTimestampUtc % millisInADay;

?

By naming this constant (and adding an explanatory comment), you deprive it of any magic it had.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, Joda-based code can be transliterated to java.time this way:

currentMillisOfDay = ChronoField.MILLI_OF_DAY.getFrom(
    Instant.ofEpochMilli(timezoneOffsetReferenceTimestampUtc).atZone(ZoneOffset.UTC))

no constants involved this time.

// but the start of the day is relative to the current time zone.
long millis = getChronology(session.getTimeZoneKey()).millisOfDay().get(session.getStartTime());
// We do all calculation in UTC, as session.getStartTime() is in UTC
// and we need to have UTC milis for packDateTimeWithZone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: millis

@@ -132,7 +131,8 @@ public static long currentTimestamp(ConnectorSession session)
@SqlType(StandardTypes.TIMESTAMP)
public static long localTimestamp(ConnectorSession session)
{
return session.getStartTime();
ISOChronology localChronology = getChronology(session.getTimeZoneKey());
return localChronology.getZone().convertUTCToLocal(session.getStartTime());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you're aware that testLocalTimestamp doesn't pass at this commit. I assume your strategy is not to be green at every step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just messed with that to much while rebasing I will check that.

@@ -119,8 +120,13 @@ public void testCurrentDateTimezone()
throws Exception
{
TimeZoneKey kievTimeZoneKey = getTimeZoneKey("Europe/Kiev");
for (long instant = new DateTime(2000, 6, 15, 0, 0).getMillis(); instant < new DateTime(2016, 6, 15, 0, 0).getMillis(); instant += TimeUnit.HOURS.toMillis(1)) {
TimeZoneKey montrealTimeZoneKey = getTimeZoneKey("America/Montreal");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment why adding Montreal here increases cases coverage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't comment in commit message enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it is. I missed that

TimeZoneKey montrealTimeZoneKey = getTimeZoneKey("America/Montreal");
// We expect UTC millis later on so we have to use UTC chronology
for (long instant = ISOChronology.getInstanceUTC().getDateTimeMillis(2000, 6, 15, 0, 0, 0, 0);
instant < ISOChronology.getInstanceUTC().getDateTimeMillis(2016, 6, 15, 0, 0, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it... Test method is usually executed once in JVM lifetime. Can we expect JVM to pull ISOChronology.getInstanceUTC().getDateTimeMillis(2016, 6, 15, 0, 0, 0, 0) "constant" before the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we try to optimise tests to work better with JVM now?

// We expect UTC millis later on so we have to use UTC chronology
for (long instant = ISOChronology.getInstanceUTC().getDateTimeMillis(2000, 6, 15, 0, 0, 0, 0);
instant < ISOChronology.getInstanceUTC().getDateTimeMillis(2016, 6, 15, 0, 0, 0, 0);
instant += TimeUnit.HOURS.toMillis(1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would our testing be better if we checked not only whole (UTC) hours? Like with instant += 57 minutes ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it 53 as primes are fancier :P

@@ -69,6 +69,7 @@
public static final String REORDER_WINDOWS = "reorder_windows";
public static final String ITERATIVE_OPTIMIZER = "iterative_optimizer_enabled";
public static final String EXCHANGE_COMPRESSION = "exchange_compression";
public static final String LEGACY_TIMESTAMP = "legacy_timestamp";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmt msg s/beeing/being/

false),
booleanSessionProperty(
LEGACY_TIMESTAMP,
"Use legacy TIMESTAMP, TIME & DATE semantic",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

semantics

public static boolean isLegacyTimestamp(Session session)
{
return session.getSystemProperty(LEGACY_TIMESTAMP, Boolean.class);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add \n

@@ -134,7 +134,7 @@ public void testCurrentDateTimezone()
private static void assertCurrentDateAtInstant(TimeZoneKey timeZoneKey, long instant)
{
long expectedDays = epochDaysInZone(timeZoneKey, instant);
long dateTimeCalculation = currentDate(new TestingConnectorSession("test", timeZoneKey, US, instant, ImmutableList.of(), ImmutableMap.of()));
long dateTimeCalculation = currentDate(new TestingConnectorSession("test", timeZoneKey, US, instant, ImmutableList.of(), ImmutableMap.of(), true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider assigning the true to a variable first, so that it has a name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion but as it's just a test so I think it's better to keep it short.

TimeZoneKey timestampZone,
ConnectorSession session)
{
if (session.isLegacyTimestamp()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really reviewing this yet, but...

"Introduce sqlTime(stamp)Of method to avoid direct constructor calls" cmt msg suggests a refactor and here you seem to add logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that new msg is fine for you ;)

@findepi findepi changed the title TIME/TIMESTAMP W/O TIME ZONE semantic fix (#7122) TIME/TIMESTAMP W/O TIME ZONE semantics fix (#7122) Feb 16, 2017
@fiedukow fiedukow force-pushed the af/7122/1 branch 5 times, most recently from 8c69bf8 to 5a351fe Compare February 16, 2017 16:32

Instant utcInstantInCurrentDay = Instant.ofEpochMilli(timeMillisUtcInCurrentDay);
ZonedDateTime zonedDateTime = utcInstantInCurrentDay.atZone(ZoneId.of(timeZoneKey.getId()));
return zonedDateTime.format(formatter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing. While range of allowed values is > 24h (48h? infinite?), toString() output only within 24h. That means two different values (and different by 24h or more) will be indistinguishable. :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it's more clear after 8bd8ff87b4e3db7423bfd92f2ed65f504beafd60 :-)

@@ -19,30 +19,47 @@
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Optional;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmt msg ("Introduce new TIME/TIMESTAMP semantic to SqlTypes")

  • semantics
  • SqlTypes or SqlTime or "SQL types" ?

{
return sessionTimeZoneKey;
}

@Deprecated
private boolean isDeprecatedTimestamp()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDeprecatedTime stamp?, usesDeprecatedTimeSemantics?

Copy link
Member Author

@fiedukow fiedukow Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this name is not perfect in this (and many other) contexts.
I would still keep this name aligned across the code as it will be easier to remove all those functions and variables when we decide to drop backward compatibility.

@@ -19,30 +19,47 @@
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Optional;

import static com.facebook.presto.spi.type.TimeZoneKey.UTC_KEY;

public final class SqlTime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it... consider adding a javadoc, like there is in SqlTimeWithTimeZone. E.g. you might likewise explain range of possible values for the millisUtc field.

* is known and day is not.
*/
public final class SqlTimeWithTimeZone
{
private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss.SSS VV");

/**
* As we dont have negative timestamps and max value is midnight in timezone that is ~(UTC -12:00),
* just not to care about precision we do use ~(UTC -24:00).
* We know that max allowed value is midnight in timezone that is ~(UTC -12:00)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still... "midnight" means 00:00:00 or ~23:59:59 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8bd8ff87b4e3db7423bfd92f2ed65f504beafd60

return Instant.ofEpochMilli(millisUtc).atZone(ZoneId.of(sessionTimeZoneKey.get().getId())).format(formatter);
}
else {
// FIXME this should be exact equivalent of TIMESTAMP to VARCHAR cast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just a note at this point.

}
else {
// FIXME this should be exact equivalent of TIMESTAMP to VARCHAR cast
return Instant.ofEpochMilli(millisUtc).atZone(ZoneId.of(UTC_KEY.getId())).format(formatter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this works, Instant.ofEpochMilli(millisUtc) creates an instant in time, which is "over interpretation" of the millisUtc which stores "millis of the day". Would

LocalTime.ofNanoOfDay(...).format(formatter)

do?

(If not, use ZoneOffset.UTC ...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this over interpretation.

Semantically, TIMESTAMP W/O TZ (as well as TIME W/O TZ) is just set of integers YEAR (1), MONTH (2), DAY (3), HOUR (4), MINUTE (5), SECOND (6), MILLIS (7).
Our representation encodes that fields into single long that can be interpreted in following way: "If you interpret that long as Instance (epoch in millis), than UTC wall time of this instance (1-7) gives you exact value of (1-7) fields.". This kind of encoding is weird but as far as I understand should work perfectly fine, as each value of such long maps to exactly one UTC wall time and each wall time maps to exactly one value of this representation. Also maths on that is easy for fields (4-7) and it's hard for (1-3) but can be easily done using Instance class.

This is why I think it's perfectly fine (in fact is is the only valid way for fields (1-3)) to use this value as Instance in UTC. For fields (4-7), there is choice to make, and we can use Instance logic or simple duration logic (including basic division/modulo operations) as we know that each day contains exactly 1000*60*60*24 milliseconds.

I use both approaches. When working directly on long value I use TimeUnits for converting values (4-7). I use Instance for everything else as we did use that before and I think that having less changes is better in such huge refactor.

I think that LocalTime is used only in some tests at this point, as you helped me with eliminating that from SqlTimeWithTimeZone.toString(), so I would just not add yet another class to understand our codebase (even though it's class closest to our semantic).

Maybe it is worth considering to drop usage of Instance in all TIME/TIMESTAMP W/O TZ contexts. This is huge change though.

Please let me know if you agree with that approach, as it will (should be) visible across all changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about leap seconds. Isn't modulo approach broken if we take them into consideration?

Copy link
Member Author

@fiedukow fiedukow Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html
Also SQL standard does leave leap second support to implementation so using Java approach seems fine for us

@fiedukow fiedukow force-pushed the af/7122/1 branch 2 times, most recently from c5bedea to adaef11 Compare February 17, 2017 13:31
return new SqlTimestamp(new DateTime(year, monthOfYear, dayOfMonth, hourOfDay, minuteOfHour, secondOfMinute, millisOfSecond, baseZone).getMillis(), timestampZone);
}
else {
return new SqlTimestamp(new DateTime(year, monthOfYear, dayOfMonth, hourOfDay, minuteOfHour, secondOfMinute, millisOfSecond, getDateTimeZone(UTC_KEY)).getMillis());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/getDateTimeZone(UTC_KEY)/DateTimeZone.UTC/ ?

@@ -548,4 +552,52 @@ else if (parsePos < 0) {
return ~bestInvalidPos;
}
}

public static SqlTimestamp sqlTimestampOf(int year,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used only in tests. Could this be in tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be in presto-main/testing as those are helpers for different modules tests as well.

@@ -734,6 +734,11 @@ public ConnectorPageSource createPageSource(Session session, Split split, List<C
}
}

public Session getSession()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, but I'd put this method higher in the class, not between static inner classes

assertFunction("TIME '03:04'", TIME, new SqlTime(new DateTime(1970, 1, 1, 3, 4, 0, 0, DATE_TIME_ZONE).getMillis(), TIME_ZONE_KEY));
assertFunction("TIME '03:04:05.321'", TIME, sqlTimeOf(1970, 1, 1, 3, 4, 5, 321, DATE_TIME_ZONE, TIME_ZONE_KEY, functionAssertions.getSession().toConnectorSession()));
assertFunction("TIME '03:04:05'", TIME, sqlTimeOf(1970, 1, 1, 3, 4, 5, 0, DATE_TIME_ZONE, TIME_ZONE_KEY, functionAssertions.getSession().toConnectorSession()));
assertFunction("TIME '03:04'", TIME, sqlTimeOf(1970, 1, 1, 3, 4, 0, 0, DATE_TIME_ZONE, TIME_ZONE_KEY, functionAssertions.getSession().toConnectorSession()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could have extracted functionAssertions.getSession().toConnectorSession() to a method for readability.

EDIT just as you did in TestTimestamp.java

@@ -67,22 +68,27 @@ private void assertFunction(String projection, Type expectedType, Object expecte
functionAssertions.assertFunction(projection, expectedType, expected);
}

private ConnectorSession connectorSession()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

assertFunction("cast('\n\t 2001-1-22 03:04:05.321 \t\n' as timestamp)",
TIMESTAMP,
new SqlTimestamp(new DateTime(2001, 1, 22, 3, 4, 5, 321, DATE_TIME_ZONE).getMillis(), TIME_ZONE_KEY));
assertFunction("cast('2001-1-22 03:04:05.321' as timestamp)", TIMESTAMP, sqlTimestampOf(2001, 1, 22, 3, 4, 5, 321, DATE_TIME_ZONE, TIME_ZONE_KEY, connectorSession()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seems to have unnecessarily changed formatting here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean avoiding line break I think it's fine

int secondOfMinute,
int millisOfSecond,
DateTimeZone baseZone,
TimeZoneKey timestampZone,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When timestampZone ≠ session.getTimeZoneKey? Similarly, when baseZone ≠ DateTimeZoneIndex.getDateTimeZone(timestampZone) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg. In old semantic when literal WITH TZ is casted to TIMESTAMP W/O TZ and TZ of literal is different than session TZ.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not lost, this was about ...#sqlTimestampOf(int, int, int, int, int, int, int, org.joda.time.DateTimeZone, com.facebook.presto.spi.type.TimeZoneKey, com.facebook.presto.spi.ConnectorSession)

  • this is test-only (now)
  • i looked briefly through usages, and it seems timestampZone = session.getTimeZoneKey in all cases.. didn't check for sure, though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestTimestampWithTimeZone.java:

        assertFunction("cast(TIMESTAMP '2001-1-22 03:04:05.321 +07:09' as timestamp)",
                TIMESTAMP,
                sqlTimestampOf(2001, 1, 22, 3, 4, 5, 321, WEIRD_ZONE, session.getTimeZoneKey(), session.toConnectorSession()));


public static SqlTime sqlTimeOf(int year,
int monthOfYear,
int dayOfMonth,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need year, monthOfYear, dayOfMonth to construct SqlTime ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not) surprisingly, the first args to this method are always 1970, 1, 1 :)

int secondOfMinute,
int millisOfSecond,
DateTimeZone baseZone,
TimeZoneKey timestampZone,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same q as for sqlTimestampOf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered

@@ -678,7 +679,6 @@ private static SqlDate sqlDate(int year, int month, int day)

private static SqlTimestamp sqlTimestamp(int year, int month, int day, int hour, int minute, int second)
{
DateTime dateTime = new DateTime(year, month, day, hour, minute, second, 0, UTC);
return new SqlTimestamp(dateTime.getMillis(), UTC_KEY);
return sqlTimestampOf(year, month, day, hour, minute, second, 0, UTC, UTC_KEY, SESSION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how this private static SqlTimestamp sqlTimestamp method is used. Probably a better fix for it would be
to detach it completely from SqlTimestamp, like

long timestampUtc(...) {
    return new DateTime(year, month, day, hour, minute, second, 0, UTC).getMillis()
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that way, but it would be unrelated change. Especially I don't want to mix that with this commit content.

@@ -207,7 +208,14 @@ private void testRoundTripNumeric(Iterable<? extends Number> values)
tester.testRoundTrip(
javaTimestampObjectInspector,
writeValues.stream()
.map(timestamp -> new SqlTimestamp(timestamp, UTC_KEY))
.map(timestamp -> {
if (SESSION.isLegacyTimestamp()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know SESSION is has UTC time zone key, so you could simply sqlTimestampOf(timestamp, SESSION), right?

@@ -159,7 +160,14 @@ public void testTimestampSequence()
TIMESTAMP,
intsBetween(-31_234, 31_234).stream()
.filter(i -> i % 19 == 0)
.map(timestamp -> new SqlTimestamp(timestamp, UTC_KEY))
.map(timestamp -> {
if (SESSION.isLegacyTimestamp()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know SESSION is has UTC time zone key, so you could simply sqlTimestampOf(timestamp, SESSION), right?

@@ -141,7 +142,12 @@ private static SqlDate sqlDate(DateTime from)

private static SqlTimestamp toTimestamp(DateTime dateTime)
{
return new SqlTimestamp(dateTime.getMillis(), SESSION.getTimeZoneKey());
if (isLegacyTimestamp(SESSION)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return sqlTimestampOf(dateTime.getMillis(), SESSION)?

return getChronology(session.getTimeZoneKey()).monthOfYear().add(right, left);
}
else {
return MONTH_OF_YEAR_UTC.add(right, left);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change it, I know, but the variables here are quite intuitively named.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean right & left? I think they are perfect as well ;-)

@@ -276,8 +276,14 @@ public static long timestampWithTimeZoneMinusIntervalYearToMonth(@SqlType(Standa
return updateMillisUtc(dateTimeWithTimeZone, left);
}

@Deprecated
public static int modulo24Hour(ISOChronology chronology, long millis)
{
return chronology.millisOfDay().get(millis) - chronology.getZone().getOffset(millis);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change anything here, I know, but why oh why the module method isn't modulo....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is modulo - TZ shift to get into SqlTime long range and semantic.

// convert to midnight if the session timezone
ISOChronology chronology = getChronology(session.getTimeZoneKey());
return utcMillis - chronology.getZone().getOffset(utcMillis);
// date is encoded as milliseconds at midnight in UTC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeUnit.DAYS.toMillis(value) suggests that "date is encoded as epoch days ...", right?

Copy link
Member Author

@fiedukow fiedukow Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is from legacy implementation and it regards utcMillis variable in this context. It was quite clear to me what author meant when I was reading that first time, so I thought this comment is good enough in this legacy context.

This is true that DATE value is expressed in epoch days.

import static io.airlift.slice.Slices.utf8Slice;

public final class TimeOperators
{
private static final long REFERENCE_TIMESTAMP_UTC = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zonk. This cannot be const :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will be globally solved in separate commit.

LocalTime.from(
Instant.ofEpochMilli(REFERENCE_TIMESTAMP_UTC)
.atZone(ZoneId.of(TimeZoneKey.UTC_KEY.getId())))
.toNanoOfDay() / 1_000_000L; // nanos to millis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified, just the same as in the first commit.


// This cast does treat TIME as wall time in session TZ. This means that in order to get
// its UTC representation we need to shift the value by the offset of TZ.
return packDateTimeWithZone(value - localChronology.getZone().getOffset(timeMillisUtcInCurrentDay), session.getTimeZoneKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to understand why return packDateTimeWithZone(value, session.getTimeZoneKey()); wasn't good. I think you must fix me before I go any further

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I fixed myself myself.


// This cast does treat TIMESTAMP as wall time in session TZ. This means that in order to get
// its UTC representation we need to shift the value by the offset of TZ.
return packDateTimeWithZone(modulo24Hour(value - localChronology.getZone().getOffset(value)), session.getTimeZoneKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this equivalent to casting to TIMESTAMP_WITH_TZ first and that casting to TIME_WITH_TZ by truncating y,d,m info? Do we have tests covering it is always so? Perhaps even regardless of legacy switch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOffset(value) -- same comment as in castToTimestampWithTimeZone:

localChronology.getZone()#getOffset expects millisUtc and you give it a millisUtcShiftedToOtherTimeZone as input. If I'm correct, that means within timezone's (avg) offset of DST change you will be giving incorrect result.

There seems to be DateTimeZone#getOffsetFromLocal for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've talked about that it will be fixed according to the standard (using same "hack" as in TIME).

timeZoneKey = unpackZoneKey(timeWithTimeZone);

validateMillisRange();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you did this(unpackMillisUtc(timeWithTimeZone), unpackZoneKey(timeWithTimeZone)) you would be able to retain input validation directly in the (other) ctor.

Copy link
Member Author

@fiedukow fiedukow Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, on the other hand that way it's easier to get rid of legacy code later. I did such choice in multiple places.

Instant.ofEpochMilli(timezoneOffsetReferenceTimestampUtc)
.atZone(ZoneOffset.UTC))
.toNanoOfDay() / 1_000_000L; // nanos to millis
.toNanoOfDay());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder: i still think that this can be written simply as

currentMillisOfDay = ChronoField.MILLI_OF_DAY.getFrom(
    Instant.ofEpochMilli(timezoneOffsetReferenceTimestampUtc).atZone(ZoneOffset.UTC))

{
ZoneId id = ZoneId.of(timeZoneKey.getId());
ZoneOffset offset = id.getRules().getOffset(getTimeInstantInCurrentDay());
return millisUtc + TimeUnit.SECONDS.toMillis(offset.getTotalSeconds());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

ChronoField.MILLI_OF_DAY.getFrom(getTimeInstantInCurrentDay().atZone(id))

?

Note: there should be parallelism between toString() and getLocalMillis() as they both effectively work out what current local time is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why they both use getTimeInstantInCurrentDay(), the implementation is forked afterwards.


// This cast does treat TIMESTAMP as wall time in session TZ. This means that in order to get
// its UTC representation we need to shift the value by the offset of TZ.
return packDateTimeWithZone(value - localChronology.getZone().getOffset(value), session.getTimeZoneKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localChronology.getZone()#getOffset expects millisUtc and you give it a millisUtcShiftedToOtherTimeZone as input. If I'm correct, that means within timezone's (avg) offset of DST change you will be giving incorrect result.

There seems to be DateTimeZone#getOffsetFromLocal for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've talked about that it will be fixed according to the standard (using same "hack" as in TIME).

long date = chronology.dayOfYear().roundFloor(value);
// date is currently midnight in timezone of the session
// convert to UTC
long millis = date + chronology.getZone().getOffset(date);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm dying to know what is the point to query TZ's offset in the first one third of the first second of January 1, 1970.

(ACK this is not yours)

Copy link
Member Author

@fiedukow fiedukow Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question (if there is any).


// This cast does treat TIMESTAMP as wall time in session TZ. This means that in order to get
// its UTC representation we need to shift the value by the offset of TZ.
return packDateTimeWithZone(modulo24Hour(value - localChronology.getZone().getOffset(value)), session.getTimeZoneKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOffset(value) -- same comment as in castToTimestampWithTimeZone:

localChronology.getZone()#getOffset expects millisUtc and you give it a millisUtcShiftedToOtherTimeZone as input. If I'm correct, that means within timezone's (avg) offset of DST change you will be giving incorrect result.

There seems to be DateTimeZone#getOffsetFromLocal for that.

long timeMillisUtcInCurrentDay = REFERENCE_TIMESTAMP_UTC - currentMillisOfDay + unpackMillisUtc(value);

ISOChronology chronology = getChronology(unpackZoneKey(value));
return unpackMillisUtc(value) + chronology.getZone().getOffset(timeMillisUtcInCurrentDay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • unpackMillisUtc(value) is twice, could be assigned to a variable
  • we talked about fixed offset for zone for the query duration.. getOffset(start of the query) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix that globally as we talked about that.

LocalTime.from(
Instant.ofEpochMilli(REFERENCE_TIMESTAMP_UTC)
.atZone(ZoneId.of(TimeZoneKey.UTC_KEY.getId())))
.toNanoOfDay() / 1_000_000L; // nanos to millis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify

@Test
public void testTimeZoneGapIsNotApplied()
{
// Behavior is different in legacy mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this "different" is easily understood in the context of code review, but later, if looking at this code, one could probably have problem digesting it

@@ -34,6 +34,7 @@
import java.util.Locale;

import static com.facebook.presto.operator.scalar.QuarterOfYearDateTimeField.QUARTER_OF_YEAR;
import static com.facebook.presto.spi.StandardErrorCode.FUNCTION_NOT_FOUND;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmt msg ("Remove to_iso8601 function for new TIMESTAMP semantics"): s/(complet)(ly)/\1e\2/g

}
else {
// FIXME: Simply remove this function when legacyTimestamp flag will be dropped
throw new PrestoException(FUNCTION_NOT_FOUND, "to_iso8601(timestamp) does not exist");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe https://en.wikipedia.org/wiki/ISO_8601 is not authoritative, but quoting

If no UTC relation information is given with a time representation, the time is assumed to be in local time.

So, there is ISO 8601 format suitable for timestamp, it just does not contain the tz info. Thus, an alternative to disabling the method would be changing its semantics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have serious doubts about wikipedia reliablity based on other sources including Java libraries implementations. I've requested access to ISO 8601 to confirm that, but for now leaving in "as is" state.

@@ -79,6 +80,18 @@ protected void assertDecimalFunction(String statement, SqlDecimal expectedResult
expectedResult);
}

protected void assertThrowsPrestoException(String projection, StandardErrorCode errorCode, String message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other place you called the very similar method assert*Function*ThrowsPrestoException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key to this mystery is in your comment:

In the other place

{
try {
evaluateInvalid(projection);
fail("Expected to throw an INVALID_FUNCTION_ARGUMENT exception with message " + message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/INVALID_FUNCTION_ARGUMENT/errorCode.toErrorCode()/ ?

fiedukow added 25 commits March 1, 2017 18:01
VARCHAR casted to TIMESTAMP cannot contain TIMEZONE in new semantic
To avoid problem with statically read function list, the excepion
FUNCTION_NOT_FOUND will be thrown when isLegacyTimestamp = false.

This is temporary workaround, function should be removed completely
when backward compatibility flag will be dropped.
AT TIME ZONE does not take into account the fact, that TIME WITH TIME ZONE
does not represent real millisecond UTC in millisUtc field.
In fact, this field contains millisUtc assuming offset of TIME ZONE that was
valid on 1970-01-01. Such representation allows to simply represent local time
with time zone id. That means that TIME WITH TIME ZONE that represents eg.
'10:00:00.000 Asia/Kathmandu' will always represent this exact value.
However mapping of such value to other TZ (including UTC) may differ over time.
Eg. Asia/Kathmandu switched time zone offset on 1986 from +5:30 to +5:45.
Result of query like:
`SELECT time_with_tz_column FROM table;`
Will always be the same, however:
`SELECT time_with_tz_column AT TIME ZONE 'UTC' FROM table;`
Will yail differnet value in 1980 and 2000 after changes from this commit.

This is done to use current offset of TZ as function that stucked in 1970-01-01
offsets seems useless.
This is not perfect solution and is not fully aligned with standard, but standard
behavior cannot be achieved with current TIME WITH TIME ZONE representation, as
we are not able to read TZ offset from TIME WITH TIME ZONE itselve (at least not
in all cases).
@fiedukow
Copy link
Member Author

fiedukow commented Mar 1, 2017

Continued here: #7480

@fiedukow fiedukow closed this Mar 1, 2017
@fiedukow fiedukow deleted the af/7122/1 branch May 4, 2017 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants