-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: include precision information in PrecisionTimestamp and PrecisionTimestampTZ literals #659
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -814,10 +814,8 @@ message Expression { | |||||||||||||
VarChar var_char = 22; | ||||||||||||||
bytes fixed_binary = 23; | ||||||||||||||
Decimal decimal = 24; | ||||||||||||||
// If the precision is 6 or less then this is the microseconds since the UNIX epoch | ||||||||||||||
// If the precision is more than 6 then this is the nanoseconds since the UNIX epoch | ||||||||||||||
uint64 precision_timestamp = 34; | ||||||||||||||
uint64 precision_timestamp_tz = 35; | ||||||||||||||
PrecisionTimestamp precision_timestamp = 34; | ||||||||||||||
PrecisionTimestamp precision_timestamp_tz = 35; | ||||||||||||||
Struct struct = 25; | ||||||||||||||
Map map = 26; | ||||||||||||||
// Timestamp in units of microseconds since the UNIX epoch. | ||||||||||||||
|
@@ -859,6 +857,13 @@ message Expression { | |||||||||||||
int32 scale = 3; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
message PrecisionTimestamp { | ||||||||||||||
// Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microseconds, 9 is nanoseconds | ||||||||||||||
int32 precision = 1; | ||||||||||||||
// Time passed since 1970-01-01 00:00:00.000000 in UTC for PrecisionTimestampTZ and unspecified timezone for PrecisionTimestamp | ||||||||||||||
int64 value = 2; | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I fixed the value to be uint64. The precision is int32 in the type so I think it's better to keep it as such? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. Keeping precision as int32 seems fine. |
||||||||||||||
} | ||||||||||||||
|
||||||||||||||
message Map { | ||||||||||||||
message KeyValue { | ||||||||||||||
Literal key = 1; | ||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,7 +36,7 @@ message Type { | |
FixedBinary fixed_binary = 23; | ||
Decimal decimal = 24; | ||
PrecisionTimestamp precision_timestamp = 33; | ||
PrecisionTimestampTZ precision_timestamp_tz = 34; | ||
PrecisionTimestampTZ precision_timestamp_tz = 34; // value is since UNIX epoch in UTC | ||
|
||
Struct struct = 25; | ||
List list = 27; | ||
|
@@ -164,13 +164,15 @@ message Type { | |
} | ||
|
||
message PrecisionTimestamp { | ||
// Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microsecods, 9 is nanoseconds | ||
// Defaults to 6 | ||
int32 precision = 1; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not related to these changes, but how does
Should we remove this entirely? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah I did mean to remove that, as I don't think it works, like you say - 666aa60. Hope that's better! (The commit did remove your approvals though @vbarua @westonpace ) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch. I agree that a "default" can't exist here. The default of 6 is really "if you see a |
||
uint32 type_variation_reference = 2; | ||
Nullability nullability = 3; | ||
} | ||
|
||
message PrecisionTimestampTZ { | ||
// Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microsecods, 9 is nanoseconds | ||
// Defaults to 6 | ||
int32 precision = 1; | ||
uint32 type_variation_reference = 2; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one small clarification. Timestamps are always in UTC, regardless of whether or not they are
Timestamp
orTimestampTZ
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is that right? I think the docs for Timestamp and TimestampTZ disagree, also that’s just the difference between the TZ version and the non-TZ version. The TZ version doesn’t carry any timezone with it, but it counts time since utc epoch so it specifies a definite point in time - while the non-TZ timestamp doesn’t, and is only useful relative to another no -TZ timestamp.
At least that’s the way I understand them, and it aligns with Spark and DataFusion timestamp types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that you can still do certain things to a non-TZ timestamp:
In order for a consumer to do either of those things there needs to be some meaning behind the literal beyond "just some random number that has appropriate relative distance from other values in this column"
In pretty much all cases I'm aware of the non-TZ timstamps are still stored as UTC instants where the wall clock (in a UTC time zone) would display the desired value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked Spark, DataFusion, Parquet and Postgres docs, and as far as I understand them they all say the non-TZ timestamp bears no notion of timezone at all (ie. it's time passed since some epoch, not UTC epoch), while the yes-TZ timestamp is specifically time passed since UTC epoch.
Yes, the non-TZ timestamp does indicate wall clock time - but it will represent the same wall clock time of e.g. 10PM on Thursday, July 11th, 2024 no matter in which timezone you look at it. And you can convert it to a -TZ timestamp by providing information about which timezone's wall clock time it corresponds to.
Well yes, it's not just a random number, but time passed since 1.1.1970.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So if a producer encodes the literal "10PM on Thursday, July 11th, 2024" as 1720735200 (seconds since 1.1.1970 in UTC) and my consumer encodes the literal "10PM on Thursday, July 11th, 2024" as 1720760400 (seconds since 1.1.1970 in PT) then I think we are going to have a problem.
Often people say "seconds since the UNIX epoch" but the UNIX epoch is defined as 1.1.1970 UTC and not 1.1.1970 in some arbitrary time zone.
Correct. We are supposed to think of these values semantically as a wall clock time. However, we still need to agree on how we encode wall clock times into a proto literal. If we represent wall clock times as ISO-8601 strings (with no TZ part) then it would be fine. There would be no need to mention time zones. However, if we choose to encode wall clock times as a u64 then we must mention how that works. The way that works is by encoding "the wall clock time X" as "seconds since the epoch (UTC) where a wall clock in a UTC time zone would display X"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well then. You are a consumer. I have sent you a plan with a
PrecisionTimestamp
literal. The proto is:Can you please format this as a string of the form
MONTH/DAY/YEAR HOURS:MINUTES:SECONDS
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand where you're getting with this, but I believe that'd be
July 12, 2024 14:21:44
(well, with a slightly different format string).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that answer is wrong. The correct answer is
July 12, 2024 7:21:44 AM
.I believe I followed the encoding rules correctly. 1720794104 is the number of seconds that have passed since 1.1.1970 PT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think I see my mistake. I'm thinking we are interpreting this as
"The time shown on a wall clock in UTC after X seconds have passed since Y" (in which case Y's timezone matters)
You are perhaps interpreting it as
"The time shown on a wall clock in timezone Z after X seconds have passed since Y" (in which case Y's time zone doesn't matter as long as Z matches it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right in how I interpret it 👍