-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: include precision information in PrecisionTimestamp and PrecisionTimestampTZ literals #659
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,7 +36,7 @@ message Type { | |
FixedBinary fixed_binary = 23; | ||
Decimal decimal = 24; | ||
PrecisionTimestamp precision_timestamp = 33; | ||
PrecisionTimestampTZ precision_timestamp_tz = 34; | ||
PrecisionTimestampTZ precision_timestamp_tz = 34; // value is since UNIX epoch in UTC | ||
|
||
Struct struct = 25; | ||
List list = 27; | ||
|
@@ -164,13 +164,15 @@ message Type { | |
} | ||
|
||
message PrecisionTimestamp { | ||
// Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microsecods, 9 is nanoseconds | ||
// Defaults to 6 | ||
int32 precision = 1; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not related to these changes, but how does
Should we remove this entirely? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah I did mean to remove that, as I don't think it works, like you say - 666aa60. Hope that's better! (The commit did remove your approvals though @vbarua @westonpace ) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch. I agree that a "default" can't exist here. The default of 6 is really "if you see a |
||
uint32 type_variation_reference = 2; | ||
Nullability nullability = 3; | ||
} | ||
|
||
message PrecisionTimestampTZ { | ||
// Sub-second precision, 0 means the value given is in seconds, 3 is milliseconds, 6 microsecods, 9 is nanoseconds | ||
// Defaults to 6 | ||
int32 precision = 1; | ||
uint32 type_variation_reference = 2; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one small clarification. Timestamps are always in UTC, regardless of whether or not they are
Timestamp
orTimestampTZ
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is that right? I think the docs for Timestamp and TimestampTZ disagree, also that’s just the difference between the TZ version and the non-TZ version. The TZ version doesn’t carry any timezone with it, but it counts time since utc epoch so it specifies a definite point in time - while the non-TZ timestamp doesn’t, and is only useful relative to another no -TZ timestamp.
At least that’s the way I understand them, and it aligns with Spark and DataFusion timestamp types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that you can still do certain things to a non-TZ timestamp:
In order for a consumer to do either of those things there needs to be some meaning behind the literal beyond "just some random number that has appropriate relative distance from other values in this column"
In pretty much all cases I'm aware of the non-TZ timstamps are still stored as UTC instants where the wall clock (in a UTC time zone) would display the desired value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked Spark, DataFusion, Parquet and Postgres docs, and as far as I understand them they all say the non-TZ timestamp bears no notion of timezone at all (ie. it's time passed since some epoch, not UTC epoch), while the yes-TZ timestamp is specifically time passed since UTC epoch.
Yes, the non-TZ timestamp does indicate wall clock time - but it will represent the same wall clock time of e.g. 10PM on Thursday, July 11th, 2024 no matter in which timezone you look at it. And you can convert it to a -TZ timestamp by providing information about which timezone's wall clock time it corresponds to.
Well yes, it's not just a random number, but time passed since 1.1.1970.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So if a producer encodes the literal "10PM on Thursday, July 11th, 2024" as 1720735200 (seconds since 1.1.1970 in UTC) and my consumer encodes the literal "10PM on Thursday, July 11th, 2024" as 1720760400 (seconds since 1.1.1970 in PT) then I think we are going to have a problem.
Often people say "seconds since the UNIX epoch" but the UNIX epoch is defined as 1.1.1970 UTC and not 1.1.1970 in some arbitrary time zone.
Correct. We are supposed to think of these values semantically as a wall clock time. However, we still need to agree on how we encode wall clock times into a proto literal. If we represent wall clock times as ISO-8601 strings (with no TZ part) then it would be fine. There would be no need to mention time zones. However, if we choose to encode wall clock times as a u64 then we must mention how that works. The way that works is by encoding "the wall clock time X" as "seconds since the epoch (UTC) where a wall clock in a UTC time zone would display X"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well then. You are a consumer. I have sent you a plan with a
PrecisionTimestamp
literal. The proto is:Can you please format this as a string of the form
MONTH/DAY/YEAR HOURS:MINUTES:SECONDS
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand where you're getting with this, but I believe that'd be
July 12, 2024 14:21:44
(well, with a slightly different format string).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that answer is wrong. The correct answer is
July 12, 2024 7:21:44 AM
.I believe I followed the encoding rules correctly. 1720794104 is the number of seconds that have passed since 1.1.1970 PT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think I see my mistake. I'm thinking we are interpreting this as
"The time shown on a wall clock in UTC after X seconds have passed since Y" (in which case Y's timezone matters)
You are perhaps interpreting it as
"The time shown on a wall clock in timezone Z after X seconds have passed since Y" (in which case Y's time zone doesn't matter as long as Z matches it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right in how I interpret it 👍