Cast Kernel Ignores Timezone #1936
Comments
I'd like to work on this. And I think |
I would recommend writing up the expected behaviour first, as timezone handling is notoriously messy, and once we have consensus we can move forward with implementing that. |
@tustvold thank you for pinging me, I'm working on these things now as well. @doki23 it would be great if you could help ❤️ Some hints that might help:
1. arrow-rs/arrow/src/compute/kernels/cast.rs Line 1284 in 3bf6eb9
To make the casting function consider the timezone, we have to fix the second _ identifier and check whether it's None or Some.
2. arrow-rs/arrow/src/array/array_primitive.rs Line 209 in 3bf6eb9
When using fmt to print, we first convert the value to a NaiveDateTime (from chrono-rs), which contains no timezone info, so you only see the timestamp without a timezone.
|
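The first hint above (inspect the timezone field instead of discarding it with `_`) could be sketched roughly as follows. This is only an illustration: `DataType` and `TimeUnit` here are simplified local stand-ins for the arrow types, and `TzAction` and `tz_action` are hypothetical names that do not exist in arrow-rs.

```rust
// Simplified stand-ins for the arrow types (assumption: the timezone is an
// optional string on the Timestamp variant).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

// Hypothetical summary of what a cast has to do about the timezone.
#[derive(Debug, PartialEq)]
enum TzAction {
    Ignore,  // timezone is irrelevant or unchanged
    Attach,  // no timezone -> some timezone
    Strip,   // some timezone -> no timezone
    Relabel, // one timezone -> a different timezone
}

fn tz_action(from: &DataType, to: &DataType) -> TzAction {
    match (from, to) {
        // Bind both timezone fields instead of ignoring them with `_`.
        (DataType::Timestamp(_, from_tz), DataType::Timestamp(_, to_tz)) => {
            match (from_tz, to_tz) {
                (None, Some(_)) => TzAction::Attach,
                (Some(_), None) => TzAction::Strip,
                (Some(a), Some(b)) if a != b => TzAction::Relabel,
                _ => TzAction::Ignore,
            }
        }
        // Any cast that does not involve two timestamps ignores the timezone.
        _ => TzAction::Ignore,
    }
}
```

With this shape, a cast from a timestamp to `Int64` falls through to `Ignore`, while a cast between two timestamp types is forced to look at both timezone fields.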
@doki23 are you planning to draft the proposal? |
Sure, please give me a few hours. |
We consider the tz only if from_type and to_type both need it. For example, ignoring the tz is OK when we cast ts to i64, because an i64 array doesn't care about the timezone. So, there are 2 situations:
I noticed that a timestamp array is always treated as
make_type!(
    TimestampSecondType,
    i64,
    DataType::Timestamp(TimeUnit::Second, None)
);
So, we may need a |
What's the expected behavior for casting a timestamp with a timezone to a timestamp without a timezone? e.g. if the timestamp with timezone is
I recommend listing these ambiguous cases and your proposed behavior so we can discuss them.
BTW, I tested it on pyarrow: it simply changes the datatype but does not change the underlying timestamp ( |
The intuitive answer would be to convert it to UTC. I think postgres effectively casts it to server (local) time. |
I believe this section of the arrow schema definition is relevant - https://github.com/apache/arrow-rs/blob/master/format/Schema.fbs#L280 In particular
Given this is the only possibility enumerated in the schema, I feel this is probably the one we should follow unless people feel strongly otherwise. My 2 cents is that anything relying on the local timezone of the system is best avoided if at all possible, it just feels fragile and error-prone. |
Yes - these RecordBatches could be part of Flights, yes? In which case the whole point is to send them around to different computers that may be in different timezones, so it kind of forces our hand here. And if we are doing it this way in arrow where we don't have the luxury of following postgres, maybe this is also where we break postgres compatibility in DataFusion. Just because postgres did it wrong doesn't mean we should follow... |
Tz has no effect on the value of the timestamp; it's just used for display. |
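A minimal sketch of this "display only" view, assuming the stored value is seconds since the Unix epoch in UTC and using a fixed offset in place of a real timezone (`wall_clock` is a hypothetical helper, not an arrow-rs function):

```rust
/// Render seconds since the Unix epoch as `HH:MM` wall-clock time,
/// shifted by a fixed UTC offset given in seconds.
fn wall_clock(epoch_secs: i64, offset_secs: i64) -> String {
    // rem_euclid keeps the result in 0..86_400 even for negative inputs.
    let secs_of_day = (epoch_secs + offset_secs).rem_euclid(86_400);
    format!("{:02}:{:02}", secs_of_day / 3600, (secs_of_day % 3600) / 60)
}
```

The same stored value `0` renders as `00:00` with offset `0` and as `08:00` with offset `8 * 3600`; only the rendering differs, not the value.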
The specification states
As stated above, given this is the only possibility enumerated I think we should follow this. The inverse operation, i.e. removing a timezone, I would therefore expect to do the reverse i.e. |
Thank you @tustvold, I wasn't aware of this spec before. BTW, I think |
Yeah let's do the "hard" operation in the cast kernel, and if people don't like it, they can perform a metadata-only cast using |
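The difference between the "hard" cast and a metadata-only cast might look roughly like this. It is a sketch under assumptions: timezone-aware values are taken to be stored as UTC, zone-less values as local wall-clock seconds, and a fixed offset stands in for a real timezone (real zones such as Europe/Paris need a timezone database because of DST). `TsColumn`, `cast_adjusting` and `cast_metadata_only` are hypothetical names, not arrow-rs APIs.

```rust
/// Hypothetical stand-in for a timestamp column: raw seconds plus an
/// optional fixed UTC offset (in seconds) in place of a timezone name.
#[derive(Debug, Clone, PartialEq)]
struct TsColumn {
    values: Vec<i64>,
    offset_secs: Option<i64>,
}

/// "Hard" cast: change the timezone and adjust the stored values when a
/// timezone is added or removed.
fn cast_adjusting(col: &TsColumn, target: Option<i64>) -> TsColumn {
    let shift = match (col.offset_secs, target) {
        (None, Some(to)) => -to,    // local wall clock -> UTC
        (Some(from), None) => from, // UTC -> local wall clock
        _ => 0,                     // both sides share the same representation
    };
    TsColumn {
        values: col.values.iter().map(|v| v + shift).collect(),
        offset_secs: target,
    }
}

/// Metadata-only cast: relabel the column and leave the values untouched.
fn cast_metadata_only(col: &TsColumn, target: Option<i64>) -> TsColumn {
    TsColumn {
        values: col.values.clone(),
        offset_secs: target,
    }
}
```

For a zone-less column holding `3600` (01:00 wall clock), `cast_adjusting` to a `+01:00` offset stores `0`, whereas `cast_metadata_only` keeps `3600` and only changes the label.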
#3794 is related to this, and implements the necessary machinery for timezone aware parsing of strings |
Reopening as there are still issues around casting between timestamps with timezones #4201 |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The beginnings of timezone support were added in #824, however, this is currently ignored by the cast kernel
Describe the solution you'd like
Timezones should be correctly handled by the cast kernel
Describe alternatives you've considered
We could not support timezones
Additional context
Noticed whilst investigating #1932