Support SET timezone to non-UTC time zone #4106
Comments
@alamb @liukun4515 @avantgardnerio @tustvold would you mind helping to check whether this makes sense? |
I added a POC to showcase some of the functionality; perhaps we should break it into several PRs later. |
@waitingkuo thank you for your clear communication as always! You have the most reviewable PRs and issues I've seen 😄 . The issue looks good to me, but I would propose tackling in stages:
This is due to the extra complication of what to do when someone does a |
💯 agree -- thank you so much @waitingkuo |
❤️
PostgreSQL's to_timestamp ( text, text ) → timestamp with time zone
Converts string to time stamp according to the given format.
to_timestamp('05 Dec 2000', 'DD Mon YYYY') → 2000-12-05 00:00:00-05
willy=# set timezone to 'America/Denver';
SET
willy=# select to_timestamp('05 Dec 2000', 'DD Mon YYYY');
to_timestamp
------------------------
2000-12-05 00:00:00-07
(1 row)
it shows the fixed offset timezone. i think there's no ambiguity in this approach. The |
i think we might need to refactor the
i.e. 2. now() returns the Timestamp<TimeUnit::Nanosecond, Some("SOME_TIMEZONE")> according to the time zone we have. it's currently fixed as "UTC"
❯ set time zone to '+08:00';
0 rows in set. Query took 0.000 seconds.
❯ select now();
+----------------------------------+
| now() |
+----------------------------------+
| 2022-11-05T11:11:51.713660+00:00 |
+----------------------------------+
1 row in set. Query took 0.003 seconds.
❯ select now()::timestamptz;
+----------------------------------+
| now() |
+----------------------------------+
| 2022-11-05T19:11:14.245835+08:00 |
+----------------------------------+
1 row in set. Query took 0.003 seconds. |
I disagree. The problem is that Unfortunately, this Sunday (2022-11-06), MT will be switching from UTC-6 to UTC-7 at 02:00 MDT, which will cause the clocks to go to 01:00 MST, and about half an hour later it will be 1:30AM MT for the second time this year. So while I agree there is no ambiguity for
There is no way to infer if this means there is no way to know if that corresponds with |
@avantgardnerio thank you. I wasn't aware of this before; I have never lived in an area that has a time zone switch. I did some research on PostgreSQL and chrono-tz.
MST timezone offset: -7
In 2022, MDT
willy=# set timezone to 'America/Denver';
SET
this is valid (right before the timezone shift):
willy=# select to_timestamp('2022-03-13 01:00', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 01:00:00-07
(1 row)
willy=# select to_timestamp('2022-03-13 01:59', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 01:59:00-07
(1 row)
beginning from 2:00, it switches to MDT (-6)
willy=# select to_timestamp('2022-03-13 02:00', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 03:00:00-06
(1 row)
willy=# select to_timestamp('2022-03-13 02:30', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 03:30:00-06
(1 row)
willy=# select to_timestamp('2022-03-13 02:59', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 03:59:00-06
(1 row)
and then the next hour it's parsed as MDT
willy=# select to_timestamp('2022-03-13 03:00', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-03-13 03:00:00-06
(1 row)
On November 6th at 2am MDT, the time zone switches back to MST (1am MST). let's begin with something unambiguous
for the next hour it's ambiguous, since both MST and MDT have a 1am. this is what PostgreSQL does: it uses MST even though 1am MDT is valid as well
willy=# select to_timestamp('2022-11-06 01:00', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-11-06 01:00:00-07
(1 row)
willy=# select to_timestamp('2022-11-06 01:59', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-11-06 01:59:00-07
(1 row)
after that, there's no ambiguity
willy=# select to_timestamp('2022-11-06 02:00', 'YYYY-MM-DD HH24:MI');
to_timestamp
------------------------
2022-11-06 02:00:00-07
(1 row)
conclusion for PostgreSQL: when the input is ambiguous or invalid, it's parsed as MST, and then switched to MDT if needed
now let's see the behavior for
let tz: Tz = "America/Denver".parse().unwrap();
let dt = tz.datetime_from_str("2022-03-13T01:59", "%Y-%m-%dT%H:%M");
println!("2022-03-13T01:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
let dt = tz.datetime_from_str("2022-03-13T02:00", "%Y-%m-%dT%H:%M");
println!("2022-03-13T02:00 -> {:?}", dt);
let dt = tz.datetime_from_str("2022-03-13T02:59", "%Y-%m-%dT%H:%M");
println!("2022-03-13T02:59 -> {:?}", dt);
let dt = tz.datetime_from_str("2022-03-13T03:00", "%Y-%m-%dT%H:%M");
println!("2022-03-13T03:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
2022-03-13T01:59 -> Ok(2022-03-13T01:59:00MST) / 2022-03-13T01:59:00-07:00
2022-03-13T02:00 -> Err(ParseError(Impossible))
2022-03-13T02:59 -> Err(ParseError(Impossible))
2022-03-13T03:00 -> Ok(2022-03-13T03:00:00MDT) / 2022-03-13T03:00:00-06:00
let dt = tz.datetime_from_str("2022-11-06T00:59", "%Y-%m-%dT%H:%M");
println!("2022-11-06T00:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
let dt = tz.datetime_from_str("2022-11-06T01:00", "%Y-%m-%dT%H:%M");
println!("2022-11-06T01:00 -> {:?}", dt);
let dt = tz.datetime_from_str("2022-11-06T01:59", "%Y-%m-%dT%H:%M");
println!("2022-11-06T01:59 -> {:?}", dt);
let dt = tz.datetime_from_str("2022-11-06T02:00", "%Y-%m-%dT%H:%M");
println!("2022-11-06T02:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339());
2022-11-06T00:59 -> Ok(2022-11-06T00:59:00MDT) / 2022-11-06T00:59:00-06:00
2022-11-06T01:00 -> Err(ParseError(NotEnough))
2022-11-06T01:59 -> Err(ParseError(NotEnough))
2022-11-06T02:00 -> Ok(2022-11-06T02:00:00MST) / 2022-11-06T02:00:00-07:00
|
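The two behaviours compared above can be sketched without any time zone library. This is only an illustration: the America/Denver transitions for 2022 are hardcoded from the outputs quoted in this comment, and `Resolved`, `resolve_denver_2022` and `postgres_like` are hypothetical names, not chrono-tz or PostgreSQL APIs. The first function reports a gap or an overlap explicitly (roughly what the chrono-tz errors signal); the second applies the policy PostgreSQL appears to follow in the sessions above.

```rust
/// Wall-clock time in America/Denver during 2022: (month, day, hour, minute).
/// Tuples compare lexicographically, which matches chronological order here.
type Wall = (u32, u32, u32, u32);

const MST: i32 = -7; // standard time, hours from UTC
const MDT: i32 = -6; // daylight time, hours from UTC

// Hardcoded 2022 transitions (assumption: taken from the outputs in this thread).
const GAP_START: Wall = (3, 13, 2, 0); // clocks jump 02:00 -> 03:00
const GAP_END: Wall = (3, 13, 3, 0);
const OVERLAP_START: Wall = (11, 6, 1, 0); // 01:00..02:00 happens twice
const OVERLAP_END: Wall = (11, 6, 2, 0);

#[derive(Debug, PartialEq)]
enum Resolved {
    /// The wall-clock time was skipped by the spring-forward transition.
    Gap,
    /// Exactly one offset applies.
    Single(i32),
    /// Two offsets apply: (earlier instant, later instant).
    Ambiguous(i32, i32),
}

/// Strict resolution: report gaps and overlaps instead of guessing.
fn resolve_denver_2022(t: Wall) -> Resolved {
    if t >= GAP_START && t < GAP_END {
        Resolved::Gap
    } else if t >= OVERLAP_START && t < OVERLAP_END {
        Resolved::Ambiguous(MDT, MST)
    } else if t >= GAP_END && t < OVERLAP_START {
        Resolved::Single(MDT)
    } else {
        Resolved::Single(MST)
    }
}

/// Lenient policy matching the PostgreSQL output above: a skipped time is
/// shifted forward one hour into MDT, an ambiguous time is read as MST.
fn postgres_like(t: Wall) -> (Wall, i32) {
    match resolve_denver_2022(t) {
        Resolved::Single(offset) => (t, offset),
        Resolved::Ambiguous(_, later) => (t, later),
        Resolved::Gap => ((t.0, t.1, t.2 + 1, t.3), MDT),
    }
}
```

For example, `resolve_denver_2022((3, 13, 2, 30))` yields `Resolved::Gap`, while `postgres_like((3, 13, 2, 30))` yields `((3, 13, 3, 30), -6)`, mirroring `2022-03-13 03:30:00-06` above.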
i'll limit this pr to support fixed offset timezone only. thank you @avantgardnerio |
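A minimal sketch of what "fixed offset only" validation could look like, assuming the accepted form is exactly `+HH:MM` or `-HH:MM` as in the `'+08:00'` example earlier in the thread. `parse_fixed_offset` is a hypothetical helper, not the check in the actual PR; it returns the offset in seconds east of UTC and rejects named zones.

```rust
/// Read exactly two ASCII digits as a number in 0..=99.
fn two_digits(b: &[u8]) -> Option<i32> {
    match b {
        [hi, lo] if hi.is_ascii_digit() && lo.is_ascii_digit() => {
            Some(i32::from(hi - b'0') * 10 + i32::from(lo - b'0'))
        }
        _ => None,
    }
}

/// Parse a fixed offset such as "+08:00" or "-07:00" into seconds east of UTC.
/// Anything else (for example "America/Denver") yields None.
fn parse_fixed_offset(s: &str) -> Option<i32> {
    let b = s.as_bytes();
    // Expected layout: sign, two digits, ':', two digits.
    if b.len() != 6 || b[3] != b':' {
        return None;
    }
    let sign = match b[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let hours = two_digits(&b[1..3])?;
    let minutes = two_digits(&b[4..6])?;
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}
```

With this, `parse_fixed_offset("+08:00")` gives `Some(28800)`, and a named zone such as `"America/Denver"` gives `None`, which a planner could turn into a "not supported yet" error.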
@waitingkuo I think this is good advice :) Hopefully the US ends this nonsense next year. I find it troubling that postgres happily parses ambiguous timestamps and chooses a default arbitrarily, but it seems like the emerging trend for DataFusion is "just do what postgres does", so perhaps we just copy their flawed implementation. What's even more troubling than it defaulting an ambiguous timestamp is that it happily parses a non-existent I guess the only real choice we have is:
Finally, I even wonder how chrono deals with all this? DST didn't always start or end in March/November, so a database is required to keep track of when various legal jurisdictions implement/modify/retract DST. In unix-based OSes I think this is a tzinfo file, but I know of no equivalent on Windows. If we do embrace |
@avantgardnerio
IANA timezone database maintains the offset & leap seconds
this repo has the database and parser (written in C)
chrono-tz use |
perhaps we could force the user to include the timezone offset when casting a string to timestamptz, as the initial PR for extending named timezone support e.g. note that if we set time zone as 'America/Denver' incorrect |
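One way to picture the "force an explicit offset" rule is a pre-check on the input string. The sketch below is an assumption about how such a check might look, not what was implemented: `has_explicit_offset` is a hypothetical helper that accepts only a trailing `Z` or a trailing `+HH:MM` / `-HH:MM`, so shorter forms such as `-07` are treated as missing.

```rust
/// Return true when the timestamp text ends with an explicit UTC offset:
/// either "Z"/"z" or a "+HH:MM" / "-HH:MM" suffix.
fn has_explicit_offset(ts: &str) -> bool {
    if ts.ends_with('Z') || ts.ends_with('z') {
        return true;
    }
    let b = ts.as_bytes();
    if b.len() < 6 {
        return false;
    }
    // Inspect the last six bytes: sign, two digits, ':', two digits.
    let tail = &b[b.len() - 6..];
    (tail[0] == b'+' || tail[0] == b'-')
        && tail[1].is_ascii_digit()
        && tail[2].is_ascii_digit()
        && tail[3] == b':'
        && tail[4].is_ascii_digit()
        && tail[5].is_ascii_digit()
}
```

Under this rule `'2022-11-06T01:30:00-07:00'` would be accepted while `'2022-11-06 01:30'` would be rejected when a named time zone is set, which sidesteps both the gap and the overlap cases discussed above.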
Very informative, ty! |
That would definitely eliminate the ambiguity. I'd love to see if others have opinions about this. |
@avantgardnerio adding these options looks great as well. |
This would be my preference I think -- no one likes how postgres handles things.
What I would really prefer is that DataFusion always deals with UTC internally (all internal math, and expressions like now(), always use a UTC timestamp). I think this is what happens now.
If the input data doesn't specify a timezone, we should treat it like UTC (I know that is not what the arrow spec seems to say).
If the user wants to see times / dates in their current locale's timezone, that can be handled by the end client (e.g. datafusion-cli) rather than forcing the entire processing chain to handle arbitrary timezones just for output display. |
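To illustrate the "UTC internally, convert only for display" idea, here is a small std-only sketch of what a client could do with a UTC value and a fixed offset. `format_with_offset` and `civil_from_days` are hypothetical helpers written for this example (the date conversion follows the well-known days-to-civil algorithm); nothing here is datafusion-cli code.

```rust
/// Convert days since 1970-01-01 into (year, month, day), proleptic Gregorian.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Render a UTC instant (seconds since the Unix epoch) in a fixed offset,
/// leaving the stored value itself untouched.
fn format_with_offset(utc_secs: i64, offset_secs: i32) -> String {
    let local = utc_secs + i64::from(offset_secs);
    let days = local.div_euclid(86_400);
    let sod = local.rem_euclid(86_400); // seconds of day
    let (y, m, d) = civil_from_days(days);
    let sign = if offset_secs < 0 { '-' } else { '+' };
    let abs = offset_secs.abs();
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}{}{:02}:{:02}",
        y, m, d,
        sod / 3600, sod % 3600 / 60, sod % 60,
        sign, abs / 3600, abs % 3600 / 60
    )
}
```

The same stored instant, 1667646711 seconds after the epoch, renders as `2022-11-05T11:11:51+00:00` with offset 0 and as `2022-11-05T19:11:51+08:00` with offset 28800; only the presentation changes.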
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'd like to add support for SET TIMEZONE.
I'd like to have now() and TimestampTz use the timezone in config_options instead of the fixed UTC.
Describe the solution you'd like
1. SET TIME ZONE TO [SOME_TIMEZONE]
It's currently disabled on purpose:
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2484-L2489
I'd like to remove that restriction (probably replacing it with some tz verification) and have this as the result
2. now() returns the Timestamp<TimeUnit::Nanosecond, Some("SOME_TIMEZONE")> according to the time zone we have. It's currently fixed as "UTC":
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/physical-expr/src/datetime_expressions.rs#L176-L186
Some("UTC".to_owned()) should be modified. We need to pass config_options or the time zone into this function.
e.g.
current version
becomes
3. TimestampTz should use the timezone from config_options
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
TimestampTz is currently fixed as "UTC" without considering the timezone in config_options. To make it consider config_options, we need to:
3-1. make convert_simple_data_type a method of SqlToRel
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2811-L2817
3-2. add get_config_option to ContextProvider so that we can get the time zone
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
3-3. then use the timezone from config_options here to replace the fixed UTC
https://github.com/apache/arrow-datafusion/blob/7e944ede86457fe0f43be44e0e5550229ecaf008/datafusion/sql/src/planner.rs#L2832-L2841
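The plumbing proposed in the steps above can be sketched with simplified stand-ins. Everything below is an assumption made for illustration: the trait is a cut-down, hypothetical version of `ContextProvider`, `Session` and `TimestampType` are invented types, and the option key `"time_zone"` is a placeholder rather than the real setting name.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the planner's `ContextProvider`.
trait ContextProvider {
    /// Look up a session option by name (the accessor proposed in step 3-2).
    fn get_config_option(&self, name: &str) -> Option<String>;
}

/// Hypothetical session state backed by a plain map of options.
struct Session {
    options: HashMap<String, String>,
}

impl ContextProvider for Session {
    fn get_config_option(&self, name: &str) -> Option<String> {
        self.options.get(name).cloned()
    }
}

/// Simplified stand-in for a nanosecond timestamp type carrying a time zone.
#[derive(Debug, PartialEq)]
struct TimestampType {
    tz: Option<String>,
}

/// Resolve the type used for `TimestampTz` (and the result of `now()`):
/// take the zone from the session options, falling back to the old fixed "UTC".
fn timestamptz_type<P: ContextProvider>(provider: &P) -> TimestampType {
    let tz = provider
        .get_config_option("time_zone") // placeholder key, an assumption
        .unwrap_or_else(|| "UTC".to_owned());
    TimestampType { tz: Some(tz) }
}
```

With an empty option map the result still carries `Some("UTC")`, so existing behaviour is preserved; after a `SET` that stores `"+08:00"` under the key, the same call returns that zone instead.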