Initial timezone support #903

Merged: billylanchantin merged 25 commits into main from bl-datetime-timezone on May 5, 2024

billylanchantin
Contributor

@billylanchantin billylanchantin commented Apr 28, 2024

Description

Adds timezone support.

datetimes_with_timezones =
  ~U[2024-01-01T13:00:00.000000Z]
  |> DateTime.shift_zone!("America/New_York")
  |> List.wrap()

series = Explorer.Series.from_list(datetimes_with_timezones)
# #Explorer.Series<
#   Polars[1]
#   datetime[μs, America/New_York] [2024-01-01 07:00:00.000000-05:00 EST America/New_York]
# >

series.dtype
# {:datetime, :microsecond, "America/New_York"}

Explorer.Series.to_list(series)
# [#DateTime<2024-01-01 07:00:00.000000-05:00 EST America/New_York>]

Dtype changes

New dtype:

  • {:datetime, precision, time_zone}

And the old naive datetime dtype has become:

  • {:naive_datetime, precision}

See discussion below.
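
To make the split concrete, here is a minimal Rust sketch of the two dtypes as a data type. The names `TimeUnit` and `TemporalDtype` are hypothetical and are not Explorer's actual native types; the three precisions are assumed for illustration.

```rust
/// Precision of a temporal dtype (assumed set, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical mirror of the two dtypes described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TemporalDtype {
    /// `{:naive_datetime, precision}`: no time zone attached.
    NaiveDatetime(TimeUnit),
    /// `{:datetime, precision, time_zone}`: carries a time-zone name.
    Datetime(TimeUnit, String),
}

impl TemporalDtype {
    /// Returns the time-zone name, or `None` for naive datetimes.
    pub fn time_zone(&self) -> Option<&str> {
        match self {
            TemporalDtype::NaiveDatetime(_) => None,
            TemporalDtype::Datetime(_, tz) => Some(tz.as_str()),
        }
    }
}
```

Because the zone is an arbitrary string, the `Datetime` variant has an unbounded number of values, whereas `NaiveDatetime` has only one per precision.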

To-do

  • Support %DateTime{} literals in macros
  • Backward-compatibility: alias {:datetime, _} for {:naive_datetime, _}
    • Decided not to

@billylanchantin
Contributor Author

billylanchantin commented Apr 28, 2024

The main issue is that the new {:datetime, _, time_zone} dtype can't be easily enumerated in code since time_zone can be any valid time-zone string.

Before we were able to list literally every non-recursive dtype. That's no longer possible.

Also, as discussed on Slack, I'll try to make {:datetime, _} a valid alias for the new {:naive_datetime, _} dtype, possibly with a deprecation warning?

  raise(
    ArgumentError,
    "Explorer.Series.#{function} not implemented for dtype #{inspect(dtype)}. " <>
-     "Valid " <> Shared.inspect_dtypes(valid_dtypes, with_prefix: true)
+     "Valid dtypes are any subtype of #{inspect(valid_super_dtypes)}"
Contributor Author

My attempt to handle the fact that we can't list all the dtypes easily anymore. Open to ideas on this.
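
The idea can be sketched in Rust as follows: instead of comparing against an exhaustive list of concrete dtypes, check whether the dtype falls under one of a few "super dtypes" that ignore the time-zone string. All names here (`Dtype`, `SuperDtype`, `check_dtype`) are hypothetical and only illustrate the pattern; this is not Explorer's implementation.

```rust
/// Concrete dtypes; `Datetime` carries an arbitrary time-zone name,
/// so the set of concrete values cannot be listed exhaustively.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Dtype {
    Date,
    NaiveDatetime,
    Datetime(String),
}

/// Finite set of categories that concrete dtypes belong to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuperDtype {
    Date,
    NaiveDatetime,
    Datetime,
}

/// True when `dtype` belongs to the category `sup`, whatever its time zone.
pub fn is_subtype(dtype: &Dtype, sup: SuperDtype) -> bool {
    matches!(
        (dtype, sup),
        (Dtype::Date, SuperDtype::Date)
            | (Dtype::NaiveDatetime, SuperDtype::NaiveDatetime)
            | (Dtype::Datetime(_), SuperDtype::Datetime)
    )
}

/// Mirrors the shape of the error message in the diff above.
pub fn check_dtype(function: &str, dtype: &Dtype, valid: &[SuperDtype]) -> Result<(), String> {
    if valid.iter().any(|sup| is_subtype(dtype, *sup)) {
        Ok(())
    } else {
        Err(format!(
            "{} not implemented for dtype {:?}. Valid dtypes are any subtype of {:?}",
            function, dtype, valid
        ))
    }
}
```

The error message can then name the finite list of categories, which stays printable even though the concrete dtypes do not.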

Member

The simplification makes sense to me 👍

@josevalim
Member

Also, as discussed on Slack, I'll try to make {:datetime, _} a valid alias for the new {:naive_datetime, _} dtype, possibly with a deprecation warning?

In my opinion it is fine to nuke it. It is a very easy change for folks to update in their own apps. Alternatively, keep it as a shortcut for {:datetime

@billylanchantin billylanchantin marked this pull request as ready for review April 29, 2024 15:07
@billylanchantin billylanchantin changed the title [WIP] Initial timezone support Initial timezone support Apr 29, 2024
Comment on lines 411 to 415
// TODO-BILLY: finish this.
impl<'a> From<DateTime<Tz>> for ExDateTime<'a> {
fn from(dt_tz: DateTime<Tz>) -> ExDateTime<'a> {
let & time_zone = dt_tz.offset().tz_id();
let & zone_abbr = dt_tz.offset().abbreviation();
Contributor Author

@philss (or anyone!) I've hit the limit of my ability to cargo cult Rust code 😭.

The ExDateTime struct currently looks like this:

#[derive(NifStruct, Copy, Clone, Debug)]
#[module = "DateTime"]
pub struct ExDateTime<'a> {
    // ...
    pub time_zone: &'a str,
    // ...
    pub zone_abbr: &'a str,
}

I need to derive these fields from the DateTime object, so they're not known at compile time. AFAICT I've got two options: they can have type &'a str or String. I can implement this if they have type String, but then we can no longer #[derive(Copy)]. So I've tried to implement it with &'a str, but I haven't figured out how yet.

Any pointers would be appreciated!
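
For context, a small std-only sketch of the trade-off. `BorrowedZone` and `OwnedZone` are hypothetical stand-ins for the two possible shapes of the struct, not the real `ExDateTime`. A struct can derive `Copy` only if every field is `Copy`: `&'a str` is, `String` is not. The borrowed form, in turn, needs some longer-lived owner to borrow from.

```rust
/// Borrowed fields: every field is `Copy`, so the struct can derive `Copy`.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct BorrowedZone<'a> {
    pub time_zone: &'a str,
    pub zone_abbr: &'a str,
}

/// Owned fields: `String` is not `Copy`, so only `Clone` can be derived.
#[derive(Clone, Debug, PartialEq)]
pub struct OwnedZone {
    pub time_zone: String,
    pub zone_abbr: String,
}

/// Borrowing works when the owner outlives the borrowed view.
/// A function that builds the strings locally and returns a
/// `BorrowedZone` pointing at them would not compile, because the
/// locals are dropped when the function returns.
impl<'a> From<&'a OwnedZone> for BorrowedZone<'a> {
    fn from(owned: &'a OwnedZone) -> Self {
        BorrowedZone {
            time_zone: owned.time_zone.as_str(),
            zone_abbr: owned.zone_abbr.as_str(),
        }
    }
}
```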

Member

Oh man, I really don't know how to solve this easily. Sorry :/
Do we need to support Copy? I would try to use String if we don't need this.

Contributor Author

Do we need to support Copy?

I dunno! I don't know what a lot of this code does... 😅

I'm gonna switch to String. If we end up needing Copy for some reason we can try a different approach.

Contributor Author

8f5be24

Ok, so we did end up needing Copy because of how I wrote s_from_list_datetime. Specifically, this breaks:

val: Vec<Option<ExDateTime>>
// ...
val.iter()
    .map(|dt| dt.map(|dt| dt.into()))
//            ^^^^^^^^^^^^^^^^^^^^^^ this implicitly requires Copy
    .collect::<Vec<Option<i64>>>(),

I've reverted to &'a str for now since all the tests pass. I suspect something we try with %DateTime{} will eventually break because of this, but I say let's address it then.
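
For reference, one possible way around the implicit `Copy` requirement, as a std-only sketch. `OwnedDateTime` and `to_timestamps` are hypothetical names, and the single `micros_since_epoch` field is an assumption standing in for the real conversion. Calling `Option::map` through a `&Option<T>` has to move the value out of the reference, which only compiles when `T: Copy`; `as_ref()` avoids the move by mapping over `Option<&T>` instead.

```rust
/// Hypothetical owned variant of the struct; not `Copy` because of `String`.
#[derive(Clone, Debug)]
pub struct OwnedDateTime {
    pub micros_since_epoch: i64,
    pub time_zone: String,
}

/// Convert from a reference so the conversion never consumes the value.
impl From<&OwnedDateTime> for i64 {
    fn from(dt: &OwnedDateTime) -> i64 {
        dt.micros_since_epoch
    }
}

pub fn to_timestamps(val: &[Option<OwnedDateTime>]) -> Vec<Option<i64>> {
    val.iter()
        // `as_ref` turns `&Option<T>` into `Option<&T>`, so `map` works on
        // a cheap reference instead of moving the value out of the slice.
        .map(|dt| dt.as_ref().map(|dt| i64::from(dt)))
        .collect()
}
```

Whether this fits the actual `s_from_list_datetime` code depends on details not shown in this thread.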

Thoughts?

Member

No thoughts for now :/
Let's go with this version, and we improve in the future :D

"temporal",
"timezones",
Member

Can we guarantee that the timezones are the same between Elixir and Rust?

Contributor Author

Great call out! No, largely because Elixir's Calendar module is configurable. This is something which I think we'll want to be very clear about in the docs.

It also presents a validation challenge. As written, we aren't able to check that a time-zone string is valid on the Elixir side. We essentially have to assume it's fine and wait for Rust to tell us it didn't work.
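
A rough sketch of that "let the backend decide" flow, assuming a hypothetical `resolve_time_zone` function on the Rust side. The hard-coded `KNOWN_ZONES` list is only a stand-in for a real time-zone database, and `UnknownTimeZone` is an invented error type.

```rust
/// Stand-in for the backend's time-zone database (assumption: a tiny
/// hard-coded list, purely for illustration).
const KNOWN_ZONES: &[&str] = &["Etc/UTC", "America/New_York", "Europe/London"];

/// Hypothetical error returned when the backend does not know the zone.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownTimeZone(pub String);

/// The caller passes the string through unchecked; validation happens
/// here, and a failure is reported back as an `Err`.
pub fn resolve_time_zone(name: &str) -> Result<&'static str, UnknownTimeZone> {
    KNOWN_ZONES
        .iter()
        .copied()
        .find(|zone| *zone == name)
        .ok_or_else(|| UnknownTimeZone(name.to_string()))
}
```

The caller would surface the `Err` case to the user, since the two sides may not agree on which zone names exist.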

Member

No problem. Yeah, I agree that if we document that, this is fine :)

Member

@philss philss left a comment

This is a great addition! 🚀

@billylanchantin
Contributor Author

Decided to go with what José suggested and nuke :datetime in favor of :naive_datetime. We can add :datetime back in as an alias before the next release if the change seems too disruptive.

@billylanchantin billylanchantin merged commit a9ac048 into main May 5, 2024
4 checks passed
@billylanchantin billylanchantin deleted the bl-datetime-timezone branch May 5, 2024 19:21