Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does ZoneVariant need to be an open enum? #5745

Closed
robertbastian opened this issue Oct 30, 2024 · 8 comments
Closed

Does ZoneVariant need to be an open enum? #5745

robertbastian opened this issue Oct 30, 2024 · 8 comments
Labels
C-time-zone Component: Time Zones

Comments

@robertbastian
Copy link
Member

TZDB inherently only supports two values for the zone variant, DST is a bool in its data model.

In ICU4X we currently model this more openly, and say thing like "concepts such as Standard, Summer, Daylight, and Ramadan time". However, it is unclear whether CLDR will ever adopt more variants without TZDB being able to model them.

Making this enum exhaustive lets us reduce data size for specific display names, as we will need fewer ZeroMaps.

@robertbastian robertbastian added the discuss-priority Discuss at the next ICU4X meeting label Oct 30, 2024
robertbastian added a commit that referenced this issue Oct 30, 2024
This could be even smaller if we had one map for standard names, and one
for daylight names. #5745
@sffc
Copy link
Member

sffc commented Oct 30, 2024

The TZDB concept of daylight saving time is distinct from the CLDR concept of a zone variant. It's coincidental that one is often derivable from the other.

I think one example of the difference is that the CLDR daylight variant is always a positive offset from the standard variant, whereas TZDB daylight savings time could be positive or negative.

But maybe we could push to make the two concepts align. That's a decision for CLDR, not for us, I think.

@robertbastian
Copy link
Member Author

I think one example of the difference is that the CLDR daylight variant is always a positive offset from the standard variant, whereas TZDB daylight savings time could be positive or negative.

You can choose whether to enable negative DST when building TZDB. I think they are the same concept.

@Manishearth
Copy link
Member

  • @robertbastian If we don't need to support more than the 2 options of standard time and daylight time, then we can move from a 2-level map to just having 2 maps. Doing so should save a lot of data size.
  • @sffc I'm in support of your latest PR. However, I'm not convinced that there is a large data size difference
  • @robertbastian There is a 5-8% data difference. Also, the code gets smaller. This is also a conceptual thing. Currently, in our code, we have logic to say "if standard, do the standard thing. else, do the daylight thing"
  • @sffc If the zone variant is not standard or not known, then we fallback to the next zone format
  • @robertbastian Yeah, we should use the fallback, but it would be nice to go to a known variant
  • @sffc TZDB has its own way of thinking about how daylight savings works. But it's not CLDR's job to match that model. It's totally reasonable, and it some cases it can happen, for time zones to have 3 display names. Certain countries have standard time, daylight time, and Ramadan time. Ramadan time is the same as standard time. Colloquially, Ramadan time is used to referred to standard time. Which variant should be used? -- TZDB won't tell you. We could have code to enforce that if the switch over happens in July, then it's probably Ramadan time, but that's not good.
  • @robertbastian What we do is assign display names to TZDB. We don't use windows names or anything, we are 100% in the TZDB universe. Now, neither CLDR nor TZDB support ramadan time, it would be great if they did, but we don't have to have this unused code path on our API because at some point in the future
  • @robertbastian One alternative to a closed enum is to have a non-exhaustive enum.
  • @Manishearth I'm fine with either. This is time zone stuff. Their release cycle is much slower than our processes. Exhaustive enum is the least of the problems, here.
  • @sffc I disagree with the premise that adding Ramadan time requires changing TZDB.
  • @robertbastian It requires changing CLDR.
  • @sffc Changing CLDR has a much lower bar.
  • @robertbastian Introducing Ramadan time requires a lot of changing. IXDTF would need a lot of code changes. These are things that would not happen before ICU4X v2.0
  • @sffc I don't agree with that statement. If we add Ramadan time, then our zone calculator can be tweaked to support it. You can guess it from the offset and the time of year.
  • @Manishearth How would you guess it from the time of year?
  • @robertbastian You would have to load the Ramadan times in every year
  • @sffc We can load that data and have it in a lookup.
  • @robertbastian Okay, so we add Ramadan time. Do we need an open enum? Do we need to allow for arbitrary zone variants from the user?
  • @sffc We generally took the approach that
  • @Manishearth That's fine, but why did we choose a 2 byte enum?
  • @sffc We had 4 bytes, and we went down to 2 bytes.
  • @Manishearth We could save space by just using a u8.
  • @robertbastian We can have a non-exhaustive enum with invariants.
  • @sffc Right now, you can take a zone variant and use it in the key of the map, but if you have a non-exhaustive enum, then you need to validate the value before using it in a lookup
  • @robertbastian We have to validate it in the Zeromap lookup. It would be nice to have an unvalidate type.
  • @Manishearth We should have an Unvalidated<T>. I don't think we need to go down this route right now.
  • @sffc My preference for open enum vs. non-exhaustive closed enum is not that strong. I am happy to go in one direction based on technical advantanges.
  • @Manishearth I'm fine with non-exhaustive closed enums too, but I think we can do better here, and it's not that much work.
  • @sffc Given the v2.0 deadline, can we just reduce the representation size from 2 bytes to 1 byte?
  • @Manishearth I think everyone is on board with making this a non-exhaustive closed enum.

Conclusion:

  • Change ZoneVariant to a non-exhaustive closed enum
  • In the future, improve deserialization efficiency via deferred validation. Exact mechanism is not known at this time.

LGTM: @sffc @robertbastian @Manishearth

@Manishearth
Copy link
Member

To write down specific ways deserialization efficiency can be improved:

  • ProviderZoneVariant(u8) which has a TryInto: this will only force validation on the zones you are getting, not the entire map
  • Unvalidated<T> which generically lets one Unvalidate a ULE type. Useful in maps and vecs

@robertbastian
Copy link
Member Author

#5660

@sffc
Copy link
Member

sffc commented Nov 4, 2024

Counter-example to the notion that TZDB only reports a DST bit: from the docs in

https://web.cs.ucla.edu/~eggert/tz/tz-how-to.html

A time zone definition could look like:

#Rule NAME FROM TO    -   IN  ON        AT   SAVE LETTER/S
Rule  US   1918 1919  -   Mar lastSun  2:00  1:00 D
Rule  US   1918 1919  -   Oct lastSun  2:00  0    S
Rule  US   1942 only  -   Feb 9        2:00  1:00 W # War
Rule  US   1945 only  -   Aug 14      23:00u 1:00 P # Peace
Rule  US   1945 only  -   Sep 30       2:00  0    S
Rule  US   1967 2006  -   Oct lastSun  2:00  0    S
Rule  US   1967 1973  -   Apr lastSun  2:00  1:00 D
Rule  US   1974 only  -   Jan 6        2:00  1:00 D
Rule  US   1975 only  -   Feb 23       2:00  1:00 D
Rule  US   1976 1986  -   Apr lastSun  2:00  1:00 D
Rule  US   1987 2006  -   Apr Sun>=1   2:00  1:00 D
Rule  US   2007 max   -   Mar Sun>=8   2:00  1:00 D
Rule  US   2007 max   -   Nov Sun>=1   2:00  0    S

The "LETTER" column has more than just daylight and standard time: it also defines "war" and "peace" time.

I think it's well-defined for CLDR to declare display names for those zone variants.

Likewise, Ramadan getting its own modifier letter like "R" seems plausible.

This changes my position to thinking that an open enum was the right call originally, but I'm not going to push to revert the change given that it already landed and the non-exhaustiveness gives us flexibility.

@robertbastian
Copy link
Member Author

The letters get inserted into the TZ name, which is something like P%T. So at runtime you will have the strings PDT, PST, PWT, PPT available. We could have a data struct that resolves these strings into zone variants.

@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Jan 9, 2025
@sffc sffc added the C-time-zone Component: Time Zones label Jan 9, 2025
@sffc
Copy link
Member

sffc commented Jan 9, 2025

I don't think there's any more action on this issue.

@sffc sffc closed this as completed Jan 9, 2025
@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-time-zone Component: Time Zones
Projects
None yet
Development

No branches or pull requests

3 participants