-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Date/DateTime support through the Time module #5328
Conversation
I'm excited about getting this into Base. That is a lot of new exports, however. What I'd propose is that Base can export only the bare minimum to work with dates and times plus the |
Yeah, I wasn't sure on the exports, so I should re-exported everything from the module. I think it makes sense to keep a lot of it inside though. I just pushed some export reductions. |
Fully agree: great functionality, way too many exports in Base :) I think there are still too many. It should be a small handful, or just Do we really need both As usual, I think How does this relate to our old friends |
The What is the This makes me think we should take another crack at making |
Some answers: -I took the exports down to just dt = Date(2009,1,1)
dt2 = Date(2013,1,1)
ff = recur(dt:dt2) do x
# Get dates of Thanksgiving between two dates
dayofweek(x)==Thursday &&
month(x)==November &&
dayofweekinmonth(x)==4
end |
Ah, that is cool. I'm not sure what to call it, but it seems more like a set or sieve of dates. Probably doesn't qualify as a I see your point about Feel free to hack and slash as necessary to remove subsumed stuff. Nothing worse than crufty stuff lying around that nobody's sure whether it's ok to use. Of course the help text will have to be updated. I appreciate all the tests; they look great. |
# A is sorted array of lookup values | ||
# R is return values for each index of A | ||
# i.e. R[1] is returned for values < A[1], R[2] for < A[2], etc. | ||
function searchsortedfirsttree(A,R,var) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this could be more generalized and live in the sort.jl file or if it's special-cased enough to just stay here. It's not really Datetime specific. It could probably also use a better name.
Watching this development. Great stuff! |
Yeah, maybe someone else can opine as to the Sorry to be a bit rambling. My vote would be to remove the TmStruct type and related functions and if someone so desires, a CTime.jl package could house that code (and probably flesh it out some more). |
Can we keep |
Yeah, we would have to keep the |
Wait, can't we just simulate the behavior of |
Definitely. If you look down around line 558 or so, the interface is |
Ok, I've set up the TimeZone package here: https://github.com/karbarcca/TimeZone.jl Don't expect it to work quite right yet though. Still some bugs to iron out. |
I'm not sure I follow. Here are two simple use cases: how would you handle them?
In these two very common cases, it seems to me that you need both the UTC time and the time zone information. Am I missing something? |
@nalimilan, for the first case, if you were in the U.S. Eastern Time Zone, for example, you input as In the 2nd case, there are a few different options, one of which would be use of the |
Ok, I've pushed all changes I've talked about and finished the documentation (I ended up just doing a module page similar to I'd appreciate any other code review. @StefanKarpinski, @JeffBezanson |
# RFC: Month is the only Period we check because it can throw a | ||
# BoundsError() in construction (see totaldays()) | ||
# all other Periods "roll over"; should we make these consistent? | ||
0 < m < 13 || error("Month out of range") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a more descriptive error message? "Month out of range (1:12)"
Also, give the recent discussions about better error handling, throw an ArgumentError instead of error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea. I haven't stayed completely up to date on the latest, but I'll see what I can push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, just pushed changes.
@karbarcca Thanks, this makes sense for the first case. For the second case, everything depends on how the date/time data with time zone information (like "2014-02-06T12:30:00 EST") is parsed by default (e.g. when importing a CSV file or data from the Web). Discarding time zone information and considering it as "unspecified" doesn't sound like a very good default (and that doesn't work if the input times are using several time zones). So the options are converting to UTC (but then you don't allow to user to find out what was the original time zone, so you cannot aggregate by local day), or use a My point is, I think you will have to bite the bullet and support time zones (including arithmetic in local time) out of the box anyway at some point. From a quick inspection, Joda time, Numpy datetime64 and lubridate all store time zone information in |
@nalimilan – I kind of suspect this may be the case. Let me lay out a philosophical perspective here. It makes sense to talk about a point in time without any notion of timezone. Let's call such a thing a A particular time of day during a particular date at a particular location on the planet Earth is quite a different concept. That is, of course, inherently linked with a location. For convenience, we quantize those locations into timezones. Whenever we want to ask about the hour of the day or the day or the week when something occurred, we need to know what timezone we're talking about because without a timezone, those questions don't make sense. Let's call such a thing a In particular, the kinds of operations you want to do with time fall pretty clearly into those that belong to Regarding representing the timezone of a
Using the type to represent the timezone takes twice as much storage, and of course completely destroys data locality and makes it a nightmare to pass the data to another programming language system. And then there are the problems it causes for code generation. tl;dr: There should be two separate levels of time values – |
@nalimilan, I'm not saying these other implementations are necessarily wrong, but I did point you to several of these package's author's posts about how uncomfortable/unsatisfied they are with how these things are dealt with, even in their own packages. The point is, handling timezones and operations you may want to perform with them is really tricky business and there is far from consensus with how to do it. Most packages I've seen basically take the approach of providing an 80% solution. Sure, you can create a DateTime object with a timezone, but the implementation isn't going to guard you against some of the gotchas I mentioned earlier or perhaps not even provide defined behavior. I happen to think this is unfortunate and really hope to find a way to do better, without sacrificing simplicity and performance. I've based all this work on these packages, so I'm hoping we're headed in the right direction! For now though, instead of putting a half-baked I think we really achieve a great balance of simplicity and conceptual correctness with the current implementation. From @StefanKarpinski's great philosophical layout, you can see the underlying |
@StefanKarpinski I compeltely agree with you points (except that @karbarcca It sounds indeed very reasonable to implement the most important features first, and leave the hardest cases for later. But what I don't understand is why you don't make As I said above, to me, even more important than arithmetic (which is clearly the tricky part), is the ability to parse (or create from Julia code) a date which contains time zone information, and to present it to the user (print, plot...) in the same time zone by default, because that's what makes sense to people. You don't need to support anything fancy for that: you may even require people to explicitly convert their Anyway, you seem to master these things, so I'm not really worried. ;-) |
0 < m < 13 || error("Month out of range") | ||
rata = ms + 1000*(s + 60mi + 3600h + 86400*totaldays(y,m,d)) | ||
return UTCDateTime(Millisecond( | ||
rata + (s == 60 ? setleapsecond(rata) : setleaps(rata)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're converting UTC -> atomic time here, you should throw an error if the conversion is undefined, i.e., before 1972 or after the latest Bulletin C expires (currently 31 Dec 2014).
Although technically i suppose you could extend back to 1961, but that's a bit more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow you. Why is the conversion undefined? The date algorithms are based on rata die days/milliseconds and the proleptic gregorian calendar, so the conversions between date representations and "instants" is compatible from typemin(Date)
to typemax(Date)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to further explain myself, the idea of the proleptic gregorian calendar, is that we take the current gregorian and apply it retroactively through time. This is spelled out clearly in the docs that we aren't messing with missing days in 1582 or anything when calendars were switched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proleptic gregorian exists, but proleptic UTC doesn't. You'd have to go back and define a fake schedule of leap seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's silly to throw an error for dates before 1972. We could display 1970-01-01T00:00:00 GMT
for dates before 1972, or even not display any abbreviation, but that all seems a little far-reaching. My vote is that we use proleptic UTC. I also don't see any reason to mess with things in the future. If using UTC
makes anyone feel to0 queasy or uneasy, then we can change everything to UT
or GMT
to denote that we're using a sensible default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not really relevant. The problem isn't with the time, it's with the conversion to a single number of SI seconds. Since we're not doing the conversion correctly, we should just be honest about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not following you, you'll have to explain yourself in more detail. From what I understand, the current implementation is following ISO 8601 standards, using the proleptic gregorian calendar and proleptic UTC (if you will) to represent instants of time retroactively. I don't see where we're being dishonest if we tell people that's the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug into this a little more and it turns out this is pretty much the behavior everyone else follows as well (proleptic UTC). Checking Joda, Noda, and lubridate all showed they use UTC, even for pre-1972 dates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, this stuff is a bit complicated.
Roughly speaking, there are two major time standards: UT & TT. UT is "astronomer" time, based on the rotation of the earth. TT is "physicist" time, based on ticking clocks. They have different units: a UT second is really an angle, and a TT second is the usual SI second.
UTC is a clever attempt to bridge these two systems.
When Joda says "leap seconds are not supported", what they mean is "we implement UT".
When you say "leap seconds are supported", what you mean is "we implement TT". As a result, you can define
GPSTime(t::UTCDateTime) = t.instant.ms - 62451648009000
Passing a timestamp like "1939-10-28 09:00:00 GMT" to Joda works because GMT is part of the UT family. Passing the same timestamp to your code requires a conversion from UT -> TT. This correction is called ΔT, and it isn't being applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what would we need to change? It seems like we just need to further document the kind of construction/conversion we're doing so that if people are really interested, they can find out what exactly we're supporting. From a routine user's perspective though, I don't think they're going to think twice that we call our DateTime
s UTC, support leap seconds, and inputting dates pre 1972 isn't anything different. Again, I'm definitely open to changing the default abbreviation here if that simplifies things. Or not have an abbreviation at all since it seems like it could lead to confusion. Maybe that's best: no abbreviation and we clearly document how construction/conversion works.
|
||
Periods are a human view of discrete, sometimes irregular durations of time. Consider 1 month; it could represent, in days, a value of 28, 29, 30, or 31 depending on the year and month context. Or 1 year could represent 365 or 366 days in the case of a leap year. These points are relevant when ``Date/DateTime-Period`` arithmetic is considered (which is discussed below). Despite their irregular nature, ``Period`` types in Julia are useful when working with ``TimeType``\s. ``Period`` types are simple wrappers of a ``Int64`` type and constructed by wrapping any ``Integer`` type, i.e. ``Year(1)`` or ``Month(3)``. Arithmetic between ``Period``\s of the same type behave like ``Integer``\s, and ``Period-Real`` arithmetic promotes the ``Real`` to the ``Period`` type:: | ||
|
||
julia> y1 = Time.Year(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small possible enhancement: why not have a constant years
equal to Year(1)
, and the same for all periods? This would allow writing the quite neat 1 years + 2 months + 4 days
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks kind of nice, but I would object to reserving such useful variable names for this kind of pretty syntax. You could hide them in a module that the user more might import with using
, or just list the 6 const declarations in the documentation for users to copy into their code when they need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.......that is pretty nice, but I think agree with @ivarne that those are valuable variable names to take and pretty trivial to define on your own. The constructors for periods used to be years(1); months(1)
, but I actually changed them to be more in line with the thinking in #1470 (i.e. always use Proper-case names for type constructors). While this technically isn't constructing the Period types, I think it'd still be better to leave as is in Base.
Agreed those |
Closing for now. |
What happened? It doesn't appear that this was merged. |
Stefan said he'd like to take some more time to look things over. In the mean time, I've cleaned up some more code and it's currently available on the Pkg.add("Datetime")
Pkg.checkout("Datetime","dev")
include(Pkg.dir() * "/Datetime/src/Dates.jl")
using Dates My thinking is also evolving with regards to what constitutes "Base" code vs. default packages. See this issue for more discussion. The main idea is that code like DateTime functionality wouldn't be in "Base", but could be included as a default package shipped with the main julia distro, but it's easier for people who don't need the functionality to deselect or exclude the code they don't need by creating their own distro with just the default packages they need (probably not as big a deal with this package because there are no binary deps or non-julia code involved, but much more useful for openblas code, fftw, mpfr, etc.). |
As discussed in #3524 and @karbarcca/Datetime#12.
This is most of the Datetime package reworked to simplify, speed up, and polish some functionality (e.g. date parsing). Notably missing is full Timezone support, which I've also been reworking. I figured I'd submit this PR and things can get ironed out a little more before all the Timezone code is finalized. I also have the Datetime manual and function reference which can be folded in fairly easily once things settle down.
I've included a few RFC, TODOs, and some comments on certain pieces that I'd appreciate feedback on. Also any feedback on how to fold this into Base would be appreciated as I've never pushed anything like this before.
Cheers!