-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speed up date time parsing #19545
speed up date time parsing #19545
Conversation
|
||
""" | ||
DateTime(dt::AbstractString, df::DateFormat) -> DateTime | ||
|
||
Construct a `DateTime` by parsing the `dt` date string following the pattern given in | ||
the [`DateFormat`](@ref) object. Similar to | ||
the [`DateFormat`](:func:`Dates.DateFormat`) object. Similar to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad rebase here
@@ -1009,6 +1009,8 @@ export | |||
# dates | |||
Date, | |||
DateTime, | |||
DateFormat, | |||
@dateformat_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be added to the manual index if exported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see contributing.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you meant I need to add this to the manual, I have. I was wondering if there's some kind of list I need to update when you said "manual index"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not talking about doc/manual, talking about doc/stdlib for the docstring. yes there is a list, again see contributing.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. Will do.
Amazing work, @shashi! I would note that this still doesn't preclude having even more efficient custom code for the three or so common date formats. |
@StefanKarpinski that's right, and you don't even have to give them a name and add them to a type hierarchy. :) Just dispatch on |
Can this be backported to 0.5? |
It breaks compatibility with the previous |
@@ -268,7 +275,7 @@ f = "y m d" | |||
@test_throws ArgumentError Dates.Date(" 1 1 32",f) | |||
@test_throws ArgumentError Dates.Date("# 1 1 32",f) | |||
# can't find 1st space delimiter,s o fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment no longer applies?
rata = xs[7] + 1000*(xs[6] + 60*xs[5] + 3600*xs[4] + 86400*Base.Dates.totaldays(xs[1],xs[2],xs[3])) | ||
return DateTime(Base.Dates.UTM(rata)) | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of having this duplicate constructor versus:
DateTime(t::NTuple{7,Int}) = DateTime(t[1], t[2], t[3], t[4], t[5], t[6], t[7])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation will run into issues with integer overflow on 32-bit systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a cost to splatting so I added this, forgot to remove / define one with the other. Was going to do this, thanks!
The tests are failing on 32 bit machines because of overflow. I modified this method to convert the input tuple to Int32s but wasn't able to reproduce the failure. I'm wondering how tests passed in the previous implementation if the constructor is the problem. Do you have an idea of where exactly it breaks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just DateTime(t...)
return R(reorder_args(parts, fmt.field_order, fmt.field_defaults, err_idx)::NTuple{7,Int}) | ||
|
||
@label error | ||
return R((err_idx,state,0,0,0,0,0), true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to indicate null
here you want false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, added a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was a 0.6 change, so may give different results than 0.5
end | ||
end | ||
|
||
_create_timeobj(tup, T::Type{DateTime}) = T(tup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not T(tup...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed changed this in a later commit. this review business is a bit confusing. Thanks for reviewing!
This looks pretty awesome. A few minor changes mostly pointed out by @omus. |
@label error | ||
return R((err_idx,state,0,0,0,0,0), true) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash do you know of a way to make this not generated, but still get the same performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the classic way of doing it in a static language (hardcoding a jump table / switch / vtable), but afaik there isn't really a good way to express that optimization in Julia right now (other than by doing a generated function like this). fwiw, since this only uses @nexpr
, it appears to be a completely valid generated function.
@@ -1009,6 +1009,8 @@ export | |||
# dates | |||
Date, | |||
DateTime, | |||
DateFormat, | |||
@dateformat_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I do this?
rata = xs[7] + 1000*(xs[6] + 60*xs[5] + 3600*xs[4] + 86400*Base.Dates.totaldays(xs[1],xs[2],xs[3])) | ||
return DateTime(Base.Dates.UTM(rata)) | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a cost to splatting so I added this, forgot to remove / define one with the other. Was going to do this, thanks!
The tests are failing on 32 bit machines because of overflow. I modified this method to convert the input tuple to Int32s but wasn't able to reproduce the failure. I'm wondering how tests passed in the previous implementation if the constructor is the problem. Do you have an idea of where exactly it breaks?
return R(reorder_args(parts, fmt.field_order, fmt.field_defaults, err_idx)::NTuple{7,Int}) | ||
|
||
@label error | ||
return R((err_idx,state,0,0,0,0,0), true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, added a test
end | ||
end | ||
|
||
_create_timeobj(tup, T::Type{DateTime}) = T(tup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed changed this in a later commit. this review business is a bit confusing. Thanks for reviewing!
I am concerned about these changes as they break support for custom parser slots which don't return integers. Specifically, these new changes won't work with parsing TimeZone types. |
is used. This object is passed as the last argument to | ||
`tryparsenext` and `format` defined for each `AbstractDateToken` type. | ||
""" | ||
immutable DateLocale{lang} end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly recommend against baking lang
into a type parameter. This will actually make the API more awkward to extend than presenting it to the user as a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I can make this type hold two dictionaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to think about how this can be generalised to other locale-based print functions (e.g. decimal separators), though that may be better in a different PR.
@label error | ||
return R((err_idx,state,0,0,0,0,0), true) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the classic way of doing it in a static language (hardcoding a jump table / switch / vtable), but afaik there isn't really a good way to express that optimization in Julia right now (other than by doing a generated function like this). fwiw, since this only uses @nexpr
, it appears to be a completely valid generated function.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Might it make sense to do #16928 as part of this as well? |
I don't see any reason to open a new PR. |
Interesting: I wonder why the printing is now slower? |
@simonbyrne I did not benchmark printing speeds, but had to redo it to accomodate the new DateFormat. There should be easy speedups available. |
This is some excellent work. Real julia wizardry on display here! |
Thanks @JeffBezanson 😄 |
@omus I have merged your changes in here with the required conflict resolutions. Do pull them, and let me know when you have more updates so that I can pull them into this PR. |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
I tried reverting the change. On my machine there's barely any change. I don't know how #19545 (comment) gave a green tick for these benchmarks. |
I'm not sure, but it's possible this might be hitting problems related to #20025 ? |
That good set of results was just before we merged the llvm 3.9 upgrade, so newer llvm changes the performance characteristics of the patches here w.r.t. those benchmarks. |
Wait, so what's the status here? The string change is so kickass that it makes these speed ups irrelevant? |
@StefanKarpinski No, there's a regression in converting DateTime to string which I'm trying to track down. Parsing is still much much faster with this. |
dateformat"YYYY-mm-dd\THH:MM:SS" | ||
else | ||
dateformat"YYYY-mm-dd\THH:MM:SS.s" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait. this is not really a problem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad it's not a problem. If we were to change this I think I would prefer "YYYY-mm-dd\THH:MM:SS.sss"
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
|
write(io, str) | ||
else | ||
l = endof(str) | ||
write(io, SubString(str,l-(n-1),l)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably save endof(str)
before the if.
if millisecond(dt) == 0 | ||
format(dt, dateformat"YYYY-mm-dd\THH:MM:SS", 24) | ||
else | ||
format(dt, dateformat"YYYY-mm-dd\THH:MM:SS.s", 26) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are milliseconds in a DateTime only ever a single digit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, s
stands for any number of digits e.g. s
gives .5
or .05
or .005
while .sss
would give .500
, .050
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so is the 26 a minimum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's usually 23 chars, I added 3 more bytes to keep this going fast even when the year is -10000
;) I'm not sure if this really matters, doesn't julia allocate the buffer backing an array in multiples of 16 bytes or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. As long as this is just a rough-guess buffer preallocation and things work fine if the actual output is longer or shorter then I'll just go with whatever the benchmarks say.
51a7cbc
to
cbe5577
Compare
Date formatting (to arbitrary formats) is 5x faster than before. So in a bit of a cop out, I changed @nanosoldier |
end | ||
end | ||
|
||
function Base.string(dt::Date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to delete the now-duplicate one at line 46 then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, I should overlooked that when resolving merge conflicts.
cbe5577
to
2a86b75
Compare
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
format the `tok` token from `dt` and write it to `io`. The formatting can | ||
be based on `locale`. | ||
|
||
all subtypes of `AbstractDateToken` must define this method in order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First word in a sentence should be capitalized
Performance improvements introduced in Base.Dates no longer allow for TimeZones to extend the functionality. JuliaLang/julia#19545
Performance improvements introduced in Base.Dates no longer allow for TimeZones to extend the functionality. JuliaLang/julia#19545
Performance improvements introduced in Base.Dates no longer allow for TimeZones to extend the functionality. JuliaLang/julia#19545
This PR builds on @simonbyrne's #18000 and makes the (120x 😆) speed up available for custom date formats. It also re-does the DateFormat type to be sufficiently parameterized so that an efficient parsing function can be "generated". This also incidentally makes it easier to add new features to the parser.
before:
after:
Also works fast for custom date format strings.
fixes #15888, #13644
supersedes #18000