PERF: added no exception versions of '_string_to_dts' and 'parse_iso_8601_datetime' functions #26220
Codecov Report
@@ Coverage Diff @@
## master #26220 +/- ##
==========================================
- Coverage 91.97% 91.97% -0.01%
==========================================
Files 175 175
Lines 52379 52379
==========================================
- Hits 48178 48175 -3
- Misses 4201 4204 +3
Codecov Report
@@ Coverage Diff @@
## master #26220 +/- ##
==========================================
- Coverage 91.98% 91.98% -0.01%
==========================================
Files 175 175
Lines 52379 52374 -5
==========================================
- Hits 48183 48174 -9
- Misses 4196 4200 +4
So the speedup here comes purely from replacing the "except? -1" clause in the cdef with manual error checking, right?
@WillAyd you're right.
OK, thanks. Do you know if the question mark in that statement has a performance hit? I'm not against this change, just surprised to see it make that big of a difference. Being able to leverage the built-in Cython functionality would certainly be simpler and preferable if removing the question mark gets us closer (hence my question).
The performance degradation from this is unimportant because most likely this is one additional
The main performance increase is that we stop throwing a
@anmyachev any idea how many more formats/functions you plan to optimize like this? In the past we've been trying to get code out of the C files and to stay Python-idiomatic where possible. It looks like you're making an effort to keep the non-idiomatic code contained to np_datetime, which is good. Could the raising of ValueError be made more efficient upstream in Cython? Just spitballing: if the return-code based version is really equivalent to the raising-based version, it seems like Cython should be able to figure that out.
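For readers following the thread, the two error-signalling styles under discussion can be sketched in plain Python. This is only an analogue, not the Cython code from the PR: `parse_year_raising`, `parse_year_status`, and the two counting helpers are hypothetical names invented for illustration, and the timings will vary by machine and interpreter.

```python
import timeit


def parse_year_raising(text):
    """Signal malformed input by raising ValueError (exception-based style)."""
    if len(text) != 4 or not text.isdecimal():
        raise ValueError(f"bad year: {text!r}")
    return int(text)


def parse_year_status(text):
    """Signal malformed input through a (failed, value) pair; never raises."""
    if len(text) != 4 or not text.isdecimal():
        return True, 0
    return False, int(text)


def count_failures_raising(values):
    """Caller pays for try/except on every malformed element."""
    failures = 0
    for value in values:
        try:
            parse_year_raising(value)
        except ValueError:
            failures += 1
    return failures


def count_failures_status(values):
    """Caller checks a flag instead of catching an exception."""
    failures = 0
    for value in values:
        failed, _ = parse_year_status(value)
        if failed:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical workload: half of the inputs are malformed.
    data = ["2019", "abcd"] * 20_000
    print("raising:", timeit.timeit(lambda: count_failures_raising(data), number=3))
    print("status: ", timeit.timeit(lambda: count_failures_status(data), number=3))
```

Both helpers report the same failure count; the difference is only in how the failure travels back to the caller, which is the trade-off being weighed here.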
pandas/_libs/tslib.pyx
Outdated
iresult[i] = NPY_NAT
string_to_dts_failed = _string_to_dts_noexc(
    val, &dts, &out_local,
    &out_tzoffset
I think Cython now supports specifying typed tuples for return types, e.g. cdef (bint, int, int) _string_to_dts_no_exc(...). That would be more idiomatic than pointer-passing. Thoughts, @WillAyd?
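The suggestion of returning a tuple instead of writing through pointers can be illustrated with a rough pure-Python analogue. This is a sketch under assumptions, not the PR's code: `ParseResult`, its field names, and the toy `parse_tz_suffix` parser are all hypothetical and only mimic the shape of a (failed, flag, offset) return value.

```python
from typing import NamedTuple


class ParseResult(NamedTuple):
    """Hypothetical result record: a status flag plus the values a C-style
    function would otherwise write through output pointers."""
    failed: bool
    has_offset: bool
    offset_minutes: int


def parse_tz_suffix(text: str) -> ParseResult:
    """Toy parser: recognise a trailing 'Z' or '+HH:MM' / '-HH:MM' suffix."""
    if text.endswith("Z"):
        return ParseResult(False, False, 0)
    suffix = text[-6:]
    if (
        len(suffix) == 6
        and suffix[0] in "+-"
        and suffix[3] == ":"
        and suffix[1:3].isdecimal()
        and suffix[4:6].isdecimal()
    ):
        sign = 1 if suffix[0] == "+" else -1
        minutes = sign * (int(suffix[1:3]) * 60 + int(suffix[4:6]))
        return ParseResult(False, True, minutes)
    # No recognisable suffix: report failure through the flag, not an exception.
    return ParseResult(True, False, 0)


# The caller unpacks everything in one statement instead of pre-declaring
# output variables and passing their addresses.
failed, has_offset, offset_minutes = parse_tz_suffix("2019-04-26T10:00:00+02:30")
```

The status flag and the parsed values arrive together, so the call site stays a single expression.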
At the moment, nothing more along these lines. This function was "hot" when profiling, and the main slowdown was just in throwing an exception. Other functions with exactly the same problem have not been identified yet.
Probably, but the main problem is the speed of exceptions in Python itself, not how Cython generates them. In the first case we need the function to throw an exception, and in the other we do not, depending on our choice (for this we use If having both "_string_to_dts_noexc" and "_string_to_dts" is a problem for you, I can try to make a single function with an additional parameter (whether to throw an exception or not) while keeping the "except? -1" declaration. What do you say?
Not a problem at all. Why would we ever use the slow version?
_string_to_dts is
This is possible if desired, we'd just have to make The exceptions themselves are not that slow, but in this particular case they were taking way more time than As for Cython being smart there - I think it might be possible to make a keyword or something that would tell Cython to generate some error-code-based error checking and hide it pretending it's exceptions (like
This is adding a lot of complexity for a relatively small gain. If you want to remove the exception handling, then just do it and force a check every time it is used. Having two functions makes the code far more complex.
@jreback The "noexc" versions of the functions have been removed.
Does this actually give a non-trivial improvement?
@jreback I simplified the changes.
030ffa0 to 543257a
thanks @anmyachev
This seems to have slowed down
@TomAugspurger I'll see what's the problem.
git diff upstream/master -u -- "*.py" | flake8 --diff
asv continuous -f 1.05 origin/master HEAD -b ^io.csv -b ^timeseries -a warmup_time=1 -a sample_time=1