Possible performance regression in 9a42cbe85461c28417a5130bc80b035044c5575a #26639
Comments
@TomAugspurger I should add that #23167 came about mainly because @jreback was adamant (in several PRs I had concerning … Actually, I had to degrade the performance of … Coming back to the perf hit of #23167: I'm fine with reverting, if that's the consensus. From my reading, the perf hit is almost exclusively the cost of inferring dtypes (to be able to exclude some where deemed necessary); the question is whether people are happy with that trade-off. One other option might be to cache the inferred dtype somewhere, but that would likely have to be on the Series itself, since the StringMethods object gets reconstructed on every call.
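A minimal plain-Python sketch of the caching idea, assuming a toy model of the problem: the accessor is rebuilt on every `.str` access, so the inferred kind is cached on the container instead. `Series`, `StringMethods`, `infer_kind`, and the `inference_calls` counter here are hypothetical stand-ins, not pandas' actual classes.

```python
def infer_kind(values):
    """Hypothetical stand-in for dtype inference: scans every element."""
    kinds = {type(v) for v in values}
    if kinds <= {str}:
        return "string"
    if kinds <= {bytes}:
        return "bytes"
    return "mixed"


class Series:
    """Toy stand-in for a pandas Series (hypothetical, not the real class)."""

    def __init__(self, values):
        self.values = list(values)
        self._inferred_kind = None  # cache slot lives on the Series itself
        self.inference_calls = 0    # instrumentation for this example only

    def inferred_kind(self):
        # Run the expensive scan once, then reuse the result. A real Series is
        # mutable, so a real cache would also need invalidating on assignment.
        if self._inferred_kind is None:
            self.inference_calls += 1
            self._inferred_kind = infer_kind(self.values)
        return self._inferred_kind

    @property
    def str(self):
        # A fresh accessor on every access, so a cache stored on the accessor
        # would be thrown away immediately.
        return StringMethods(self)


class StringMethods:
    def __init__(self, series):
        # Validation uses the cached kind instead of re-scanning each time.
        if series.inferred_kind() == "mixed":
            raise AttributeError("Can only use .str accessor with string values")
        self._series = series

    def upper(self):
        return Series(v.upper() for v in self._series.values)


s = Series(["a", "b"])
print(s.str.upper().values)
print(s.str.upper().values)
print(s.inference_calls)
```

The trade-off is the invalidation problem noted in the comment: the cache is only safe if every mutation path resets it.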
This may be a blocker. I'm now verifying that the type checking is the cause of the slowdown.
As a basic test, I simply commented out the …
@TomAugspurger I guess it would be possible to shift the inference into the wrapper and call that wrapper more sparingly.
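One way to read this suggestion, sketched in plain Python: the inference lives only inside a decorator, and only methods that actually need to exclude some dtypes are wrapped, so undecorated methods never pay for the scan. The decorator `forbid_kinds`, the helper `infer_kind`, and this toy `StringMethods` class are hypothetical illustrations, not pandas' actual code.

```python
import functools


def infer_kind(values):
    """Hypothetical full scan standing in for dtype inference (the costly step)."""
    kinds = {type(v) for v in values}
    if kinds <= {str}:
        return "string"
    if kinds <= {bytes}:
        return "bytes"
    return "mixed"


def forbid_kinds(forbidden):
    """Hypothetical decorator: the inference happens here, and only here."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            self.inference_calls += 1
            kind = infer_kind(self.values)
            if kind in forbidden:
                raise TypeError(
                    f"Cannot use .str.{func.__name__} with values of kind {kind!r}"
                )
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class StringMethods:
    def __init__(self, values):
        self.values = list(values)
        self.inference_calls = 0  # instrumentation for this example only

    @forbid_kinds({"bytes", "mixed"})  # bytes have no casefold(), so exclude them
    def casefold(self):
        return [v.casefold() for v in self.values]

    def len(self):
        # Works for str and bytes alike, so it is left undecorated: no inference.
        return [len(v) for v in self.values]
```

For example, `StringMethods(["Ab", "CD"]).len()` never triggers inference, while `.casefold()` scans once and rejects bytes.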
@h-vetinari I asked you to update the benchmarks in #26605.
@jreback @TomAugspurger …
I didn't look at #23167 in detail, but if we want to remove the call to …
I don't think this is possible? We only have object dtype (when a bytes dtype is passed to Series, we convert it to object dtype).
Actually, we seem to keep the bytes dtype; I never realised we did that. That might actually be a bug: …
Although, Tom, you were saying that undoing the `infer_dtype` call doesn't improve the performance?
IMO there are parts worth keeping, like #23551, but I guess I could also re-add that separately. In any case, a revert will probably create merge conflicts.
If the behaviour has been there long enough, it's now a "feature" (similar to the behaviour that …
As I mentioned above, he didn't undo the …
IMO, the most reasonable solution would be to remove the …
I don't think we currently want to tie ourselves to the bytes dtype. It also seems "relatively" recent: e.g. in 0.18.0 it was still converted to object dtype (though I didn't check other versions).
I'd be happy to help solve this; I just have to add that I'll be away for the next two weeks. If you urgently need a solution, I won't be able to help, but I can have a look afterwards.
Maybe not worth worrying about at this point? For …
I think this is likely correct. Not to mention that #30202 should improve the performance of `infer_dtype`.
@jbrockmendel #30202 improves performance only for numpy dtypes when …
I think we'll close this, since StringDtype avoids it.
Possible regression in 9a42cbe
ASV: http://pandas.pydata.org/speed/pandas/index.html#strings.Methods.time_normalize?commits=9a42cbe85461c28417a5130bc80b035044c5575a
A bunch of benchmarks are ~20% slower. Not sure if that was talked about in the issue.
cc @h-vetinari
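To get a feel for how a full inference pass before every string operation turns into a relative slowdown like the ~20% reported above, here is a stdlib-only micro-benchmark sketch. It is not the ASV suite linked above; `infer_kind`, `upper_plain`, `upper_checked`, and `relative_slowdown` are hypothetical stand-ins, and the number it prints depends on the machine and will not reproduce the pandas figure.

```python
import timeit


def infer_kind(values):
    """Hypothetical O(n) type scan standing in for dtype inference."""
    return "string" if all(isinstance(v, str) for v in values) else "mixed"


def upper_plain(values):
    # The string operation alone.
    return [v.upper() for v in values]


def upper_checked(values):
    # Same work plus a full inference pass before the operation.
    if infer_kind(values) != "string":
        raise TypeError("expected strings")
    return [v.upper() for v in values]


def relative_slowdown(n=10_000, repeat=5, number=20):
    """Return (checked time / plain time) - 1, using the best of `repeat` runs."""
    data = ["abc"] * n
    base = min(timeit.repeat(lambda: upper_plain(data), repeat=repeat, number=number))
    checked = min(timeit.repeat(lambda: upper_checked(data), repeat=repeat, number=number))
    return checked / base - 1.0


if __name__ == "__main__":
    print(f"per-call inference overhead: {relative_slowdown():.0%}")
```

Taking the minimum over repeats, rather than the mean, reduces noise from other processes, which matters when the effect being measured is only tens of percent.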