-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Regressions since 0.20.3 #17861
Comments
cc @jbrockmendel if u have any ideas here |
I'll take a look at this. |
Best guess is the change in
New:
|
Hmm in %timeit reverting this makes a pretty big difference for the |
Does asv have something analogous to |
Sorry, those commit hashes in the original post look entirely incorrect... Let me rerun them. |
Updated:
verifying for correctness and then I'll start digging into them. |
are real.
http://pandas.pydata.org/speed/pandas/#inference.DtypeInfer.time_datetime64 points to ~ eef810ef (cc @jreback) |
DtypeInfer:
SeriesArthmetic.time_add_offset*
These are handled by #17980 |
categoricals.Categoricals3.time_rank_string_cat_ordered
Seems like this is ranking the objects, instead of codes. This is handled by #17982 |
Fixed by #17980 |
join_merge:
Ah dang, http://pandas.pydata.org/speed/pandas/#join_merge.Align.time_series_align_int64_index points to the import delay PR: 2310faa1 I think something strange is going on w/ the benchmark... On 0.20.3 and 0.21.0rc1, I'm seeing the numexpr import affecting the benchmark time. The first run is always ~120ms. Subsequent runs are ~25-30ms. I'l come back to this later. |
Looks like #17295 for this one http://pandas.pydata.org/speed/pandas/#join_merge.Align.time_series_align_int64_index cc @jreback not sure if there's anything to be done there. |
I believe this was due to 3fae8dd |
|
|
|
|
For morale purposes: is there any good news? |
:) Yeah there were quite a few:
|
For |
Are you sure its |
I think it's in |
Hmm that may be it: diff --git a/pandas/_libs/period.pyx b/pandas/_libs/period.pyx
index cf6ef91d3..f28ee2aac 100644
--- a/pandas/_libs/period.pyx
+++ b/pandas/_libs/period.pyx
@@ -718,7 +718,7 @@ cdef class _Period(object):
(type(self).__name__, type(other).__name__))
def __hash__(self):
- return hash((self.ordinal, self.freqstr))
+ return hash((self.ordinal, self.freq))
def _add_delta(self, other):
if isinstance(other, (timedelta, np.timedelta64, offsets.Tick)):
( master:
with that diff:
Let's see if that breaks anything. |
Can you try that with a handful of different frequencies? I'd be shocked if |
FWIW, it doesn't seem promising w.r.t. the benchmark (see the 2nd and 4th) lines. I rebuilt the c extensions between these two. My branch with the new hash is on top, master is on bottom.
|
@jreback if you have time to look at the indexing ones, I'd appreciate it: #17861 (comment) and #17861 (comment) I'm not sure there's a whole lot to be done though. The first is because we have to check for missing values with list-like indexers now, to raise the warning. |
Oh, I'm dumb. Mixed up nanoseconds and microseconds in #17861 (comment) |
I think we decided in #17574 that the sparse changes were OK? |
I think everything has been addressed. |
These are fairly reproducible on my machine:
The text was updated successfully, but these errors were encountered: