DatetimeIndex selection with .loc is orders of magnitude slower than [] on ordered frame #17754
Comments
is this the case on master as well?
Yes
INSTALLED VERSIONS
------------------
commit: ad359f6
pandas: 0.21.0.dev+528.gad359f6
OK, I think I have now switched to the master branch. It's still an issue.
INSTALLED VERSIONS
------------------
commit: f797408
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0.dev+568.gf797408
I can replicate this on master. Went back to 0.18.1 and looks like it was happening back then too. Using
Full output for
Looks like
And for additional comparison if it's useful
Timings on master look better now, with sorted about 3x faster than not:
>>> %timeit df.loc['2011-6-11']
1.49 ms ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df_sort.loc['2011-6-11']
550 µs ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The behavior from the first two examples is deprecated now, though while df_sort['2011-6-11'] gives a df['2011-6-11'] raises:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-28-abc338eb08f8> in <module>
----> 1 df['2011-6-11']
~/Code/contrib/pandas-mzeitlin11/pandas/core/frame.py in __getitem__(self, key)
3005 # either we have a slice or we have a string that can be converted
3006 # to a slice for partial-string date indexing
-> 3007 return self._slice(indexer, axis=0)
3008
3009 # Do we have a (boolean) DataFrame?
~/Code/contrib/pandas-mzeitlin11/pandas/core/generic.py in _slice(self, slobj, axis)
3815 Slicing with this method is *always* positional.
3816 """
-> 3817 assert isinstance(slobj, slice), type(slobj)
3818 axis = self._get_block_manager_axis(axis)
3819 result = self._constructor(self._mgr.get_slice(slobj, axis=axis))
AssertionError: <class 'numpy.ndarray'>
@mzeitlin11 do we have asvs for this?
I don't see one, will open an issue for it.
Since the performance has greatly improved now, going to close.
Code Sample, a copy-pastable example if possible
Problem description
When using the .loc indexer on a large frame with a sorted DatetimeIndex, selection is ~2500 times slower than the indexing operator ([]) itself. It's also ~300 times slower than the unsorted .loc lookup. Slicing appears to work as expected.
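The code sample from the original report is not preserved in this copy of the thread. A minimal sketch of the comparison it describes might look like the following; the frame size, the minute frequency, the column name `value`, and the helper `best_time` are assumptions for illustration, while `df`, `df_sort`, and the `'2011-6-11'` key follow the names used in the comments above. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed data: a minute-frequency index long enough to cover 2011-06-11.
idx = pd.date_range("2011-01-01", periods=300_000, freq="min")
df_sort = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Shuffle the rows to get a frame with an unordered DatetimeIndex.
rng = np.random.default_rng(0)
df = df_sort.iloc[rng.permutation(len(df_sort))]

day = "2011-6-11"

# Partial-string selection of one day with .loc on both frames.
sorted_rows = df_sort.loc[day]
unsorted_rows = df.loc[day]


def best_time(func, number=20):
    """Return the best per-call time in seconds over three repeats (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


t_sorted = best_time(lambda: df_sort.loc[day])
t_unsorted = best_time(lambda: df.loc[day])

print(f"sorted   .loc: {t_sorted * 1e6:.1f} µs per call")
print(f"unsorted .loc: {t_unsorted * 1e6:.1f} µs per call")
```

Both selections should return the same set of rows; only the lookup path differs, which is what the timings are meant to expose.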
Expected Output
Sorted frame should be faster when using .loc