Skip to content

Commit

Permalink
BUG/ENH: provide better .loc based semantics for float based indicies…
Browse files Browse the repository at this point in the history
…, continuing not to fallback (related to GH236)

API: stub-implementation of Float64Index (meaning its the 'same' as an Index at this point), related GH263

BUG: make sure that mixed-integer-float is auto-converted to Float64Index

BUG: (GH4860), raise KeyError on indexing with a float (and not a floating index)

CLN/ENH: add _convert_scalar_indexer in core/index.py

CLN/ENH: add _convert_slice_indexer in core/index.py

TST: slice indexer test changes in series/frame

BUG: fix partial multi-indexing

DOC: release notes / v0.13.0
TST: tests for doc example

CLN: core/indexing.py clean
DOC: add to indexing.rst

DOC: small corections

API: make Float64Index getitem/ix/loc perform the same (as label based) for scalar indexing

TST: Float64Index construction & tests

TST: misc tests corrected in test_series/test_reshape/test_format

TST: Float64Index tests indexing via lists

DOC: additional example for float64index
  • Loading branch information
jreback committed Sep 25, 2013
1 parent fac7d1d commit 60efe85
Show file tree
Hide file tree
Showing 19 changed files with 1,080 additions and 317 deletions.
109 changes: 98 additions & 11 deletions doc/source/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1199,22 +1199,109 @@ numpy array. For instance,
dflookup = DataFrame(np.random.rand(20,4), columns = ['A','B','C','D'])
dflookup.lookup(list(range(0,10,2)), ['B','C','A','B','D'])
Setting values in mixed-type DataFrame
--------------------------------------
.. _indexing.float64index:

.. _indexing.mixed_type_setting:
Float64Index
------------

.. versionadded:: 0.13.0

Setting values on a mixed-type DataFrame or Panel is supported when using
scalar values, though setting arbitrary vectors is not yet supported:
By default a ``Float64Index`` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
same.

.. ipython:: python
df2 = df[:4]
df2['foo'] = 'bar'
print(df2)
df2.ix[2] = np.nan
print(df2)
print(df2.dtypes)
indexf = Index([1.5, 2, 3, 4.5, 5])
indexf
sf = Series(range(5),index=indexf)
sf
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

.. ipython:: python
sf[3]
sf[3.0]
sf.ix[3]
sf.ix[3.0]
sf.loc[3]
sf.loc[3.0]
The only positional indexing is via ``iloc``

.. ipython:: python
sf.iloc[3]
A scalar index that is not found will raise ``KeyError``

Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``

.. ipython:: python
sf[2:4]
sf.ix[2:4]
sf.loc[2:4]
sf.iloc[2:4]
In float indexes, slicing using floats is allowed

.. ipython:: python
sf[2.1:4.6]
sf.loc[2.1:4.6]
In non-float indexes, slicing using floats will raise a ``TypeError``

.. code-block:: python
In [1]: Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
In [1]: Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
.. code-block:: python
In [3]: Series(range(5))[3.0]
Out[3]: 3
Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for
example be millisecond offsets.
.. ipython:: python
dfir = concat([DataFrame(randn(5,2),
index=np.arange(5) * 250.0,
columns=list('AB')),
DataFrame(randn(6,2),
index=np.arange(4,10) * 250.1,
columns=list('AB'))])
dfir
Selection operations then will always work on a value basis, for all selection operators.
.. ipython:: python
dfir[0:1000.4]
dfir.loc[0:1001,'A']
dfir.loc[1000.4]
You could then easily pick out the first 1 second (1000 ms) of data then.
.. ipython:: python
dfir[0:1000]
Of course if you need integer based selection, then use ``iloc``
.. ipython:: python
dfir.iloc[0:5]
.. _indexing.view_versus_copy:
Expand Down
4 changes: 4 additions & 0 deletions doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,10 @@ API Changes
add top-level ``to_timedelta`` function
- ``NDFrame`` now is compatible with Python's toplevel ``abs()`` function (:issue:`4821`).
- raise a ``TypeError`` on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (:issue:`4968`)
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the same.
Indexing on other index types are preserved (and positional fallback for ``[],ix``), with the exception, that floating point slicing
on indexes on non ``Float64Index`` will raise a ``TypeError``, e.g. ``Series(range(5))[3.5:4.5]`` (:issue:`263`)

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~
Expand Down
66 changes: 66 additions & 0 deletions doc/source/v0.13.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,72 @@ Indexing API Changes
p
p.loc[:,:,'C']

Float64Index API Change
~~~~~~~~~~~~~~~~~~~~~~~

- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
same. See :ref:`the docs<indexing.float64index>`, (:issue:`263`)

Construction is by default for floating type values.

.. ipython:: python

index = Index([1.5, 2, 3, 4.5, 5])
index
s = Series(range(5),index=index)
s

Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

.. ipython:: python

s[3]
s.ix[3]
s.loc[3]

The only positional indexing is via ``iloc``

.. ipython:: python

s.iloc[3]

A scalar index that is not found will raise ``KeyError``

Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``

.. ipython:: python

s[2:4]
s.ix[2:4]
s.loc[2:4]
s.iloc[2:4]

In float indexes, slicing using floats are allowed

.. ipython:: python

s[2.1:4.6]
s.loc[2.1:4.6]

- Indexing on other index types are preserved (and positional fallback for ``[],ix``), with the exception, that floating point slicing
on indexes on non ``Float64Index`` will now raise a ``TypeError``.

.. code-block:: python

In [1]: Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

In [1]: Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

.. code-block:: python

In [3]: Series(range(5))[3.0]
Out[3]: 3

HDFStore API Changes
~~~~~~~~~~~~~~~~~~~~

Expand Down
3 changes: 1 addition & 2 deletions doc/source/v0.4.x.txt
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,7 @@ New Features
with choice of join method (ENH56_)
- :ref:`Added <indexing.get_level_values>` method ``get_level_values`` to
``MultiIndex`` (:issue:`188`)
- :ref:`Set <indexing.mixed_type_setting>` values in mixed-type
``DataFrame`` objects via ``.ix`` indexing attribute (:issue:`135`)
- Set values in mixed-type ``DataFrame`` objects via ``.ix`` indexing attribute (:issue:`135`)
- Added new ``DataFrame`` :ref:`methods <basics.dtypes>`
``get_dtype_counts`` and property ``dtypes`` (ENHdc_)
- Added :ref:`ignore_index <merging.ignore_index>` option to
Expand Down
2 changes: 1 addition & 1 deletion pandas/computation/tests/test_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -778,7 +778,7 @@ def check_chained_cmp_op(self, lhs, cmp1, mid, cmp2, rhs):
class TestAlignment(object):

index_types = 'i', 'u', 'dt'
lhs_index_types = index_types + ('f', 's') # 'p'
lhs_index_types = index_types + ('s',) # 'p'

def check_align_nested_unary_op(self, engine, parser):
skip_if_no_ne(engine)
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from pandas.core.categorical import Categorical, Factor
from pandas.core.format import (set_printoptions, reset_printoptions,
set_eng_float_format)
from pandas.core.index import Index, Int64Index, MultiIndex
from pandas.core.index import Index, Int64Index, Float64Index, MultiIndex

from pandas.core.series import Series, TimeSeries
from pandas.core.frame import DataFrame
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -2050,7 +2050,7 @@ def eval(self, expr, **kwargs):
kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
return _eval(expr, **kwargs)

def _slice(self, slobj, axis=0, raise_on_error=False):
def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):
axis = self._get_block_manager_axis(axis)
new_data = self._data.get_slice(
slobj, axis=axis, raise_on_error=raise_on_error)
Expand Down
Loading

0 comments on commit 60efe85

Please sign in to comment.