merge

jbandlow committed Feb 6, 2018
2 parents af37225 + 93c86aa commit 2fb23d6
Showing 59 changed files with 1,521 additions and 932 deletions.
17 changes: 17 additions & 0 deletions asv_bench/benchmarks/index_object.py
@@ -147,6 +147,11 @@ def setup(self, dtype):
self.idx = getattr(tm, 'make{}Index'.format(dtype))(N)
self.array_mask = (np.arange(N) % 3) == 0
self.series_mask = Series(self.array_mask)
self.sorted = self.idx.sort_values()
half = N // 2
self.non_unique = self.idx[:half].append(self.idx[:half])
self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
self.key = self.sorted[N // 4]

def time_boolean_array(self, dtype):
self.idx[self.array_mask]
@@ -163,6 +168,18 @@ def time_slice(self, dtype):
def time_slice_step(self, dtype):
self.idx[::2]

def time_get_loc(self, dtype):
self.idx.get_loc(self.key)

def time_get_loc_sorted(self, dtype):
self.sorted.get_loc(self.key)

def time_get_loc_non_unique(self, dtype):
self.non_unique.get_loc(self.key)

def time_get_loc_non_unique_sorted(self, dtype):
self.non_unique_sorted.get_loc(self.key)
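For context on why the benchmark covers all four cases: ``get_loc`` takes different code paths and returns different result types depending on uniqueness and sortedness. A small illustrative sketch (hypothetical toy indexes, not the benchmark data):

```python
import pandas as pd

# Unique index: get_loc returns an integer position
unique_idx = pd.Index(['a', 'b', 'c'])
print(unique_idx.get_loc('b'))

# Sorted but non-unique: get_loc returns a slice
non_unique_sorted = pd.Index(['a', 'b', 'b', 'c'])
print(non_unique_sorted.get_loc('b'))

# Unsorted and non-unique: get_loc returns a boolean mask
non_unique = pd.Index(['b', 'a', 'b'])
print(non_unique.get_loc('b'))
```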


class Float64IndexMethod(object):
# GH 13166
2 changes: 1 addition & 1 deletion doc/source/advanced.rst
@@ -672,7 +672,7 @@ The ``CategoricalIndex`` is **preserved** after indexing:
df2.loc['a'].index
Sorting the index will sort by the order of the categories (Recall that we
- created the index with with ``CategoricalDtype(list('cab'))``, so the sorted
+ created the index with ``CategoricalDtype(list('cab'))``, so the sorted
order is ``cab``.).
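The category-order sort that paragraph describes can be sketched with a small self-contained example (hypothetical data, using the same ``cab`` category order as the doc):

```python
import pandas as pd

# Hypothetical index whose category order is c < a < b
idx = pd.CategoricalIndex(list('aabbca'), categories=list('cab'))
df = pd.DataFrame({'value': range(6)}, index=idx)

# sort_index sorts by category order ('c' rows first), not alphabetically
df_sorted = df.sort_index()
print(list(df_sorted.index))
```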

.. ipython:: python
4 changes: 2 additions & 2 deletions doc/source/comparison_with_sas.rst
@@ -279,7 +279,7 @@ date/datetime columns.
The equivalent pandas operations are shown below. In addition to these
functions pandas supports other Time Series features
- not available in Base SAS (such as resampling and and custom offsets) -
+ not available in Base SAS (such as resampling and custom offsets) -
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python
@@ -584,7 +584,7 @@ For example, in SAS you could do this to filter missing values.
if value_x ^= .;
run;
- Which doesn't work in in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions
+ Which doesn't work in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions
should be used for comparisons.
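A minimal pandas sketch of that filter, mirroring the SAS ``if value_x ^= .;`` step (hypothetical frame, as after an outer merge):

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame with a missing value_x
outer = pd.DataFrame({'key': ['A', 'B', 'C'],
                      'value_x': [1.0, np.nan, 3.0]})

# Keep only rows where value_x is not missing
filtered = outer[pd.notna(outer['value_x'])]
print(filtered['key'].tolist())  # ['A', 'C']
```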

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/computation.rst
@@ -512,7 +512,7 @@ a same sized result as the input.

When using ``.resample()`` with an offset. Construct a new index that is the frequency of the offset. For each frequency
bin, aggregate points from the input within a backwards-in-time looking window that fall in that bin. The result of this
- aggregation is the output for that frequency point. The windows are fixed size size in the frequency space. Your result
+ aggregation is the output for that frequency point. The windows are fixed size in the frequency space. Your result
will have the shape of a regular frequency between the min and the max of the original input object.

To summarize, ``.rolling()`` is a time-based window operation, while ``.resample()`` is a frequency-based window operation.
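The contrast can be sketched with a toy daily series (hypothetical data): ``.rolling()`` keeps one output point per input point, while ``.resample()`` produces one output point per frequency bin:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10.0),
              index=pd.date_range('2018-01-01', periods=10, freq='D'))

# Time-based window: same length as the input, one point per input point
rolled = s.rolling('3D').sum()

# Frequency-based window: one point per 3-day bin
resampled = s.resample('3D').sum()

print(len(rolled), len(resampled))  # 10 4
```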
7 changes: 4 additions & 3 deletions doc/source/groupby.rst
@@ -1219,8 +1219,8 @@ see :ref:`here <basics.pipe>`.
Combining ``.groupby`` and ``.pipe`` is often useful when you need to reuse
GroupBy objects.

- For an example, imagine having a DataFrame with columns for stores, products,
- revenue and sold quantity. We'd like to do a groupwise calculation of *prices*
+ As an example, imagine having a DataFrame with columns for stores, products,
+ revenue and quantity sold. We'd like to do a groupwise calculation of *prices*
(i.e. revenue/quantity) per store and per product. We could do this in a
multi-step operation, but expressing it in terms of piping can make the
code more readable. First we set the data:
@@ -1230,7 +1230,8 @@ code more readable. First we set the data:
import numpy as np
n = 1000
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
- 'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
+ 'Product': np.random.choice(['Product_1',
+                              'Product_2'], n),
'Revenue': (np.random.random(n)*50+10).round(2),
'Quantity': np.random.randint(1, 10, size=n)})
df.head(2)
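The groupwise price calculation via ``.pipe`` that the text describes can be sketched end to end (mirroring the docs' setup; the ``seed`` call is added here for reproducibility):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added for reproducibility
n = 1000
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                   'Product': np.random.choice(['Product_1', 'Product_2'], n),
                   'Revenue': (np.random.random(n) * 50 + 10).round(2),
                   'Quantity': np.random.randint(1, 10, size=n)})

# Pipe the GroupBy object straight into the price (revenue/quantity) calculation
prices = (df.groupby(['Store', 'Product'])
            .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
            .unstack().round(2))
print(prices)
```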
2 changes: 1 addition & 1 deletion doc/source/io.rst
@@ -4529,7 +4529,7 @@ Several caveats.
on an attempt at serialization.

You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, or ``fastparquet``, or ``auto``.
- If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``, then
+ If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
then ``pyarrow`` is tried, and falling back to ``fastparquet``.

See the documentation for `pyarrow <http://arrow.apache.org/docs/python/>`__ and `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__
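The resolution order described above can be sketched as follows (a hypothetical helper for illustration, not the actual pandas internals):

```python
import pandas as pd

def resolve_parquet_engine(engine='auto'):
    """Sketch of the engine-resolution order: explicit argument,
    then the pd.options.io.parquet.engine option, then pyarrow
    with a fallback to fastparquet."""
    if engine == 'auto':
        engine = pd.get_option('io.parquet.engine')
    if engine == 'auto':
        # Try pyarrow first, then fall back to fastparquet
        for candidate in ('pyarrow', 'fastparquet'):
            try:
                __import__(candidate)
                return candidate
            except ImportError:
                continue
        raise ImportError("Unable to find a usable parquet engine")
    return engine
```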
10 changes: 5 additions & 5 deletions doc/source/release.rst
@@ -406,7 +406,7 @@ of all enhancements and bugs that have been fixed in 0.20.1.

.. note::

- This is a combined release for 0.20.0 and and 0.20.1.
+ This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas' ``utils`` routines. (:issue:`16250`)

Thanks
@@ -2918,7 +2918,7 @@ Improvements to existing features
- clipboard functions use pyperclip (no dependencies on Windows, alternative
dependencies offered for Linux) (:issue:`3837`).
- Plotting functions now raise a ``TypeError`` before trying to plot anything
- if the associated objects have have a dtype of ``object`` (:issue:`1818`,
+ if the associated objects have a dtype of ``object`` (:issue:`1818`,
:issue:`3572`, :issue:`3911`, :issue:`3912`), but they will try to convert object
arrays to numeric arrays if possible so that you can still plot, for example, an
object array with floats. This happens before any drawing takes place which
@@ -4082,7 +4082,7 @@ Bug Fixes
columns (:issue:`1943`)
- Fix time zone localization bug causing improper fields (e.g. hours) in time
zones that have not had a UTC transition in a long time (:issue:`1946`)
- - Fix errors when parsing and working with with fixed offset timezones
+ - Fix errors when parsing and working with fixed offset timezones
(:issue:`1922`, :issue:`1928`)
- Fix text parser bug when handling UTC datetime objects generated by
dateutil (:issue:`1693`)
@@ -4383,7 +4383,7 @@ Bug Fixes
error (:issue:`1090`)
- Consistently set name on groupby pieces (:issue:`184`)
- Treat dict return values as Series in GroupBy.apply (:issue:`823`)
- - Respect column selection for DataFrame in in GroupBy.transform (:issue:`1365`)
+ - Respect column selection for DataFrame in GroupBy.transform (:issue:`1365`)
- Fix MultiIndex partial indexing bug (:issue:`1352`)
- Enable assignment of rows in mixed-type DataFrame via .ix (:issue:`1432`)
- Reset index mapping when grouping Series in Cython (:issue:`1423`)
@@ -5040,7 +5040,7 @@ New Features
- Add `melt` function to `pandas.core.reshape`
- Add `level` parameter to group by level in Series and DataFrame
descriptive statistics (:issue:`313`)
- - Add `head` and `tail` methods to Series, analogous to to DataFrame (PR
+ - Add `head` and `tail` methods to Series, analogous to DataFrame (PR
:issue:`296`)
- Add `Series.isin` function which checks if each value is contained in a
passed sequence (:issue:`289`)
3 changes: 2 additions & 1 deletion doc/source/text.rst
@@ -218,7 +218,8 @@ Extract first match in each subject (extract)
``DataFrame``, depending on the subject and regular expression
pattern (same behavior as pre-0.18.0). When ``expand=True`` it
always returns a ``DataFrame``, which is more consistent and less
- confusing from the perspective of a user.
+ confusing from the perspective of a user. ``expand=True`` is the
+ default since version 0.23.0.
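A short sketch of the two return shapes (hypothetical strings, not from the docs):

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# expand=True (the default from 0.23.0) always returns a DataFrame,
# even for a single group; non-matching rows become NaN
frame = s.str.extract(r'([ab])(\d)', expand=True)

# A single group with expand=False returns a Series
series = s.str.extract(r'([ab])', expand=False)
```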

The ``extract`` method accepts a `regular expression
<https://docs.python.org/3/library/re.html>`__ with at least one
2 changes: 1 addition & 1 deletion doc/source/tutorials.rst
@@ -19,7 +19,7 @@ pandas Cookbook
The goal of this cookbook (by `Julia Evans <http://jvns.ca>`_) is to
give you some concrete examples for getting started with pandas. These
are examples with real-world data, and all the bugs and weirdness that
- that entails.
+ entails.

Here are links to the v0.1 release. For an up-to-date table of contents, see the `pandas-cookbook GitHub
repository <http://github.com/jvns/pandas-cookbook>`_. To run the examples in this tutorial, you'll need to
103 changes: 101 additions & 2 deletions doc/source/whatsnew/v0.23.0.txt
@@ -204,6 +204,50 @@ Please note that the string `index` is not supported with the round trip format,
new_df
print(new_df.index.name)

.. _whatsnew_0230.enhancements.index_division_by_zero:

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Division operations on ``Index`` and subclasses will now fill division of positive numbers by zero with ``np.inf``, division of negative numbers by zero with ``-np.inf`` and ``0 / 0`` with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)

Previous Behavior:

.. code-block:: ipython

In [6]: index = pd.Int64Index([-1, 0, 1])

In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

# Previous behavior yielded different results depending on the type of zero in the divisor
In [8]: index / 0.0
Out[8]: Float64Index([-inf, nan, inf], dtype='float64')

In [9]: index = pd.UInt64Index([0, 1])

In [10]: index / np.array([0, 0], dtype=np.uint64)
Out[10]: UInt64Index([0, 0], dtype='uint64')

In [11]: pd.RangeIndex(1, 5) / 0
ZeroDivisionError: integer division or modulo by zero

Current Behavior:

.. ipython:: python

index = pd.Int64Index([-1, 0, 1])
# division by zero gives -infinity where negative, +infinity where positive, and NaN for 0 / 0
index / 0

# The result of division by zero should not depend on whether the zero is int or float
index / 0.0

index = pd.UInt64Index([0, 1])
index / np.array([0, 0], dtype=np.uint64)

pd.RangeIndex(1, 5) / 0
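An executable check of the fill behavior (using the generic ``pd.Index`` constructor, since the specialized ``Int64Index`` constructor was later deprecated in pandas):

```python
import numpy as np
import pandas as pd

index = pd.Index([-1, 0, 1])
result = index / 0

# negative / 0 -> -inf, 0 / 0 -> nan, positive / 0 -> inf
print(list(result))
```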

.. _whatsnew_0230.enhancements.other:

Other Enhancements
@@ -289,13 +333,64 @@ Convert to an xarray DataArray
p.to_xarray()


.. _whatsnew_0230.api_breaking.build_changes:

Build Changes
^^^^^^^^^^^^^

- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
- Building from source now explicitly requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
- Updated conda recipe to be in compliance with conda-build 3.0+ (:issue:`18002`)

.. _whatsnew_0230.api_breaking.extract:

Extraction of matching patterns from strings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, extracting matching patterns from strings with :func:`str.extract` used to return a
``Series`` if a single group was being extracted (a ``DataFrame`` if more than one group was
extracted). As of pandas 0.23.0, :func:`str.extract` always returns a ``DataFrame``, unless
``expand`` is set to ``False`` (:issue:`11386`).

Also, ``None`` was an accepted value for the ``expand`` parameter (which was equivalent to
``False``), but now raises a ``ValueError``.

Previous Behavior:

.. code-block:: ipython

In [1]: s = pd.Series(['number 10', '12 eggs'])

In [2]: extracted = s.str.extract('.*(\d\d).*')

In [3]: extracted
Out [3]:
0 10
1 12
dtype: object

In [4]: type(extracted)
Out [4]:
pandas.core.series.Series

New Behavior:

.. ipython:: python

s = pd.Series(['number 10', '12 eggs'])
extracted = s.str.extract('.*(\d\d).*')
extracted
type(extracted)

To restore previous behavior, simply set ``expand`` to ``False``:

.. ipython:: python

s = pd.Series(['number 10', '12 eggs'])
extracted = s.str.extract('.*(\d\d).*', expand=False)
extracted
type(extracted)
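The type change can be verified directly with the same example data as above:

```python
import pandas as pd

s = pd.Series(['number 10', '12 eggs'])

as_frame = s.str.extract(r'.*(\d\d).*')                 # expand=True default: DataFrame
as_series = s.str.extract(r'.*(\d\d).*', expand=False)  # Series, as before 0.23.0

print(type(as_frame).__name__, type(as_series).__name__)
```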

.. _whatsnew_0230.api:

Other API Changes
@@ -455,6 +550,7 @@ Datetimelike
- Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
- Bug in :func:`~DataFrame.pct_change` using ``periods`` and ``freq`` returned different length outputs (:issue:`7292`)
- Bug in comparison of :class:`DatetimeIndex` against ``None`` or ``datetime.date`` objects raising ``TypeError`` for ``==`` and ``!=`` comparisons instead of all-``False`` and all-``True``, respectively (:issue:`19301`)
- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
-

Timezones
@@ -531,6 +627,7 @@ I/O
- Bug in :func:`DataFrame.to_parquet` where an exception was raised if the write destination is S3 (:issue:`19134`)
- :class:`Interval` now supported in :func:`DataFrame.to_excel` for all Excel file types (:issue:`19242`)
- :class:`Timedelta` now supported in :func:`DataFrame.to_excel` for xls file type (:issue:`19242`, :issue:`9155`)
- Bug in :meth:`pandas.io.stata.StataReader.value_labels` raising an ``AttributeError`` when called on very old files. Now returns an empty dict (:issue:`19417`)

Plotting
^^^^^^^^
@@ -547,15 +644,16 @@ Groupby/Resample/Rolling
- Fixed regression in :func:`DataFrame.groupby` which would not emit an error when called with a tuple key not in the index (:issue:`18798`)
- Bug in :func:`DataFrame.resample` which silently ignored unsupported (or mistyped) options for ``label``, ``closed`` and ``convention`` (:issue:`19303`)
- Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
- - Bug in ``transform`` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
+ - Bug in :func:`DataFrame.groupby` where aggregation by ``first``/``last``/``min``/``max`` was causing timestamps to lose precision (:issue:`19526`)
+ - Bug in :func:`DataFrame.transform` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
- Bug in :func:`DataFrame.groupby` passing the `on=` kwarg, and subsequently using ``.apply()`` (:issue:`17813`)

Sparse
^^^^^^

- Bug in which creating a ``SparseDataFrame`` from a dense ``Series`` or an unsupported type raised an uncontrolled exception (:issue:`19374`)
- Bug in :class:`SparseDataFrame.to_csv` causing exception (:issue:`19384`)
- -
+ - Bug in :class:`SparseSeries.memory_usage` which caused segfault by accessing non sparse elements (:issue:`19368`)

Reshaping
^^^^^^^^^
@@ -571,6 +669,7 @@ Reshaping
- Bug in :func:`DataFrame.stack`, :func:`DataFrame.unstack`, :func:`Series.unstack` which were not returning subclasses (:issue:`15563`)
- Bug in timezone comparisons, manifesting as a conversion of the index to UTC in ``.concat()`` (:issue:`18523`)
- Bug in :func:`concat` when concatting sparse and dense series it returns only a ``SparseDataFrame``. Should be a ``DataFrame``. (:issue:`18914`, :issue:`18686`, and :issue:`16874`)
- Improved error message for :func:`DataFrame.merge` when there is no common merge key (:issue:`19427`)
-

