
DEPR: allow options for using bottleneck/numexpr (pandas-dev#16157)
* DEPR: allow options for using bottleneck/numexpr

deprecate pd.computation.expressions.set_use_numexpr()

* DEPR: pandas.types.concat.union_categoricals in favor of pandas.api.types.union_categoricals

closes pandas-dev#16140
jreback authored Apr 27, 2017
1 parent 669973a commit 075eca1
Showing 17 changed files with 215 additions and 94 deletions.
11 changes: 10 additions & 1 deletion doc/source/basics.rst
@@ -93,7 +93,7 @@ Accelerated operations
----------------------

pandas has support for accelerating certain types of binary numerical and boolean operations using
the ``numexpr`` library (starting in 0.11.0) and the ``bottleneck`` libraries.
the ``numexpr`` and ``bottleneck`` libraries.

These libraries are especially useful when dealing with large data sets, and provide large
speedups. ``numexpr`` uses smart chunking, caching, and multiple cores. ``bottleneck`` is
@@ -114,6 +114,15 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``):
You are highly encouraged to install both libraries. See the section
:ref:`Recommended Dependencies <install.recommended_dependencies>` for more installation info.

When installed, both libraries are used by default; you can turn either one off with these options:

.. versionadded:: 0.20.0

.. code-block:: python

   pd.set_option('compute.use_bottleneck', False)
   pd.set_option('compute.use_numexpr', False)

.. _basics.binop:

Flexible binary operations
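The docs above switch acceleration off globally with `pd.set_option`; pandas also provides `pd.option_context` for a scoped, temporary override. The sketch below imitates that pattern with a hypothetical in-memory option store (`_options`, `get_option`, `set_option` and a single-pair `option_context` are stand-ins, not the pandas implementation), so it runs without pandas installed.

```python
from contextlib import contextmanager

# Hypothetical option store; in pandas these values live in pandas.core.config
# and are read and written through pd.get_option / pd.set_option.
_options = {"compute.use_bottleneck": True, "compute.use_numexpr": True}


def get_option(key):
    return _options[key]  # KeyError for an unknown option


def set_option(key, value):
    if key not in _options:
        raise KeyError("No such option: %s" % key)
    _options[key] = value


@contextmanager
def option_context(key, value):
    """Override one option inside a `with` block, then restore it."""
    previous = get_option(key)
    set_option(key, value)
    try:
        yield
    finally:
        # restore the old value even if the body raised
        set_option(key, previous)


with option_context("compute.use_numexpr", False):
    print(get_option("compute.use_numexpr"))  # False inside the block
print(get_option("compute.use_numexpr"))      # True again afterwards
```

With pandas itself the equivalent is `with pd.option_context('compute.use_numexpr', False): ...`; the real function also accepts several key/value pairs at once.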
6 changes: 5 additions & 1 deletion doc/source/options.rst
@@ -425,6 +425,10 @@ mode.use_inf_as_null False True means treat None, NaN, -IN
                                                 INF as null (old way), False means
                                                 None and NaN are null, but INF, -INF
                                                 are not null (new way).
compute.use_bottleneck              True         Use the bottleneck library to accelerate
                                                 computation if it is installed
compute.use_numexpr                 True         Use the numexpr library to accelerate
                                                 computation if it is installed
=================================== ============ ==================================


@@ -538,4 +542,4 @@ Only ``'display.max_rows'`` are serialized and published.
.. ipython:: python
   :suppress:

   pd.reset_option('display.html.table_schema')
   pd.reset_option('display.html.table_schema')
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.20.0.txt
@@ -521,6 +521,7 @@ Other Enhancements
- The ``display.show_dimensions`` option can now also be used to specify
  whether the length of a ``Series`` should be shown in its repr (:issue:`7117`).
- ``parallel_coordinates()`` has gained a ``sort_labels`` keyword arg that sorts class labels and the colours assigned to them (:issue:`15908`)
- Options have been added to turn the use of ``bottleneck`` and ``numexpr`` on or off; see :ref:`here <basics.accelerate>` (:issue:`16157`)


.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
@@ -1217,7 +1218,7 @@ If indicated, a deprecation warning will be issued if you reference theses modul

"pandas.lib", "pandas._libs.lib", "X"
"pandas.tslib", "pandas._libs.tslib", "X"
"pandas.computation", "pandas.core.computation", ""
"pandas.computation", "pandas.core.computation", "X"
"pandas.msgpack", "pandas.io.msgpack", ""
"pandas.index", "pandas._libs.index", ""
"pandas.algos", "pandas._libs.algos", ""
Empty file added pandas/computation/__init__.py
11 changes: 11 additions & 0 deletions pandas/computation/expressions.py
@@ -0,0 +1,11 @@
import warnings


def set_use_numexpr(v=True):
    warnings.warn("pandas.computation.expressions.set_use_numexpr is "
                  "deprecated and will be removed in a future version.\n"
                  "You can toggle usage of numexpr via "
                  "pandas.set_option('compute.use_numexpr')",
                  FutureWarning, stacklevel=2)
    from pandas import set_option
    set_option('compute.use_numexpr', v)
3 changes: 2 additions & 1 deletion pandas/core/computation/expressions.py
@@ -10,6 +10,7 @@
import numpy as np
from pandas.core.common import _values_from_object
from pandas.core.computation import _NUMEXPR_INSTALLED
from pandas.core.config import get_option

if _NUMEXPR_INSTALLED:
    import numexpr as ne
@@ -156,7 +157,7 @@ def _where_numexpr(cond, a, b, raise_on_error=False):


# turn myself on
set_use_numexpr(True)
set_use_numexpr(get_option('compute.use_numexpr'))


def _has_bool_dtype(x):
35 changes: 34 additions & 1 deletion pandas/core/config_init.py
@@ -15,8 +15,41 @@
from pandas.core.config import (is_int, is_bool, is_text, is_instance_factory,
                                is_one_of_factory, get_default_val,
                                is_callable)
from pandas.io.formats.format import detect_console_encoding
from pandas.io.formats.console import detect_console_encoding

# compute

use_bottleneck_doc = """
: bool
    Use the bottleneck library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
"""


def use_bottleneck_cb(key):
    from pandas.core import nanops
    nanops.set_use_bottleneck(cf.get_option(key))


use_numexpr_doc = """
: bool
    Use the numexpr library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
"""


def use_numexpr_cb(key):
    from pandas.core.computation import expressions
    expressions.set_use_numexpr(cf.get_option(key))


with cf.config_prefix('compute'):
    cf.register_option('use_bottleneck', True, use_bottleneck_doc,
                       validator=is_bool, cb=use_bottleneck_cb)
    cf.register_option('use_numexpr', True, use_numexpr_doc,
                       validator=is_bool, cb=use_numexpr_cb)
#
# options from the "display" namespace

5 changes: 3 additions & 2 deletions pandas/core/frame.py
@@ -91,6 +91,7 @@
import pandas.core.nanops as nanops
import pandas.core.ops as ops
import pandas.io.formats.format as fmt
import pandas.io.formats.console as console
from pandas.io.formats.printing import pprint_thing
import pandas.plotting._core as gfx

@@ -513,7 +514,7 @@ def _repr_fits_horizontal_(self, ignore_width=False):
        GH3541, GH3573
        """

        width, height = fmt.get_console_size()
        width, height = console.get_console_size()
        max_columns = get_option("display.max_columns")
        nb_columns = len(self.columns)

@@ -577,7 +578,7 @@ def __unicode__(self):
        max_cols = get_option("display.max_columns")
        show_dimensions = get_option("display.show_dimensions")
        if get_option("display.expand_frame_repr"):
            width, _ = fmt.get_console_size()
            width, _ = console.get_console_size()
        else:
            width = None
        self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
3 changes: 2 additions & 1 deletion pandas/core/indexes/base.py
@@ -837,7 +837,8 @@ def _format_data(self):
        """
        Return the formatted data as a unicode string
        """
        from pandas.io.formats.format import get_console_size, _get_adjustment
        from pandas.io.formats.console import get_console_size
        from pandas.io.formats.format import _get_adjustment
        display_width, _ = get_console_size()
        if display_width is None:
            display_width = get_option('display.width') or 80
28 changes: 20 additions & 8 deletions pandas/core/nanops.py
@@ -1,14 +1,8 @@
import itertools
import functools
import numpy as np
import operator

try:
    import bottleneck as bn
    _USE_BOTTLENECK = True
except ImportError:  # pragma: no cover
    _USE_BOTTLENECK = False

import numpy as np
from pandas import compat
from pandas._libs import tslib, algos, lib
from pandas.core.dtypes.common import (
@@ -23,9 +17,27 @@
    is_int_or_datetime_dtype, is_any_int_dtype)
from pandas.core.dtypes.cast import _int64_max, maybe_upcast_putmask
from pandas.core.dtypes.missing import isnull, notnull

from pandas.core.config import get_option
from pandas.core.common import _values_from_object

try:
    import bottleneck as bn
    _BOTTLENECK_INSTALLED = True
except ImportError:  # pragma: no cover
    _BOTTLENECK_INSTALLED = False

_USE_BOTTLENECK = False


def set_use_bottleneck(v=True):
    # set/unset to use bottleneck
    global _USE_BOTTLENECK
    if _BOTTLENECK_INSTALLED:
        _USE_BOTTLENECK = v


set_use_bottleneck(get_option('compute.use_bottleneck'))


class disallow(object):

84 changes: 84 additions & 0 deletions pandas/io/formats/console.py
@@ -0,0 +1,84 @@
"""
Internal module for console introspection
"""

import sys
import locale
from pandas.util.terminal import get_terminal_size

# -----------------------------------------------------------------------------
# Global formatting options
_initial_defencoding = None


def detect_console_encoding():
    """
    Try to find the most capable encoding supported by the console.
    Slightly modified from the way IPython handles the same issue.
    """
    global _initial_defencoding

    encoding = None
    try:
        encoding = sys.stdout.encoding or sys.stdin.encoding
    except AttributeError:
        pass

    # try again for something better
    if not encoding or 'ascii' in encoding.lower():
        try:
            encoding = locale.getpreferredencoding()
        except Exception:
            pass

    # when all else fails. this will usually be "ascii"
    if not encoding or 'ascii' in encoding.lower():
        encoding = sys.getdefaultencoding()

    # GH3360, save the reported defencoding at import time
    # MPL backends may change it. Make available for debugging.
    if not _initial_defencoding:
        _initial_defencoding = sys.getdefaultencoding()

    return encoding


def get_console_size():
    """Return console size as tuple = (width, height).
    Returns (None,None) in non-interactive session.
    """
    from pandas import get_option
    from pandas.core import common as com

    display_width = get_option('display.width')
    # deprecated.
    display_height = get_option('display.height', silent=True)

    # Consider
    # interactive shell terminal, can detect term size
    # interactive non-shell terminal (ipnb/ipqtconsole), cannot detect term
    # size non-interactive script, should disregard term size

    # in addition
    # width,height have default values, but setting to 'None' signals
    # should use Auto-Detection, But only in interactive shell-terminal.
    # Simple. yeah.

    if com.in_interactive_session():
        if com.in_ipython_frontend():
            # sane defaults for interactive non-shell terminal
            # match default for width,height in config_init
            from pandas.core.config import get_default_val
            terminal_width = get_default_val('display.width')
            terminal_height = get_default_val('display.height')
        else:
            # pure terminal
            terminal_width, terminal_height = get_terminal_size()
    else:
        terminal_width, terminal_height = None, None

    # Note if the User sets width/Height to None (auto-detection)
    # and we're in a script (non-inter), this will return (None,None)
    # caller needs to deal.
    return (display_width or terminal_width, display_height or terminal_height)
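Both functions in `console.py` are fallback chains. Pulled out as pure functions (the helpers `pick_encoding` and `resolve_console_size` are hypothetical, and the environment is passed in as arguments instead of being read from `sys`, `locale` or the terminal), their decision logic becomes easy to check in isolation:

```python
def pick_encoding(candidates, default):
    """Return the first usable, non-ASCII encoding name, else `default`.

    `candidates` plays the role of [stream encoding, locale preference] in
    detect_console_encoding; `default` plays sys.getdefaultencoding().
    """
    for enc in candidates:
        if enc and "ascii" not in enc.lower():
            return enc
    return default


def resolve_console_size(display_width, display_height, interactive,
                         ipython_frontend, terminal_size, default_size):
    """Mirror get_console_size: user settings win, else a detected size.

    Returns (None, None) for a non-interactive script with no user settings.
    """
    if interactive:
        if ipython_frontend:
            # notebook/qtconsole: the terminal size is meaningless here
            term_w, term_h = default_size
        else:
            # plain terminal: trust the detected size
            term_w, term_h = terminal_size
    else:
        term_w, term_h = None, None
    return (display_width or term_w, display_height or term_h)
```

Callers still have to handle `None`; the `_format_data` hunk in `pandas/core/indexes/base.py`, for example, falls back to `get_option('display.width') or 80`.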
77 changes: 0 additions & 77 deletions pandas/io/formats/format.py
@@ -8,7 +8,6 @@
from distutils.version import LooseVersion
# pylint: disable=W0141

import sys
from textwrap import dedent

from pandas.core.dtypes.missing import isnull, notnull
@@ -2290,82 +2289,6 @@ def _has_names(index):
    return index.name is not None


# -----------------------------------------------------------------------------
# Global formatting options
_initial_defencoding = None


def detect_console_encoding():
    """
    Try to find the most capable encoding supported by the console.
    slighly modified from the way IPython handles the same issue.
    """
    import locale
    global _initial_defencoding

    encoding = None
    try:
        encoding = sys.stdout.encoding or sys.stdin.encoding
    except AttributeError:
        pass

    # try again for something better
    if not encoding or 'ascii' in encoding.lower():
        try:
            encoding = locale.getpreferredencoding()
        except Exception:
            pass

    # when all else fails. this will usually be "ascii"
    if not encoding or 'ascii' in encoding.lower():
        encoding = sys.getdefaultencoding()

    # GH3360, save the reported defencoding at import time
    # MPL backends may change it. Make available for debugging.
    if not _initial_defencoding:
        _initial_defencoding = sys.getdefaultencoding()

    return encoding


def get_console_size():
    """Return console size as tuple = (width, height).
    Returns (None,None) in non-interactive session.
    """
    display_width = get_option('display.width')
    # deprecated.
    display_height = get_option('display.height', silent=True)

    # Consider
    # interactive shell terminal, can detect term size
    # interactive non-shell terminal (ipnb/ipqtconsole), cannot detect term
    # size non-interactive script, should disregard term size

    # in addition
    # width,height have default values, but setting to 'None' signals
    # should use Auto-Detection, But only in interactive shell-terminal.
    # Simple. yeah.

    if com.in_interactive_session():
        if com.in_ipython_frontend():
            # sane defaults for interactive non-shell terminal
            # match default for width,height in config_init
            from pandas.core.config import get_default_val
            terminal_width = get_default_val('display.width')
            terminal_height = get_default_val('display.height')
        else:
            # pure terminal
            terminal_width, terminal_height = get_terminal_size()
    else:
        terminal_width, terminal_height = None, None

    # Note if the User sets width/Height to None (auto-detection)
    # and we're in a script (non-inter), this will return (None,None)
    # caller needs to deal.
    return (display_width or terminal_width, display_height or terminal_height)


class EngFormatter(object):
    """
    Formats float values according to engineering format.