diff --git a/doc/source/merging.rst b/doc/source/merging.rst index aebbcee67ad48..7b4cf7cb1bda9 100644 --- a/doc/source/merging.rst +++ b/doc/source/merging.rst @@ -31,10 +31,10 @@ operations. Concatenating objects --------------------- -The :func:`~pandas.concat` function (in the main pandas namespace) does all of -the heavy lifting of performing concatenation operations along an axis while -performing optional set logic (union or intersection) of the indexes (if any) on -the other axes. Note that I say "if any" because there is only a single possible +The :func:`~pandas.concat` function (in the main pandas namespace) does all of +the heavy lifting of performing concatenation operations along an axis while +performing optional set logic (union or intersection) of the indexes (if any) on +the other axes. Note that I say "if any" because there is only a single possible axis of concatenation for Series. Before diving into all of the details of ``concat`` and what it can do, here is @@ -109,9 +109,9 @@ some configurable handling of "what to do with the other axes": to the actual data concatenation. - ``copy`` : boolean, default True. If False, do not copy data unnecessarily. -Without a little bit of context many of these arguments don't make much sense. -Let's revisit the above example. Suppose we wanted to associate specific keys -with each of the pieces of the chopped up DataFrame. We can do this using the +Without a little bit of context many of these arguments don't make much sense. +Let's revisit the above example. Suppose we wanted to associate specific keys +with each of the pieces of the chopped up DataFrame. We can do this using the ``keys`` argument: .. ipython:: python @@ -138,9 +138,9 @@ It's not a stretch to see how this can be very useful. More detail on this functionality below. .. 
note:: - It is worth noting that :func:`~pandas.concat` (and therefore - :func:`~pandas.append`) makes a full copy of the data, and that constantly - reusing this function can create a significant performance hit. If you need + It is worth noting that :func:`~pandas.concat` (and therefore + :func:`~pandas.append`) makes a full copy of the data, and that constantly + reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension. :: @@ -153,7 +153,7 @@ Set logic on the other axes ~~~~~~~~~~~~~~~~~~~~~~~~~~~ When gluing together multiple DataFrames, you have a choice of how to handle -the other axes (other than the one being concatenated). This can be done in +the other axes (other than the one being concatenated). This can be done in the following three ways: - Take the (sorted) union of them all, ``join='outer'``. This is the default @@ -216,8 +216,8 @@ DataFrame: Concatenating using ``append`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A useful shortcut to :func:`~pandas.concat` are the :meth:`~DataFrame.append` -instance methods on ``Series`` and ``DataFrame``. These methods actually predated +A useful shortcut to :func:`~pandas.concat` are the :meth:`~DataFrame.append` +instance methods on ``Series`` and ``DataFrame``. These methods actually predated ``concat``. They concatenate along ``axis=0``, namely the index: .. ipython:: python @@ -263,8 +263,8 @@ need to be: .. note:: - Unlike the :py:meth:`~list.append` method, which appends to the original list - and returns ``None``, :meth:`~DataFrame.append` here **does not** modify + Unlike the :py:meth:`~list.append` method, which appends to the original list + and returns ``None``, :meth:`~DataFrame.append` here **does not** modify ``df1`` and returns its copy with ``df2`` appended. .. _merging.ignore_index: @@ -362,9 +362,9 @@ Passing ``ignore_index=True`` will drop all name references. 
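A minimal sketch of concatenating ``Series`` into a ``DataFrame`` with the ``keys`` argument (the Series contents and key labels here are invented for illustration, assuming a recent pandas):

```python
import pandas as pd

s1 = pd.Series([0, 1], name='foo')
s2 = pd.Series([2, 3])

# Concatenated as columns, keys override the names the columns
# would otherwise inherit from each Series.
df = pd.concat([s1, s2], axis=1, keys=['red', 'blue'])
print(list(df.columns))  # ['red', 'blue']
```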
More concatenating with group keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A fairly common use of the ``keys`` argument is to override the column names +A fairly common use of the ``keys`` argument is to override the column names when creating a new ``DataFrame`` based on existing ``Series``. -Notice how the default behaviour consists on letting the resulting ``DataFrame`` +Notice how the default behaviour consists in letting the resulting ``DataFrame`` inherit the parent ``Series``' name, when these existed. .. ipython:: python @@ -460,7 +460,7 @@ Appending rows to a DataFrame ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ While not especially efficient (since a new object must be created), you can -append a single row to a ``DataFrame`` by passing a ``Series`` or dict to +append a single row to a ``DataFrame`` by passing a ``Series`` or dict to ``append``, which returns a new ``DataFrame`` as above. .. ipython:: python @@ -505,7 +505,7 @@ pandas has full-featured, **high performance** in-memory join operations idiomatically very similar to relational databases like SQL. These methods perform significantly better (in some cases well over an order of magnitude better) than other open source implementations (like ``base::merge.data.frame`` -in R). The reason for this is careful algorithmic design and the internal layout +in R). The reason for this is careful algorithmic design and the internal layout of the data in ``DataFrame``. See the :ref:`cookbook` for some advanced strategies. @@ -513,7 +513,7 @@ See the :ref:`cookbook` for some advanced strategies. Users who are familiar with SQL but new to pandas might be interested in a :ref:`comparison with SQL`. 
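The database-style join entry point described here can be exercised with a tiny, hedged example (frame contents invented; relies only on the default ``how='inner'`` behaviour):

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1'], 'A': [1, 2]})
right = pd.DataFrame({'key': ['K0', 'K2'], 'B': [3, 4]})

# Default how='inner': only keys appearing in both frames survive.
result = pd.merge(left, right, on='key')
print(result.to_dict('list'))  # {'key': ['K0'], 'A': [1], 'B': [3]}
```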
-pandas provides a single function, :func:`~pandas.merge`, as the entry point for +pandas provides a single function, :func:`~pandas.merge`, as the entry point for all standard database join operations between ``DataFrame`` objects: :: @@ -582,7 +582,7 @@ and ``right`` is a subclass of DataFrame, the return type will still be ``DataFrame``. ``merge`` is a function in the pandas namespace, and it is also available as a -``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling +``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling ``DataFrame `` being implicitly considered the left object in the join. The related :meth:`~DataFrame.join` method, uses ``merge`` internally for the @@ -594,7 +594,7 @@ Brief primer on merge methods (relational algebra) Experienced users of relational databases like SQL will be familiar with the terminology used to describe join operations between two SQL-table like -structures (``DataFrame`` objects). There are several cases to consider which +structures (``DataFrame`` objects). There are several cases to consider which are very important to understand: - **one-to-one** joins: for example when joining two ``DataFrame`` objects on @@ -634,8 +634,8 @@ key combination: labels=['left', 'right'], vertical=False); plt.close('all'); -Here is a more complicated example with multiple join keys. Only the keys -appearing in ``left`` and ``right`` are present (the intersection), since +Here is a more complicated example with multiple join keys. Only the keys +appearing in ``left`` and ``right`` are present (the intersection), since ``how='inner'`` by default. .. ipython:: python @@ -751,13 +751,13 @@ Checking for duplicate keys .. versionadded:: 0.21.0 -Users can use the ``validate`` argument to automatically check whether there -are unexpected duplicates in their merge keys. Key uniqueness is checked before -merge operations and so should protect against memory overflows. 
Checking key -uniqueness is also a good way to ensure user data structures are as expected. +Users can use the ``validate`` argument to automatically check whether there +are unexpected duplicates in their merge keys. Key uniqueness is checked before +merge operations and so should protect against memory overflows. Checking key +uniqueness is also a good way to ensure user data structures are as expected. -In the following example, there are duplicate values of ``B`` in the right -``DataFrame``. As this is not a one-to-one merge -- as specified in the +In the following example, there are duplicate values of ``B`` in the right +``DataFrame``. As this is not a one-to-one merge -- as specified in the ``validate`` argument -- an exception will be raised. @@ -770,11 +770,11 @@ In the following example, there are duplicate values of ``B`` in the right In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one") ... - MergeError: Merge keys are not unique in right dataset; not a one-to-one merge + MergeError: Merge keys are not unique in right dataset; not a one-to-one merge -If the user is aware of the duplicates in the right ``DataFrame`` but wants to -ensure there are no duplicates in the left DataFrame, one can use the -``validate='one_to_many'`` argument instead, which will not raise an exception. +If the user is aware of the duplicates in the right ``DataFrame`` but wants to +ensure there are no duplicates in the left DataFrame, one can use the +``validate='one_to_many'`` argument instead, which will not raise an exception. .. ipython:: python @@ -786,8 +786,8 @@ ensure there are no duplicates in the left DataFrame, one can use the The merge indicator ~~~~~~~~~~~~~~~~~~~ -:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a -Categorical-type column called ``_merge`` will be added to the output object +:func:`~pandas.merge` accepts the argument ``indicator``. 
If ``True``, a +Categorical-type column called ``_merge`` will be added to the output object that takes on values: =================================== ================ @@ -895,7 +895,7 @@ Joining on index ~~~~~~~~~~~~~~~~ :meth:`DataFrame.join` is a convenient method for combining the columns of two -potentially differently-indexed ``DataFrames`` into a single result +potentially differently-indexed ``DataFrames`` into a single result ``DataFrame``. Here is a very basic example: .. ipython:: python @@ -975,9 +975,9 @@ indexes: Joining key columns on an index ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column +:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column or multiple column names, which specifies that the passed ``DataFrame`` is to be -aligned on that column in the ``DataFrame``. These two function calls are +aligned on that column in the ``DataFrame``. These two function calls are completely equivalent: :: @@ -987,7 +987,7 @@ completely equivalent: how='left', sort=False) Obviously you can choose whichever form you find more convenient. For -many-to-one joins (where one of the ``DataFrame``'s is already indexed by the +many-to-one joins (where one of the ``DataFrame``'s is already indexed by the join key), using ``join`` may be more convenient. Here is a simple example: .. ipython:: python @@ -1266,7 +1266,7 @@ similarly. Joining multiple DataFrame or Panel objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join` +A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join` to join them together on their indexes. .. 
ipython:: python @@ -1288,7 +1288,7 @@ Merging together values within Series or DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another fairly common situation is to have two like-indexed (or similarly -indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in +indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in one object from values for matching indices in the other. Here is an example: .. ipython:: python @@ -1313,7 +1313,7 @@ For this, use the :meth:`~DataFrame.combine_first` method: plt.close('all'); Note that this method only takes values from the right ``DataFrame`` if they are -missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`, +missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`, alters non-NA values inplace: .. ipython:: python @@ -1365,15 +1365,15 @@ Merging AsOf .. versionadded:: 0.19.0 -A :func:`merge_asof` is similar to an ordered left-join except that we match on -nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, -we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less +A :func:`merge_asof` is similar to an ordered left-join except that we match on +nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, +we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key. -Optionally an asof merge can perform a group-wise merge. This matches the +Optionally an asof merge can perform a group-wise merge. This matches the ``by`` key equally, in addition to the nearest match on the ``on`` key. -For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` +For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` merge them. .. ipython:: python @@ -1432,8 +1432,8 @@ We only asof within ``2ms`` between the quote time and the trade time. 
by='ticker', tolerance=pd.Timedelta('2ms')) -We only asof within ``10ms`` between the quote time and the trade time and we -exclude exact matches on time. Note that though we exclude the exact matches +We only asof within ``10ms`` between the quote time and the trade time and we +exclude exact matches on time. Note that though we exclude the exact matches (of the quotes), prior quotes **do** propagate to that point in time. .. ipython:: python diff --git a/pandas/core/frame.py b/pandas/core/frame.py index e3164ca0262aa..d430d442fae0f 100644 --- a/pandas/core/frame.py +++ b/pandas/core/frame.py @@ -883,27 +883,66 @@ def dot(self, other): @classmethod def from_dict(cls, data, orient='columns', dtype=None, columns=None): """ - Construct DataFrame from dict of array-like or dicts + Construct DataFrame from dict of array-like or dicts. + + Creates DataFrame object from dictionary by columns or by index + allowing dtype specification. Parameters ---------- data : dict - {field : array-like} or {field : dict} + Of the form {field : array-like} or {field : dict}. orient : {'columns', 'index'}, default 'columns' The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'. dtype : dtype, default None - Data type to force, otherwise infer - columns: list, default None - Column labels to use when orient='index'. Raises a ValueError - if used with orient='columns' + Data type to force, otherwise infer. + columns : list, default None + Column labels to use when ``orient='index'``. Raises a ValueError + if used with ``orient='columns'``. .. 
versionadded:: 0.23.0 Returns ------- - DataFrame + pandas.DataFrame + + See Also + -------- + DataFrame.from_records : DataFrame from ndarray (structured + dtype), list of tuples, dict, or DataFrame + DataFrame : DataFrame object creation using constructor + + Examples + -------- + By default the keys of the dict become the DataFrame columns: + + >>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']} + >>> pd.DataFrame.from_dict(data) + col_1 col_2 + 0 3 a + 1 2 b + 2 1 c + 3 0 d + + Specify ``orient='index'`` to create the DataFrame using dictionary + keys as rows: + + >>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']} + >>> pd.DataFrame.from_dict(data, orient='index') + 0 1 2 3 + row_1 3 2 1 0 + row_2 a b c d + + When using the 'index' orientation, the column names can be + specified manually: + + >>> pd.DataFrame.from_dict(data, orient='index', + ... columns=['A', 'B', 'C', 'D']) + A B C D + row_1 3 2 1 0 + row_2 a b c d """ index = None orient = orient.lower() diff --git a/pandas/core/generic.py b/pandas/core/generic.py index d9bc4804b061b..fc8aaa23d2f79 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -7584,11 +7584,10 @@ def _add_numeric_operations(cls): cls.any = _make_logical_function( cls, 'any', name, name2, axis_descr, 'Return whether any element is True over requested axis', - nanops.nanany) + nanops.nanany, '', '') cls.all = _make_logical_function( - cls, 'all', name, name2, axis_descr, - 'Return whether all elements are True over requested axis', - nanops.nanall) + cls, 'all', name, name2, axis_descr, _all_doc, + nanops.nanall, _all_examples, _all_see_also) @Substitution(outname='mad', desc="Return the mean absolute deviation of the values " @@ -7845,7 +7844,6 @@ def _doc_parms(cls): %(outname)s : %(name1)s or %(name2)s (if level specified)\n""" _bool_doc = """ - %(desc)s Parameters @@ -7853,17 +7851,71 @@ def _doc_parms(cls): axis : %(axis_descr)s skipna : boolean, default True Exclude NA/null values. 
If an entire row/column is NA, the result - will be NA + will be NA. level : int or level name, default None If the axis is a MultiIndex (hierarchical), count along a - particular level, collapsing into a %(name1)s + particular level, collapsing into a %(name1)s. bool_only : boolean, default None Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series. +**kwargs : any, default None + Additional keywords have no effect but might be accepted for + compatibility with numpy. Returns ------- -%(outname)s : %(name1)s or %(name2)s (if level specified)\n""" +%(outname)s : %(name1)s or %(name2)s (if level specified) + +%(examples)s +%(see_also)s""" + +_all_doc = """\ +Return whether all elements are True over series or dataframe axis. + +Returns True if all elements within a series or along a dataframe +axis are non-zero, not-empty or not-False.""" + +_all_examples = """\ +Examples +-------- +Series + +>>> pd.Series([True, True]).all() +True +>>> pd.Series([True, False]).all() +False + +Dataframes + +Create a dataframe from a dictionary. + +>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]}) +>>> df + col1 col2 +0 True True +1 True False + +Default behaviour checks if column-wise values all return True. + +>>> df.all() +col1 True +col2 False +dtype: bool + +Adding axis=1 argument will check if row-wise values all return True. 
+ +>>> df.all(axis=1) +0 True +1 False +dtype: bool +""" + +_all_see_also = """\ +See also +-------- +pandas.Series.all : Return True if all elements are True +pandas.DataFrame.any : Return True if one (or more) elements are True +""" _cnum_doc = """ @@ -8046,9 +8098,10 @@ def cum_func(self, axis=None, skipna=True, *args, **kwargs): return set_function_name(cum_func, name, cls) -def _make_logical_function(cls, name, name1, name2, axis_descr, desc, f): +def _make_logical_function(cls, name, name1, name2, axis_descr, desc, f, + examples, see_also): @Substitution(outname=name, desc=desc, name1=name1, name2=name2, - axis_descr=axis_descr) + axis_descr=axis_descr, examples=examples, see_also=see_also) @Appender(_bool_doc) def logical_func(self, axis=None, bool_only=None, skipna=None, level=None, **kwargs): diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py index 71ff2e8d66646..bdf298be1510c 100644 --- a/pandas/core/indexes/base.py +++ b/pandas/core/indexes/base.py @@ -681,7 +681,47 @@ def _values(self): return self.values def get_values(self): - """ return the underlying data as an ndarray """ + """ + Return `Index` data as a `numpy.ndarray`. + + Returns + ------- + numpy.ndarray + A one-dimensional numpy array of the `Index` values. + + See Also + -------- + Index.values : The attribute that get_values wraps. + + Examples + -------- + Getting the `Index` values of a `DataFrame`: + + >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], + ... index=['a', 'b', 'c'], columns=['A', 'B', 'C']) + >>> df + A B C + a 1 2 3 + b 4 5 6 + c 7 8 9 + >>> df.index.get_values() + array(['a', 'b', 'c'], dtype=object) + + Standalone `Index` values: + + >>> idx = pd.Index(['1', '2', '3']) + >>> idx.get_values() + array(['1', '2', '3'], dtype=object) + + `MultiIndex` arrays also have only one dimension: + + >>> midx = pd.MultiIndex.from_arrays([[1, 2, 3], ['a', 'b', 'c']], + ... 
names=('number', 'letter')) + >>> midx.get_values() + array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=object) + >>> midx.get_values().ndim + 1 + """ return self.values @Appender(IndexOpsMixin.memory_usage.__doc__) @@ -1710,6 +1750,59 @@ def _invalid_indexer(self, form, key): kind=type(key))) def get_duplicates(self): + """ + Extract duplicated index elements. + + Returns a sorted list of index elements which appear more than once in + the index. + + Returns + ------- + array-like + List of duplicated indexes. + + See Also + -------- + Index.duplicated : Return boolean array denoting duplicates. + Index.drop_duplicates : Return Index with duplicates removed. + + Examples + -------- + + Works on different types of Index. + + >>> pd.Index([1, 2, 2, 3, 3, 3, 4]).get_duplicates() + [2, 3] + >>> pd.Index([1., 2., 2., 3., 3., 3., 4.]).get_duplicates() + [2.0, 3.0] + >>> pd.Index(['a', 'b', 'b', 'c', 'c', 'c', 'd']).get_duplicates() + ['b', 'c'] + >>> dates = pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03', + ... '2018-01-03', '2018-01-04', '2018-01-04'], + ... format='%Y-%m-%d') + >>> pd.Index(dates).get_duplicates() + DatetimeIndex(['2018-01-03', '2018-01-04'], + dtype='datetime64[ns]', freq=None) + + Sorts duplicated elements even when indexes are unordered. + + >>> pd.Index([1, 2, 3, 2, 3, 4, 3]).get_duplicates() + [2, 3] + + Return empty array-like structure when all elements are unique. + + >>> pd.Index([1, 2, 3, 4]).get_duplicates() + [] + >>> dates = pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03'], + ... format='%Y-%m-%d') + >>> pd.Index(dates).get_duplicates() + DatetimeIndex([], dtype='datetime64[ns]', freq=None) + + Notes + ----- + In case of datetime-like indexes, the function is overridden where the + result is converted to DatetimeIndex. 
+ """ from collections import defaultdict counter = defaultdict(lambda: 0) for k in self.values: diff --git a/pandas/core/ops.py b/pandas/core/ops.py index 83879cdaaa63c..6c6a54993b669 100644 --- a/pandas/core/ops.py +++ b/pandas/core/ops.py @@ -343,50 +343,93 @@ def _get_op_name(op, special): # ----------------------------------------------------------------------------- # Docstring Generation and Templates +_add_example_FRAME = """ +>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'], + columns=['one']) +>>> a + one +a 1.0 +b 1.0 +c 1.0 +d NaN +>>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan], + two=[np.nan, 2, np.nan, 2]), + index=['a', 'b', 'd', 'e']) +>>> b + one two +a 1.0 NaN +b NaN 2.0 +d 1.0 NaN +e NaN 2.0 +>>> a.add(b, fill_value=0) + one two +a 2.0 NaN +b 1.0 2.0 +c 1.0 NaN +d 1.0 NaN +e NaN 2.0 +""" + _op_descriptions = { + # Arithmetic Operators 'add': {'op': '+', 'desc': 'Addition', - 'reverse': 'radd'}, + 'reverse': 'radd', + 'df_examples': _add_example_FRAME}, 'sub': {'op': '-', 'desc': 'Subtraction', - 'reverse': 'rsub'}, + 'reverse': 'rsub', + 'df_examples': None}, 'mul': {'op': '*', 'desc': 'Multiplication', - 'reverse': 'rmul'}, + 'reverse': 'rmul', + 'df_examples': None}, 'mod': {'op': '%', 'desc': 'Modulo', - 'reverse': 'rmod'}, + 'reverse': 'rmod', + 'df_examples': None}, 'pow': {'op': '**', 'desc': 'Exponential power', - 'reverse': 'rpow'}, + 'reverse': 'rpow', + 'df_examples': None}, 'truediv': {'op': '/', 'desc': 'Floating division', - 'reverse': 'rtruediv'}, + 'reverse': 'rtruediv', + 'df_examples': None}, 'floordiv': {'op': '//', 'desc': 'Integer division', - 'reverse': 'rfloordiv'}, + 'reverse': 'rfloordiv', + 'df_examples': None}, 'divmod': {'op': 'divmod', 'desc': 'Integer division and modulo', - 'reverse': None}, + 'reverse': None, + 'df_examples': None}, + # Comparison Operators 'eq': {'op': '==', - 'desc': 'Equal to', - 'reverse': None}, + 'desc': 'Equal to', + 'reverse': None, + 'df_examples': None}, 'ne': 
{'op': '!=', - 'desc': 'Not equal to', - 'reverse': None}, + 'desc': 'Not equal to', + 'reverse': None, + 'df_examples': None}, 'lt': {'op': '<', - 'desc': 'Less than', - 'reverse': None}, + 'desc': 'Less than', + 'reverse': None, + 'df_examples': None}, 'le': {'op': '<=', - 'desc': 'Less than or equal to', - 'reverse': None}, + 'desc': 'Less than or equal to', + 'reverse': None, + 'df_examples': None}, 'gt': {'op': '>', - 'desc': 'Greater than', - 'reverse': None}, + 'desc': 'Greater than', + 'reverse': None, + 'df_examples': None}, 'ge': {'op': '>=', - 'desc': 'Greater than or equal to', - 'reverse': None}} + 'desc': 'Greater than or equal to', + 'reverse': None, + 'df_examples': None}} _op_names = list(_op_descriptions.keys()) for key in _op_names: @@ -532,30 +575,7 @@ def _get_op_name(op, special): Examples -------- ->>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'], - columns=['one']) ->>> a - one -a 1.0 -b 1.0 -c 1.0 -d NaN ->>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan], - two=[np.nan, 2, np.nan, 2]), - index=['a', 'b', 'd', 'e']) ->>> b - one two -a 1.0 NaN -b NaN 2.0 -d 1.0 NaN -e NaN 2.0 ->>> a.add(b, fill_value=0) - one two -a 2.0 NaN -b 1.0 2.0 -c 1.0 NaN -d 1.0 NaN -e NaN 2.0 +{df_examples} See also -------- @@ -622,14 +642,19 @@ def _make_flex_doc(op_name, typ): if typ == 'series': base_doc = _flex_doc_SERIES + doc = base_doc.format(desc=op_desc['desc'], op_name=op_name, + equiv=equiv, reverse=op_desc['reverse']) elif typ == 'dataframe': base_doc = _flex_doc_FRAME + doc = base_doc.format(desc=op_desc['desc'], op_name=op_name, + equiv=equiv, reverse=op_desc['reverse'], + df_examples=op_desc['df_examples']) elif typ == 'panel': base_doc = _flex_doc_PANEL + doc = base_doc.format(desc=op_desc['desc'], op_name=op_name, + equiv=equiv, reverse=op_desc['reverse']) else: raise AssertionError('Invalid typ argument.') - doc = base_doc.format(desc=op_desc['desc'], op_name=op_name, - equiv=equiv, reverse=op_desc['reverse']) return doc diff 
--git a/pandas/core/series.py b/pandas/core/series.py index 7b9b8a7a75008..19a9a0cf3da0f 100644 --- a/pandas/core/series.py +++ b/pandas/core/series.py @@ -547,6 +547,71 @@ def __len__(self): return len(self._data) def view(self, dtype=None): + """ + Create a new view of the Series. + + This function will return a new Series with a view of the same + underlying values in memory, optionally reinterpreted with a new data + type. The new data type must preserve the same size in bytes so as not + to cause index misalignment. + + Parameters + ---------- + dtype : data type + Data type object or one of its string representations. + + Returns + ------- + Series + A new Series object as a view of the same data in memory. + + See Also + -------- + numpy.ndarray.view : Equivalent numpy function to create a new view of + the same data in memory. + + Notes + ----- + Series are instantiated with ``dtype=float64`` by default. While + ``numpy.ndarray.view()`` will return a view with the same data type as + the original array, ``Series.view()`` (without specified dtype) + will try using ``float64`` and may fail if the original data type size + in bytes is not the same. + + Examples + -------- + >>> s = pd.Series([-2, -1, 0, 1, 2], dtype='int8') + >>> s + 0 -2 + 1 -1 + 2 0 + 3 1 + 4 2 + dtype: int8 + + The 8 bit signed integer representation of `-1` is `0b11111111`, but + the same bytes represent 255 if read as an 8 bit unsigned integer: + + >>> us = s.view('uint8') + >>> us + 0 254 + 1 255 + 2 0 + 3 1 + 4 2 + dtype: uint8 + + The views share the same underlying values: + + >>> us[0] = 128 + >>> s + 0 -128 + 1 -1 + 2 0 + 3 1 + 4 2 + dtype: int8 + """ return self._constructor(self._values.view(dtype), index=self.index).__finalize__(self) @@ -1607,16 +1672,63 @@ def cov(self, other, min_periods=None): def diff(self, periods=1): """ - 1st discrete difference of object + First discrete difference of element. 
+ + Calculates the difference of a Series element compared with another + element in the Series (default is element in previous row). Parameters ---------- periods : int, default 1 - Periods to shift for forming difference + Periods to shift for calculating difference, accepts negative + values. Returns ------- diffed : Series + + See Also + -------- + Series.pct_change: Percent change over given number of periods. + Series.shift: Shift index by desired number of periods with an + optional time freq. + DataFrame.diff: First discrete difference of object. + + Examples + -------- + Difference with previous row + + >>> s = pd.Series([1, 1, 2, 3, 5, 8]) + >>> s.diff() + 0 NaN + 1 0.0 + 2 1.0 + 3 1.0 + 4 2.0 + 5 3.0 + dtype: float64 + + Difference with 3rd previous row + + >>> s.diff(periods=3) + 0 NaN + 1 NaN + 2 NaN + 3 2.0 + 4 4.0 + 5 6.0 + dtype: float64 + + Difference with following row + + >>> s.diff(periods=-1) + 0 0.0 + 1 -1.0 + 2 -1.0 + 3 -2.0 + 4 -3.0 + 5 NaN + dtype: float64 """ result = algorithms.diff(com._values_from_object(self), periods) return self._constructor(result, index=self.index).__finalize__(self) diff --git a/pandas/plotting/_misc.py b/pandas/plotting/_misc.py index 03a06169d46bc..150c9274d4e5c 100644 --- a/pandas/plotting/_misc.py +++ b/pandas/plotting/_misc.py @@ -364,20 +364,51 @@ def f(t): def bootstrap_plot(series, fig=None, size=50, samples=500, **kwds): - """Bootstrap plot. + """ + Bootstrap plot on mean, median and mid-range statistics. + + The bootstrap plot is used to estimate the uncertainty of a statistic + by relying on random sampling with replacement [1]_. This function will + generate bootstrapping plots for mean, median and mid-range statistics + for the given number of samples of the given size. + + .. 
[1] "Bootstrapping (statistics)" in \ + https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29 Parameters ---------- - series: Time series - fig: matplotlib figure object, optional - size: number of data points to consider during each sampling - samples: number of times the bootstrap procedure is performed - kwds: optional keyword arguments for plotting commands, must be accepted - by both hist and plot + series : pandas.Series + Pandas Series from which to get the samplings for the bootstrapping. + fig : matplotlib.figure.Figure, default None + If given, it will use the `fig` reference for plotting instead of + creating a new one with default parameters. + size : int, default 50 + Number of data points to consider during each sampling. It must be + less than or equal to the length of the `series`. + samples : int, default 500 + Number of times the bootstrap procedure is performed. + **kwds : + Options to pass to matplotlib plotting method. Returns ------- - fig: matplotlib figure + fig : matplotlib.figure.Figure + Matplotlib figure. + + See Also + -------- + pandas.DataFrame.plot : Basic plotting for DataFrame objects. + pandas.Series.plot : Basic plotting for Series objects. + + Examples + -------- + + .. plot:: + :context: close-figs + + >>> import numpy as np + >>> s = pd.Series(np.random.uniform(size=100)) + >>> fig = pd.plotting.bootstrap_plot(s) """ import random import matplotlib.pyplot as plt
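The ``get_duplicates`` docstring added above pairs with the defaultdict-based counting in ``pandas/core/indexes/base.py``; a stdlib-only sketch of that logic (the standalone function name is mine, not pandas API, and this only loosely mirrors the method):

```python
from collections import defaultdict

def get_duplicates(values):
    # Count each value, then keep those seen more than once, sorted.
    counter = defaultdict(int)
    for v in values:
        counter[v] += 1
    return sorted(k for k, count in counter.items() if count > 1)

print(get_duplicates([1, 2, 2, 3, 3, 3, 4]))  # [2, 3]
```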