DEPR: list of lists in Series.str.cat #21950

Closed
h-vetinari opened this issue Jul 17, 2018 · 1 comment · Fixed by #22264
Labels
Deprecate Functionality to remove in pandas Strings String extension data type and string data
@h-vetinari
Contributor

The .str.cat-method is the only one in the str-accessor that takes another Series as an argument, and as such, is a bit of a special case (e.g. it had no index alignment until v0.23).
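To illustrate the index alignment mentioned above, here is a minimal sketch. It assumes pandas v0.23 or later, where `str.cat` accepts a `join` keyword; the variable names and sample data are made up for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b'], index=[0, 1])
# Same labels as `s`, but stored in a different order.
t = pd.Series(['y', 'x'], index=[1, 0])

# With join='left', `t` is aligned on the index of `s` before concatenating,
# so label 0 pairs 'a' with 'x' and label 1 pairs 'b' with 'y'.
aligned = s.str.cat(t, join='left')
print(aligned.tolist())
```

Passing `t.values` instead would drop the index and concatenate purely by position.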

It makes sense to support lists of objects which get concatenated sequentially, and lists of lists have been supported since at least v0.17, see https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html

When I wrote #20347, I tried very hard to keep the signature backwards-compatible, and to keep the example from the v0.17-22 docs working:

>>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
0    a,x,1
1    b,y,2
dtype: object

However, this added lots of complexity, and I think that this should be simplified, especially in light of @TomAugspurger's comment in #21894:

As a reminder, the plan is to have no new deprecations in 0.25.x and 1.0.0. So this [v0.24] is the last round of deprecations before 1.0.

My suggestion is to modify the allowed combinations (as of v0.23) as follows:

Type of "others"                        |  action  |  comment
---------------------------------------------------------------------
list-like of strings                    |   keep   |  as before; mimics behavior elsewhere,
                                                      cf.: pd.Series(range(3)) + [2,4,6]
Series                                  |   keep   |
np.ndarray (1-dim)                      |   keep   |
DataFrame                               |   keep   |  sequential concatenation
np.ndarray (2-dim)                      |   keep   |  sequential concatenation
list-like of
    Series/Index/np.ndarray (1-dim)     |   keep   |  sequential concatenation
list-like containing list-likes (1-dim)
    other than Series/Index/np.ndarray  |   DEPR   |  sequential concatenation

In other words, if the user wants sequential concatenation, there are many possibilities available, and list-of-lists does not have to be one of them, IMO. This would substantially simplify (post-deprecation) the code for str.cat._get_series_list, which is currently a bit complicated. https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/strings.py#L2089
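As a rough illustration of the "keep" rows in the table, the sketch below reproduces the result of the old list-of-lists example using the inputs that would remain supported. The sample data and variable names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b'])

# list-like of 1-dim np.ndarray: each array is concatenated in turn
from_arrays = s.str.cat([np.array(['x', 'y']), np.array(['1', '2'])], sep=',')

# list-like of Series: each Series is concatenated in turn (indexes match here)
from_series = s.str.cat([pd.Series(['x', 'y']), pd.Series(['1', '2'])], sep=',')

# DataFrame: columns are concatenated sequentially
from_frame = s.str.cat(pd.DataFrame({'c1': ['x', 'y'], 'c2': ['1', '2']}), sep=',')

# 2-dim np.ndarray: one row per element of `s`, columns concatenated sequentially
from_2d = s.str.cat(np.array([['x', '1'], ['y', '2']]), sep=',')

for result in (from_arrays, from_series, from_frame, from_2d):
    print(result.tolist())
```

Each variant yields `a,x,1` and `b,y,2`, so wrapping the inner lists in `np.array` or `Series` would be enough to migrate existing list-of-lists calls.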

Finally, for completeness, the example from the v0.17-22 docs has been removed for v0.23, but there are two examples in https://pandas.pydata.org/pandas-docs/stable/text.html#concatenating-a-series-and-many-objects-into-a-series that would fall under the deprecation I'm suggesting.

@h-vetinari h-vetinari mentioned this issue Jul 17, 2018
34 tasks
@h-vetinari
Contributor Author

h-vetinari commented Jul 18, 2018

To check for myself, and as a little "advert" for this proposal, I wanted to see what _get_series_list would look like with the above deprecation (and removal of the FutureWarning, once str.cat aligns by default). This is to be compared with https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/strings.py#L2089

def _get_series_list(self, others):
    """
    Auxiliary function for :meth:`str.cat`. Turn potentially mixed input
    into a list of Series (elements without an index must match the length
    of the calling Series/Index).

    Parameters
    ----------
    others : Series, DataFrame, np.ndarray, list-like or list-like of
        objects that are either Series, Index or np.ndarray (1-dim)

    Returns
    -------
    list : others transformed into list of Series
    """
    from pandas import Index, Series, DataFrame

    # self._orig is either Series or Index
    idx = self._orig if isinstance(self._orig, Index) else self._orig.index

    # Generally speaking, all objects without an index inherit the index
    # `idx` of the calling Series/Index - i.e. must have matching length.
    # Objects with an index (i.e. Series/Index/DataFrame) keep their own.
    if isinstance(others, Series):
        return [others]
    elif isinstance(others, Index):
        return [Series(others.values, index=others)]
    elif isinstance(others, DataFrame):
        return [others[x] for x in others]
    elif isinstance(others, np.ndarray) and others.ndim == 2:
        others = DataFrame(others, index=idx)
        return [others[x] for x in others]
    elif is_list_like(others):
        others = list(others)  # ensure iterators do not get read twice etc

        # in case of list-like `others`, all elements must be
        # either Series/Index/np.ndarray (1-dim)...
        if all(isinstance(x, (Series, Index))
               or (isinstance(x, np.ndarray) and x.ndim == 1) for x in others):
            los = []
            while others:  # iterate through list and append each element
                los = los + self._get_series_list(others.pop(0))
            return los
        # ... or just strings
        elif all(not is_list_like(x) for x in others):
            return [Series(others, index=idx)]
    raise TypeError('others must be Series, Index, DataFrame, np.ndarray or '
                    'list-like (either containing only strings or containing '
                    'only objects of type Series/Index/np.ndarray[1-dim])')
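The method above relies on names from `pandas/core/strings.py` (`np`, `is_list_like`, `self._orig`), so it cannot be run on its own. To try the proposed rules outside pandas internals, here is a minimal standalone adaptation. The free function `get_series_list` and its explicit `idx` argument are hypothetical and not part of pandas; the branching is meant to mirror the method above, not to replace it.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def get_series_list(others, idx):
    """Hypothetical free-function version of the proposed helper."""
    if isinstance(others, pd.Series):
        return [others]
    if isinstance(others, pd.Index):
        return [pd.Series(others.values, index=others)]
    if isinstance(others, np.ndarray) and others.ndim == 2:
        # 2-dim arrays have no index of their own and inherit `idx`
        others = pd.DataFrame(others, index=idx)
    if isinstance(others, pd.DataFrame):
        return [others[col] for col in others]
    if is_list_like(others):
        others = list(others)  # consume iterators exactly once
        # either every element is Series/Index/np.ndarray (1-dim) ...
        if all(isinstance(x, (pd.Series, pd.Index))
               or (isinstance(x, np.ndarray) and x.ndim == 1) for x in others):
            result = []
            for x in others:
                result.extend(get_series_list(x, idx))
            return result
        # ... or every element is a scalar (e.g. a string)
        if all(not is_list_like(x) for x in others):
            return [pd.Series(others, index=idx)]
    raise TypeError('list-likes may only contain strings or '
                    'Series/Index/np.ndarray (1-dim)')


idx = pd.RangeIndex(2)
flat = get_series_list(['x', 'y'], idx)
arrays = get_series_list([np.array(['x', 'y']), np.array(['1', '2'])], idx)

# Under the proposed rules, a list of lists is rejected.
try:
    get_series_list([['x', 'y'], ['1', '2']], idx)
    nested_rejected = False
except TypeError:
    nested_rejected = True

print(len(flat), len(arrays), nested_rejected)
```

A list of plain strings becomes a single Series, a list of 1-dim arrays becomes one Series per array, and the nested list falls through to the `TypeError`.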

@gfyoung gfyoung added Strings String extension data type and string data Deprecate Functionality to remove in pandas labels Jul 18, 2018
@jreback jreback added this to the 0.24.0 milestone Aug 9, 2018