Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow-up #20347: incorporate review about _get_series_list #20923

Merged
merged 3 commits into from
May 4, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/text.rst
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ All one-dimensional list-likes can be arbitrarily combined in a list-like contai

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point of @TomAugspurger in #20347, responding to the line
All one-dimensional list-likes can be arbitrarily combined in a list-like container (including iterators, ``dict``-views, etc.):
(which is just above this line pointer):

@TomAugspurger

question: what happens with a dict? Do we use the keys, or does it transform to a series and get aligned?

Response 1:

I was talking about d.keys() or d.values(). For passing a dict directly, currently the keys would get read (as that's what x in d would return).

Response 2:

I could of course add another safety that maps d to d.values().
However, I tend to think that this could be left to the python-skills of the user as well -- a dictionary is not "list-like" from the POV of normal python usage (whereas its keys and values are, see d = dict(zip(keys, values))).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a dict is treated consistently here how we handle list-likes else where, we effectively call list on it

s
u
s.str.cat([u, pd.Index(u.values), ['A', 'B', 'C', 'D'], map(int, u.index)], na_rep='-')
s.str.cat([u, pd.Index(u.values), ['A', 'B', 'C', 'D'], map(str, u.index)], na_rep='-')

All elements must match in length to the calling ``Series`` (or ``Index``), except those having an index if ``join`` is not None:

Expand Down
4 changes: 2 additions & 2 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -314,7 +314,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :meth:`Series.str.cat` did not -- in contrast to most of ``pandas`` -- align :class:`Series` on their index before concatenation (see :issue:`18657`).
The method has now gained a keyword ``join`` to control the manner of alignment, see examples below and in :ref:`here <text.concatenate>`.
The method has now gained a keyword ``join`` to control the manner of alignment, see examples below and :ref:`here <text.concatenate>`.

In v.0.23 `join` will default to None (meaning no alignment), but this default will change to ``'left'`` in a future version of pandas.

Expand All @@ -325,7 +325,7 @@ In v.0.23 `join` will default to None (meaning no alignment), but this default w
s.str.cat(t)
s.str.cat(t, join='left', na_rep='-')

Furthermore, meth:`Series.str.cat` now works for ``CategoricalIndex`` as well (previously raised a ``ValueError``; see :issue:`20842`).
Furthermore, :meth:`Series.str.cat` now works for ``CategoricalIndex`` as well (previously raised a ``ValueError``; see :issue:`20842`).

.. _whatsnew_0230.enhancements.astype_category:

Expand Down
75 changes: 39 additions & 36 deletions pandas/core/strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -1943,21 +1943,21 @@ def _get_series_list(self, others, ignore_index=False):

Parameters
----------
input : Series, DataFrame, np.ndarray, list-like or list-like of
others : Series, DataFrame, np.ndarray, list-like or list-like of
objects that are either Series, np.ndarray (1-dim) or list-like
ignore_index : boolean, default False
Determines whether to forcefully align with index of the caller
Determines whether to forcefully align others with index of caller

Returns
-------
tuple : (input transformed into list of Series,
Boolean whether FutureWarning should be raised)
tuple : (others transformed into list of Series,
boolean whether FutureWarning should be raised)
"""

# once str.cat defaults to alignment, this function can be simplified;
# will not need `ignore_index` and the second boolean output anymore

from pandas import Index, Series, DataFrame, isnull
from pandas import Index, Series, DataFrame

# self._orig is either Series or Index
idx = self._orig if isinstance(self._orig, Index) else self._orig.index
Expand All @@ -1966,66 +1966,69 @@ def _get_series_list(self, others, ignore_index=False):
'list-like (either containing only strings or containing '
'only objects of type Series/Index/list-like/np.ndarray)')

# Generally speaking, all objects without an index inherit the index
# `idx` of the calling Series/Index - i.e. must have matching length.
# Objects with an index (i.e. Series/Index/DataFrame) keep their own
# index, *unless* ignore_index is set to True.
if isinstance(others, Series):
fu_wrn = not others.index.equals(idx)
warn = not others.index.equals(idx)
# only reconstruct Series when absolutely necessary
los = [Series(others.values, index=idx)
if ignore_index and fu_wrn else others]
return (los, fu_wrn)
if ignore_index and warn else others]
return (los, warn)
elif isinstance(others, Index):
fu_wrn = not others.equals(idx)
warn = not others.equals(idx)
los = [Series(others.values,
index=(idx if ignore_index else others))]
return (los, fu_wrn)
return (los, warn)
elif isinstance(others, DataFrame):
fu_wrn = not others.index.equals(idx)
if ignore_index and fu_wrn:
warn = not others.index.equals(idx)
if ignore_index and warn:
# without copy, this could change "others"
# that was passed to str.cat
others = others.copy()
others.index = idx
return ([others[x] for x in others], fu_wrn)
return ([others[x] for x in others], warn)
elif isinstance(others, np.ndarray) and others.ndim == 2:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback :

this is wrong
i don’t think we can align a ndarray at all like this
let’s can ndarray a that are > 1 dim

My response:
The DF-constructor works as expected for a 2-dim ndarray, but I haven't checked if this is tested behaviour. (essentially, df == DataFrame(df.values, columns=df.columns, index=df.index))

I would suggest not to can 2-dim ndarrays, because they are necessary to avoid alignment on the deprecation path for join:

[...] To disable alignment (the behavior before v.0.23) and silence this warning, use .values on any Series/Index/DataFrame in others. [...]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for this, basically see what happens when I have a non-default index with the Series (e.g. 2,3, 4) or something and it gets aligned with the 0,1,2 of the ndarray-converted-to-DataFrame. It will 'work' but is wrong.

Copy link
Contributor Author

@h-vetinari h-vetinari May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a general comment before all the cases for others because it seems this point wasn't clear:

# Generally speaking, all objects without an index inherit the index
# `idx` of the calling Series/Index - i.e. must have matching length.
# Objects with an index (i.e. Series/Index/DataFrame) keep their own
# index, *unless* ignore_index is set to True.

So, the case you mention does not happen, because an ndarray is always automatically aligned with the calling Series (of course, the lengths must match for this to work).

There are several tests with objects with different indexes, both with join is None (so with force-realign; successful but raising a warning), as well as with the different join-types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i c, ok then

others = DataFrame(others, index=idx)
return ([others[x] for x in others], False)
elif is_list_like(others):
others = list(others) # ensure iterators do not get read twice etc

# in case of list-like `others`, all elements must be
# either one-dimensional list-likes or scalars
if all(is_list_like(x) for x in others):
los = []
fu_wrn = False
warn = False
# iterate through list and append list of series for each
# element (which we check to be one-dimensional and non-nested)
while others:
nxt = others.pop(0) # list-like as per check above
# safety for iterators and other non-persistent list-likes
# do not map indexed/typed objects; would lose information
nxt = others.pop(0) # nxt is guaranteed list-like by above
if not isinstance(nxt, (DataFrame, Series,
Index, np.ndarray)):
# safety for non-persistent list-likes (e.g. iterators)
# do not map indexed/typed objects; info needed below
nxt = list(nxt)

# known types without deep inspection
# known types for which we can avoid deep inspection
no_deep = ((isinstance(nxt, np.ndarray) and nxt.ndim == 1)
or isinstance(nxt, (Series, Index)))
# Nested list-likes are forbidden - elements of nxt must be
# strings/NaN/None. Need to robustify NaN-check against
# x in nxt being list-like (otherwise ambiguous boolean)
# nested list-likes are forbidden:
# -> elements of nxt must not be list-like
is_legal = ((no_deep and nxt.dtype == object)
or all((isinstance(x, compat.string_types)
or (not is_list_like(x) and isnull(x))
or x is None)
for x in nxt))
or all(not is_list_like(x) for x in nxt))

# DataFrame is false positive of is_legal
# because "x in df" returns column names
if not is_legal or isinstance(nxt, DataFrame):
raise TypeError(err_msg)

nxt, fwn = self._get_series_list(nxt,
nxt, wnx = self._get_series_list(nxt,
ignore_index=ignore_index)
los = los + nxt
fu_wrn = fu_wrn or fwn
return (los, fu_wrn)
# test if there is a mix of list-like and non-list-like (e.g. str)
elif (any(is_list_like(x) for x in others)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback

you can make this simpler by just checking for all is not list like (eg strings)
anything else will fail thru to the TypeError

Done

and any(not is_list_like(x) for x in others)):
raise TypeError(err_msg)
else: # all elements in others are _not_ list-like
warn = warn or wnx
return (los, warn)
elif all(not is_list_like(x) for x in others):
return ([Series(others, index=idx)], False)
raise TypeError(err_msg)

Expand Down Expand Up @@ -2187,8 +2190,8 @@ def cat(self, others=None, sep=None, na_rep=None, join=None):

try:
# turn anything in "others" into lists of Series
others, fu_wrn = self._get_series_list(others,
ignore_index=(join is None))
others, warn = self._get_series_list(others,
ignore_index=(join is None))
except ValueError: # do not catch TypeError raised by _get_series_list
if join is None:
raise ValueError('All arrays must be same length, except '
Expand All @@ -2199,7 +2202,7 @@ def cat(self, others=None, sep=None, na_rep=None, join=None):
'must all be of the same length as the '
'calling Series/Index.')

if join is None and fu_wrn:
if join is None and warn:
warnings.warn("A future version of pandas will perform index "
"alignment when `others` is a Series/Index/"
"DataFrame (or a list-like containing one). To "
Expand Down