Propagating NaN values when using str.split (pandas-dev#18450) (pandas-dev#18462)

(cherry picked from commit 20f6512)
WillAyd authored and TomAugspurger committed Dec 8, 2017
1 parent cbe24b6 commit 7a72e3e
Showing 3 changed files with 21 additions and 1 deletion.
6 changes: 5 additions & 1 deletion doc/source/whatsnew/v0.21.1.txt
@@ -140,9 +140,13 @@ Categorical
- ``CategoricalIndex`` can now correctly take a ``pd.api.types.CategoricalDtype`` as its dtype (:issue:`18116`)
- Bug in ``Categorical.unique()`` returning read-only ``codes`` array when all categories were ``NaN`` (:issue:`18051`)

String
^^^^^^

- :meth:`Series.str.split()` will now propagate ``NaN`` values across all expanded columns instead of ``None`` (:issue:`18450`)

Other
^^^^^

-
-
-
4 changes: 4 additions & 0 deletions pandas/core/strings.py
@@ -1423,6 +1423,10 @@ def cons_row(x):
return [x]

result = [cons_row(x) for x in result]
if result:
    # propagate nan values to match longest sequence (GH 18450)
    max_len = max(len(x) for x in result)
    result = [x * max_len if x[0] is np.nan else x for x in result]

if not isinstance(expand, bool):
    raise ValueError("expand must be True or False")
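
Reviewer note: the padding step in this hunk can be tried in isolation. The sketch below is a simplified stand-in, not the pandas code itself. `NAN` and `pad_nan_rows` are hypothetical names, `NAN` takes the place of `np.nan` so the example needs no imports, and `cons_row` uses a plain `isinstance` check as an assumption about how list-like values are detected.

```python
# Stand-in for np.nan; the patch compares by identity (`x[0] is np.nan`),
# so a single shared sentinel object is used here as well.
NAN = float("nan")


def cons_row(x):
    """Wrap a scalar in a list; leave lists untouched (simplified check)."""
    return x if isinstance(x, list) else [x]


def pad_nan_rows(rows):
    """Repeat a lone NaN so a missing row is as wide as the longest row."""
    rows = [cons_row(x) for x in rows]
    if rows:
        max_len = max(len(x) for x in rows)
        # [NAN] * max_len yields max_len references to the same NaN object
        rows = [x * max_len if x[0] is NAN else x for x in rows]
    return rows


padded = pad_nan_rows([["foo", "bar", "baz"], NAN])
print(padded)
```

The `if rows:` guard matters because `max()` raises `ValueError` on an empty sequence, which is presumably why the patch wraps the new lines in `if result:`.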
12 changes: 12 additions & 0 deletions pandas/tests/test_strings.py
@@ -2086,6 +2086,18 @@ def test_rsplit_to_multiindex_expand(self):
    tm.assert_index_equal(result, exp)
    assert result.nlevels == 2

def test_split_nan_expand(self):
    # gh-18450
    s = Series(["foo,bar,baz", NA])
    result = s.str.split(",", expand=True)
    exp = DataFrame([["foo", "bar", "baz"], [NA, NA, NA]])
    tm.assert_frame_equal(result, exp)

    # check that these are actually np.nan and not None
    # TODO see GH 18463
    # tm.assert_frame_equal does not differentiate
    assert all(np.isnan(x) for x in result.iloc[1])

def test_split_with_name(self):
    # GH 12617
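
Reviewer note: the extra assertion in `test_split_nan_expand` is there because, per the test's own comment, the frame comparison does not tell `np.nan` apart from `None`. A minimal sketch of an explicit check, using only the standard library: `is_float_nan` is a hypothetical helper, not part of pandas or its test utilities, and `NAN` stands in for `np.nan`.

```python
import math

NAN = float("nan")  # stand-in for np.nan


def is_float_nan(value):
    """Return True only for a float NaN; None and other types give False."""
    return isinstance(value, float) and math.isnan(value)


row_after_fix = [NAN, NAN, NAN]       # what the patched split should yield
row_before_fix = [NAN, None, None]    # missing cells left as None

print(all(is_float_nan(v) for v in row_after_fix))
print(all(is_float_nan(v) for v in row_before_fix))
```

The `isinstance` guard is a design choice for the sketch: calling `math.isnan(None)` directly raises `TypeError`, so the helper returns `False` for `None` instead of failing.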
