Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate str.split return_type #10085

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented May 8, 2015

Closes #9847, Closes #9870. Default value is expand=False to be compat withreturn_type='series'. May better to change the default to True in future (and show warning about it)?

CC @jreback @jorisvandenbossche @sreejata

@sinhrks sinhrks added the Strings String extension data type and string data label May 8, 2015
@sinhrks sinhrks added this to the 0.16.1 milestone May 8, 2015
@sinhrks sinhrks force-pushed the str_split_expand branch from 1346295 to e07da48 Compare May 8, 2015 21:25
@@ -742,10 +735,7 @@ def str_split(arr, pat=None, n=None, return_type='series'):
n = 0
regex = re.compile(pat)
f = lambda x: regex.split(x, maxsplit=n)
if return_type == 'frame':
res = DataFrame((Series(x) for x in _na_map(f, arr)), index=arr.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this what expand=True does (and btw, can easily fix #10081) at the same time, just take out the use of Series here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the logic is replaced to _wrap_results_expand.

I think #10081 should be considered separately, because we can't simply use workaround for all the cases.

s = pd.Series([1.1, '2.2'])

# current behavior
s.str.split('.', expand=True)
#      0    1
# 0  NaN  NaN
# 1    2    2

# workaround
pd.read_table(StringIO(s.to_csv(None, index=None)), sep='.')
#    1  1.1
# 0  2    2

# numpy
pd.DataFrame(list(np.core.defchararray.split(s.values.astype(str), '.')))
#    0  1
# 0  1  1
# 1  2  2

@jreback jreback added the Deprecate Functionality to remove in pandas label May 8, 2015
@@ -221,6 +221,28 @@ enhancements are performed to make string operation easier.
idx.str.startswith('a')
s[s.index.str.startswith('a')]


- ``split`` now takes ``expand`` keyword to specify returning dimensionality. ``return_type`` is deprecated. (:issue:`9847`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"returning dimensionality" should be "expanding dimensionality"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "specify whether to expand dimensionality"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

On May 8, 2015, at 8:59 PM, Sinhrks [email protected] wrote:

In doc/source/whatsnew/v0.16.1.txt:

@@ -221,6 +221,28 @@ enhancements are performed to make string operation easier.
idx.str.startswith('a')
s[s.index.str.startswith('a')]

+- split now takes expand keyword to specify returning dimensionality. return_type is deprecated. (:issue:9847)
how about "specify whether to expand dimensionality"?


Reply to this email directly or view it on GitHub.

@sinhrks
Copy link
Member Author

sinhrks commented May 9, 2015

@jreback, @mortada Thanks, updated.

@jorisvandenbossche
Copy link
Member

Looks good!

Another thing I encountered while reviewing: there is no good explanation of the n kwarg of split in the docstring, and the default differs between str_split/docstring (default of None) and the signature of split itself (default of -1)

@sinhrks sinhrks force-pushed the str_split_expand branch from fe13f04 to a5403a5 Compare May 9, 2015 12:43
@sinhrks sinhrks force-pushed the str_split_expand branch from a5403a5 to 4e08839 Compare May 9, 2015 12:46
@sinhrks
Copy link
Member Author

sinhrks commented May 9, 2015

@jorisvandenbossche Explanation of n is written as notes, thus moved it to main part. Also, modified the docstring to meet the str.split without changing the default values. Or may better to change defaults as None and -1 has the same meaning?

I also found current arg names are different from standard str.split (sep and maxsplit ) ...

@jreback
Copy link
Contributor

jreback commented May 9, 2015

merged via 8b89842

@sinhrks thank you!

if you have minor doc changes, pls fee free to make another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: return_type argument in StringMethods.split()
4 participants