
.str._validate should infer for Series, not raise for all-na Index #23163

Closed
h-vetinari opened this issue Oct 15, 2018 · 0 comments · Fixed by #23167
Labels
Bug, Strings (String extension data type and string data)
Milestone

Comments

@h-vetinari
Contributor

h-vetinari commented Oct 15, 2018

While working on #22725, a couple of workarounds were necessary to raise correctly on non-string data hiding behind object dtype, e.g.

>>> pd.Series([1, 2, 3], dtype=object).str.cat([1, 2, 3])

However, the .str accessor itself should already raise in __init__, or rather in the internal _validate method (this is closely related to #23011). That is, instead of

>>> pd.Series([1,2,3], dtype=object).str
<pandas.core.strings.StringMethods object at 0x000002A4C70AE198>

it should be

>>> pd.Series([1,2,3], dtype=object).str
AttributeError

Interestingly, Index already infers the type correctly in .str._validate:

>>> pd.Index([1,2,3], dtype=object).str
AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')
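A minimal sketch of the inference-based check proposed above, applied the same way to a Series and an Index. It assumes `pandas.api.types.infer_dtype` as the inference step; the helper name `validate_str_values` is hypothetical (this is not pandas' internal `_validate`), and the accepted types are taken from the error message quoted above. `skipna=False` is used so that missing values take part in the inference, matching the behavior described in this issue.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Accepted inferred types, taken from the error message quoted in the issue.
ALLOWED_INFERRED = {"string", "unicode", "mixed"}


def validate_str_values(data):
    """Hypothetical helper: raise AttributeError unless `data`
    (a Series or an Index) is inferred to hold string values."""
    # skipna=False: missing values participate in the inference.
    inferred = infer_dtype(data, skipna=False)
    if inferred not in ALLOWED_INFERRED:
        raise AttributeError(
            "Can only use .str accessor with string values "
            f"(inferred_type is {inferred!r})"
        )
    return inferred


# Object-dtype integers in a Series and in an Index, plus real strings.
for obj in [
    pd.Series([1, 2, 3], dtype=object),
    pd.Index([1, 2, 3], dtype=object),
    pd.Series(["a", "b", None], dtype=object),
]:
    try:
        print(type(obj).__name__, "accepted as", validate_str_values(obj))
    except AttributeError as err:
        print(type(obj).__name__, "raises:", err)
```

With a check like this, object-dtype integers are rejected regardless of whether they sit in a Series or an Index, while string data containing missing values still passes.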

However, there is another nit that I want to fix together with the inference for Series: an all-NA object Index (or Series) should not raise the AttributeError. Currently it does, because all-NA data gets inferred as float. There are legitimate cases where a selection of string data is entirely NA (e.g. after alignment), and if the dtype is object, this shouldn't fail.
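The all-NA case can be illustrated with a similar sketch. Assuming `pandas.api.types.infer_dtype` with `skipna=False`, an all-NaN object Series is inferred as `"floating"` and would be rejected. The hypothetical helper `validate_str_values_allow_na` below exempts object data that is entirely missing before running the inference. Restricting the exemption to object dtype keeps a genuinely float all-NaN Series rejected.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Accepted inferred types, taken from the error message quoted in the issue.
ALLOWED_INFERRED = {"string", "unicode", "mixed"}


def validate_str_values_allow_na(data):
    """Hypothetical check: inference-based .str validation in which
    all-NA object data (e.g. a selection produced by alignment) is accepted."""
    if data.dtype == object and data.isna().all():
        # Without this branch, all-NaN object data is inferred as "floating".
        return "all-na"
    inferred = infer_dtype(data, skipna=False)
    if inferred not in ALLOWED_INFERRED:
        raise AttributeError(
            "Can only use .str accessor with string values "
            f"(inferred_type is {inferred!r})"
        )
    return inferred


all_na = pd.Series([np.nan, np.nan], dtype=object)
print(infer_dtype(all_na, skipna=False))     # inferred type without the exemption
print(validate_str_values_allow_na(all_na))  # accepted via the all-NA branch
```

In recent pandas versions, `infer_dtype(..., skipna=True)` reports all-missing input as `"empty"`, which would be another way to express the same exemption.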

Edit: xref #9343 #13877

@jreback added the Bug and Strings labels Oct 17, 2018
@jreback added this to the 0.24.0 milestone Oct 17, 2018
@jreback modified the milestones: 0.24.0, Contributions Welcome Nov 25, 2018
@jreback modified the milestones: Contributions Welcome, 0.25.0 May 19, 2019
2 participants