
reindex(fill_value=None) fills with np.NaN instead of None #14188

Closed
nekobon opened this issue Sep 8, 2016 · 9 comments
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Usage Question

Comments

@nekobon

nekobon commented Sep 8, 2016

Code Sample, a copy-pastable example if possible

In [12]: import pandas as pd

In [13]: s = pd.Series(['a', 'b'])

In [14]: s.reindex([0,1,2], fill_value=None)
Out[14]: 
0      a
1      b
2    NaN
dtype: object

Expected Output

0      a
1      b
2    None
dtype: object
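A workaround sketch, assuming you really do want Python None rather than NaN in an object column: reindex as usual, then rebuild the Series and map every missing marker to None. This is plain pandas usage, not a pattern recommended anywhere in this thread.

```python
import pandas as pd

s = pd.Series(['a', 'b'])

# reindex fills the new label 2 with pandas' missing-value marker,
# even though fill_value=None is passed explicitly.
r = s.reindex([0, 1, 2], fill_value=None)

# Workaround (assumption: an object column holding Python None is what you want):
# rebuild the Series, replacing any missing marker with None.
r_none = pd.Series([None if pd.isna(v) else v for v in r],
                   index=r.index, dtype=object)
print(r_none)
```

Note that this only sticks for object dtype; any later operation that re-infers a dtype may turn None back into NaN.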

output of pd.show_versions()

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: None
pip: 8.1.2
setuptools: 25.2.0
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 1.1.0
sphinx: 1.2b1
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2013b
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: None
openpyxl: 2.3.0
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
jinja2: 2.7
boto: None

in source

This is happening in BlockManager.reindex_indexer()
https://github.com/pydata/pandas/blob/8af626474f6f314527a9ad3f15403aa2dd8c402d/pandas/core/internals.py#L3820-L3822
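A minimal sketch of the behavior described above, assuming a None fill_value is treated as "not passed" and replaced with np.nan. The helper name resolve_fill_value is hypothetical; it illustrates the effect and is not the code at the linked lines.

```python
import numpy as np

def resolve_fill_value(fill_value=None):
    """Hypothetical helper: None means "argument not passed",
    so it is replaced by np.nan; any other value is used as-is."""
    if fill_value is None:
        return np.nan
    return fill_value

print(resolve_fill_value(None))  # NaN, not None
print(resolve_fill_value(''))    # '' is kept
```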

@jreback
Contributor

jreback commented Sep 8, 2016

This is by definition: np.nan is the missing-value indicator.
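To illustrate what "missing indicator" means in practice: pandas' missing-value checks treat None and np.nan alike, so both are flagged as missing in an object Series.

```python
import numpy as np
import pandas as pd

# Both None and np.nan count as missing for pandas.
print(pd.isna(None), pd.isna(np.nan))

# In an object Series, isna() flags None and NaN the same way.
s = pd.Series(['a', None, np.nan], dtype=object)
print(s.isna().tolist())
```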

@jreback
Contributor

jreback commented Sep 8, 2016

@nekobon
Author

nekobon commented Sep 8, 2016

I think it's confusing to get NaN when we give fill_value=None explicitly, without warnings or exceptions.

According to the reindex documentation (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html):

fill_value : scalar, default np.NaN
    Value to use for missing values. Defaults to NaN, but can be any “compatible” value
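For comparison, a non-None fill_value is used exactly as given. A short sketch with an empty string for the string Series and 0 for an integer Series:

```python
import pandas as pd

s = pd.Series(['a', 'b'])
# The empty string fills the new label instead of NaN.
r = s.reindex([0, 1, 2], fill_value='')
print(r.tolist())

n = pd.Series([1, 2])
# An integer fill value keeps the Series free of NaN.
m = n.reindex([0, 1, 2], fill_value=0)
print(m.tolist())
```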

Perhaps we can improve the documentation to mention that None is not a compatible value there.

@jreback
Contributor

jreback commented Sep 8, 2016

The default argument is None, meaning it's not passed.

This is a fairly standard convention.

What I would take for documentation is a small section near the top of missing.rst noting that strings use np.nan as the missing value.

@nekobon
Author

nekobon commented Sep 9, 2016

I see that not supporting None here is consistent with this issue on fillna.

It's true that defaulting to None is standard in Python, but it's also common to use a sentinel object (object()) when None could be a meaningful value. This would let us use None with both fillna and fill_value, which I think would be an improvement. What do you think?
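The sentinel pattern suggested here can be sketched on a plain list. Everything below (reindex_list, _NO_FILL) is hypothetical and only shows how a private object() default keeps an explicit None distinguishable from "argument omitted".

```python
import math

# Hypothetical module-level sentinel: a unique object no caller can pass by accident.
_NO_FILL = object()

def reindex_list(values, new_length, fill_value=_NO_FILL):
    """Toy reindex over a plain list (hypothetical, not pandas' API).

    If fill_value is omitted, fall back to NaN; an explicit None is kept."""
    if fill_value is _NO_FILL:
        fill_value = math.nan
    return [values[i] if i < len(values) else fill_value
            for i in range(new_length)]

print(reindex_list(['a', 'b'], 3))                   # omitted -> NaN
print(reindex_list(['a', 'b'], 3, fill_value=None))  # explicit None is kept
```

Because the check is an identity test (`is _NO_FILL`), no user value, including None, can collide with the default.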

@jreback
Contributor

jreback commented Sep 9, 2016

You could use a sentinel, but we don't allow filling with None for a variety of reasons.
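The thread does not list those reasons. One concrete consequence, offered as an illustration rather than the maintainers' full rationale: non-object dtypes such as float64 have no slot for Python None, so pandas converts it to NaN.

```python
import pandas as pd

# A float Series cannot hold Python None; assignment stores NaN instead.
f = pd.Series([1.0, 2.0])
f[1] = None
print(f.tolist(), f.dtype)

# Likewise, building a Series from ints plus None yields float64 with NaN.
g = pd.Series([1, None])
print(g.dtype)
```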

@nekobon
Author

nekobon commented Sep 9, 2016

Could you share with me some of those reasons? I've been using None in Series and DataFrames, so I'm curious about why we shouldn't.

@jreback
Contributor

jreback commented Sep 9, 2016

@nekobon
Author

nekobon commented Sep 9, 2016

Thank you

@nekobon nekobon closed this as completed Sep 9, 2016
@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Usage Question labels Sep 9, 2016
@jreback jreback added this to the No action milestone Sep 9, 2016
2 participants