read_csv in Pandas 0.14 loads NaNs when namedtuple is used for column names #7589
Comments
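See here about the change in … I don't think this is a bug; using tuples (or tuple-like) as a name of a column is just asking for trouble, as these represent multi-indexes.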
OK, thanks loads for your help! I guess we'll move to using hierarchical indexes and, on each level, we'll just use strings.
For our own project, we are now using MultiIndex for our columns (e.g. …). But I am a bit confused about whether or not it's OK to use tuples as column names in Pandas DataFrames. The Pandas docs say:

```
In [326]: Series(randn(8), index=tuples)
Out[326]:
(bar, one)   -0.557549
(bar, two)    0.126204
(baz, one)    1.643615
(baz, two)   -0.067716
(foo, one)    0.127064
(foo, two)    0.396144
(qux, one)    1.043289
(qux, two)   -0.229627
dtype: float64
```

In my testing, vanilla tuples do seem fine as column names (in both v0.13.1 and v0.14), and … On the other hand, you said that "using tuples (or tuple-like) as a name of a column is just asking for trouble as these represent multi-indexes," and … So it sounds like tuples can work, but they're unsafe and might become totally unusable as column names in future versions of Pandas. Is that the correct conclusion?
Nothing wrong with you using tuples. They just, IMHO, don't offer any benefit over multi-indexes. If they work for you, then great. I don't mean to imply they are unsafe; my 2c is that selection with them is just VERY confusing in written code, e.g. … I don't think the usage will change in the future. I'll mark this issue as a bug. Please feel free to submit a pull request to fix it!
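To make the selection point concrete, here is a minimal sketch, assuming a recent pandas version, of how a MultiIndex distinguishes a partial (top-level) key from a full tuple key and a cross-section on the second level. The labels `foo`, `bar` and `baz` are illustrative only.

```python
import pandas as pd

# Two-level column index built explicitly from plain-string tuples.
columns = pd.MultiIndex.from_tuples([("foo", "baz"), ("bar", "baz")])
df = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

# A top-level key returns a DataFrame holding the remaining level ("baz").
foo_block = df["foo"]

# A full tuple key returns exactly one column as a Series.
foo_baz = df[("foo", "baz")]

# A cross-section on level 1 returns every column whose second label is "baz",
# with that level dropped from the result's columns.
baz_cols = df.xs("baz", axis=1, level=1)

print(foo_block)
print(foo_baz)
print(baz_cols)
```

Because each level is a separate axis label, the reader can tell from the key alone whether a block of columns or a single column is being selected.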
This was a breaking change in 0.14; see the discussion in #3323. As for "these... just asking for trouble as these represent multi-indexes": …
Thanks loads for the replies, @jreback and @armaganthis3. For our own project, I think we will stick with using MultiIndex instead of namedtuples. It is quite probably a better solution than namedtuples anyway (as @jreback points out). So I'm afraid I'm unlikely to find time to hack away at Pandas to explore this bug with namedtuples; sorry.
This looks to be working in master:
… for both versions, so we can close this with a test.
@jreback: Just to be clear, you would like tests to ensure that both tuples and namedtuples give rise to MultiIndexes, right? So something like:

```python
from collections import namedtuple
from io import StringIO

import pandas.util.testing as tm  # newer pandas exposes these helpers as pandas.testing
from pandas import DataFrame, MultiIndex, read_csv

TestTuple = namedtuple('columns', ['first', 'second'])

# Two header rows ('foo,bar' / 'baz,baz') followed by two data rows.
CSV = """foo,bar
baz,baz
1,2
3,4"""

# Expected columns: ('foo', 'baz') and ('bar', 'baz').
# from_tuples avoids MultiIndex's `labels` argument, which was later renamed `codes`.
expected_columns = MultiIndex.from_tuples([('foo', 'baz'), ('bar', 'baz')])
expected_df = DataFrame(data=[[1, 2], [3, 4]], columns=expected_columns)

# 1. Both header rows parsed directly into a MultiIndex.
multi_df = read_csv(StringIO(CSV), header=[0, 1])
tm.assert_frame_equal(expected_df, multi_df, check_column_type=True)

# 2. Header rows skipped; plain tuples passed as column names.
tuple_df = read_csv(
    StringIO(CSV),
    header=None,
    skiprows=2,
    names=[('foo', 'baz'), ('bar', 'baz')]
)
tm.assert_frame_equal(expected_df, tuple_df, check_column_type=True)

# 3. Same as case 2, but with namedtuples as column names.
namedtuple_df = read_csv(
    StringIO(CSV),
    header=None,
    skiprows=2,
    names=[TestTuple('foo', 'baz'), TestTuple('bar', 'baz')]
)
tm.assert_frame_equal(expected_df, namedtuple_df, check_column_type=True)
```

would work?
Using a `namedtuple` as a column name for `read_csv` in Pandas 0.14 results in NaNs being loaded. Here is a simple demonstration of the problem (this code works in Pandas 0.13.1):
In Pandas 0.14, this is the output:
Strangely enough, Pandas 0.14 works fine if we use a tuple instead of a namedtuple:
Here is the output:
So, for some reason, `read_csv` in Pandas 0.14 doesn't like using a `namedtuple` as a column name. (The ugly fix is to not pass any column names to `read_csv` and then, once the DataFrame is loaded, replace the column names with `df.columns = [TestTuple('foo')]`.)

(Really love Pandas by the way, thanks so much for all your work!)
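A variant of that workaround builds the MultiIndex explicitly instead of assigning the namedtuples directly, so `read_csv` never sees them. This is a minimal sketch, not the reporter's code: the two-field `TestTuple`, the inline CSV, and the choice to reuse the namedtuple field names as level names are assumptions for illustration.

```python
from collections import namedtuple
from io import StringIO

import pandas as pd

# Hypothetical two-field namedtuple used only for this illustration.
TestTuple = namedtuple("TestTuple", ["first", "second"])
names = [TestTuple("foo", "baz"), TestTuple("bar", "baz")]

CSV = "1,2\n3,4\n"

# Load with default integer column labels; no names are passed to read_csv.
df = pd.read_csv(StringIO(CSV), header=None)

# Convert each namedtuple to a plain tuple and build the column index explicitly,
# using the namedtuple's field names as the level names.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(t) for t in names], names=list(TestTuple._fields)
)

print(df)
```

Building the index explicitly keeps the level structure unambiguous regardless of how a given pandas version treats tuple-like labels.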
My software versions: