read_csv with MultiIndex on columns doesn't respect level's dtypes #11728

Closed
duboism opened this issue Nov 30, 2015 · 3 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request IO CSV read_csv, to_csv MultiIndex

Comments

@duboism

duboism commented Nov 30, 2015

I want to store a DataFrame with a hierarchical column index in a CSV file. The index contains both string and integer levels. If I understand correctly, this should be done as follows:

import numpy as np
import pandas as pd

n = 10  # number of rows
p = 6   # number of columns (2 alphabetic labels x 3 numeric labels)
names = ['alphabetic', 'numeric']
levels = [['a', 'b'], [0, 1, 2]]
column_index = pd.MultiIndex.from_product(levels, names=names)
data = np.random.rand(n, p)
df = pd.DataFrame(data=data, columns=column_index)

# Write the frame, then read it back with both header rows as column levels
df.to_csv('test.csv')
read_df = pd.read_csv('test.csv', header=[0, 1], index_col=0)

# This assertion fails
assert np.all(df.columns == read_df.columns)

As mentioned, the assertion fails. The problem is that in read_df, the second level ('numeric') contains strings ('0', '1', '2') instead of numbers.
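
One possible workaround, sketched here under the assumption that every label of the 'numeric' level parses as an integer, is to cast that level back after reading. It uses an in-memory `io.StringIO` buffer in place of `test.csv` (an illustrative choice, not part of the report) and `MultiIndex.set_levels` to swap in the converted labels:

```python
import io

import numpy as np
import pandas as pd

# Rebuild the frame from the report: strings on level 0, integers on level 1.
levels = [['a', 'b'], [0, 1, 2]]
column_index = pd.MultiIndex.from_product(levels, names=['alphabetic', 'numeric'])
df = pd.DataFrame(np.random.rand(10, 6), columns=column_index)

# Round-trip through CSV text held in memory (stands in for test.csv).
csv_text = df.to_csv()
read_df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Cast the labels of the second level back to integers.
# Assumption: all labels on this level are valid integer strings.
numeric_level = read_df.columns.levels[1].astype('int64')
read_df.columns = read_df.columns.set_levels(numeric_level, level=1)

# The column index now matches the original one.
assert read_df.columns.equals(df.columns)
```

This only repairs the result after the fact; it does not make `read_csv` itself infer the level dtypes.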

Versions:

>>> pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-68-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0+git8-gcac4ad2
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: 0.20.1post0
numpy: 1.8.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.4.1
dateutil: 1.5
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.3
matplotlib: 1.3.1
openpyxl: 2.3.0-b2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Nov 30, 2015

This is a dupe of #9435. There is no easy way to do this ATM.

@jreback jreback closed this as completed Nov 30, 2015
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request IO CSV read_csv, to_csv MultiIndex labels Nov 30, 2015
@duboism
Author

duboism commented Dec 2, 2015

Are you sure it's a dupe? I found #9435 when looking for a solution and tested the solution you proposed, but it doesn't help. My problem is the dtype of the column index levels.

@jreback
Contributor

jreback commented Dec 3, 2015

Yes, I am sure. It's a dupe because the dtype kw doesn't apply to an index, be it a column Index or a MultiIndex.
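
Since the dtype kw does not reach the header labels, any conversion has to happen after the read. A minimal sketch of a generic post-processing step follows; `coerce_numeric_levels` is a hypothetical helper name, not a pandas function, and it assumes that a level should become numeric only when every one of its labels parses with `pd.to_numeric`:

```python
import pandas as pd


def coerce_numeric_levels(columns):
    """Hypothetical helper: return a MultiIndex whose levels are converted
    to numeric wherever every label on that level parses as a number."""
    new_levels = []
    for level in columns.levels:
        try:
            new_levels.append(pd.to_numeric(level))
        except (ValueError, TypeError):
            # At least one label is not numeric: keep the level unchanged.
            new_levels.append(level)
    return columns.set_levels(new_levels)


# Column index as it comes back from read_csv in the report: all strings.
as_read = pd.MultiIndex.from_product(
    [['a', 'b'], ['0', '1', '2']], names=['alphabetic', 'numeric']
)
fixed = coerce_numeric_levels(as_read)
```

Applied to a frame, this would be `read_df.columns = coerce_numeric_levels(read_df.columns)`. Labels that collapse to the same number after conversion (for example '1' and '1.0') are not handled by this sketch.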
