read_csv with MultiIndex on columns doesn't respect level's dtypes #11728

duboism · 2015-11-30T19:45:43Z

I want to store a DataFrame with a hierarchical column index in a CSV file. The index contains both string and integer levels. If I understand correctly, this should be done as follows:

import numpy as np
import pandas as pd
n = 10
p = 6
names = ['alphabetic', 'numeric']
levels = [['a', 'b'], [0, 1, 2]]
column_index = pd.MultiIndex.from_product(levels, names=names)
data = np.random.rand(n, p)
df = pd.DataFrame(data=data, columns=column_index)
df.to_csv('test.csv')
read_df = pd.read_csv('test.csv', header=[0, 1], index_col=0)
# This assertion fails: the 'numeric' level is read back as strings
assert df.columns.equals(read_df.columns)

As mentioned, the assertion fails. The problem is that in read_df the second level ('numeric') contains the strings '0', '1', '2' instead of the integers 0, 1, 2.
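A quick way to confirm this diagnosis is to compare the column levels before and after the round trip. The sketch below is a self-contained variant of the code above; it writes to an in-memory io.StringIO buffer instead of test.csv, an assumption made only so the example does not touch the file system.

```python
import io

import numpy as np
import pandas as pd

# Same frame as in the report: a string level and an integer level.
column_index = pd.MultiIndex.from_product(
    [['a', 'b'], [0, 1, 2]], names=['alphabetic', 'numeric'])
df = pd.DataFrame(np.random.rand(10, 6), columns=column_index)

# Round-trip through an in-memory buffer instead of 'test.csv'.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
read_df = pd.read_csv(buf, header=[0, 1], index_col=0)

# The 'numeric' level was written from integers but comes back as text.
print(list(df.columns.levels[1]))          # [0, 1, 2]
print(list(read_df.columns.levels[1]))     # ['0', '1', '2']
print(df.columns.equals(read_df.columns))  # False
```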

Versions:

>>> pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-68-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0+git8-gcac4ad2
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: 0.20.1post0
numpy: 1.8.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.4.1
dateutil: 1.5
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.3
matplotlib: 1.3.1
openpyxl: 2.3.0-b2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None


jreback · 2015-11-30T21:20:16Z

This is a dupe of #9435. There's no easy way to do this ATM.
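Until read_csv handles this itself, one possible workaround is to convert the header levels after reading. The helper below, numeric_levels_to_numbers, is a hypothetical name and not a pandas API: it tries pd.to_numeric on each column level and leaves any level that does not parse entirely as numbers untouched. It assumes that distinct strings in a level map to distinct numbers (for example, not both '1' and '01' in the same level), because MultiIndex.set_levels rejects duplicate level values.

```python
import io

import numpy as np
import pandas as pd


def numeric_levels_to_numbers(columns: pd.MultiIndex) -> pd.MultiIndex:
    """Hypothetical helper: convert every all-numeric level of a MultiIndex."""
    new_levels = []
    for level in columns.levels:
        try:
            # Use a plain object array so this does not depend on which
            # string dtype pandas chose for the header values.
            values = pd.to_numeric(np.asarray(level, dtype=object))
        except (ValueError, TypeError):
            new_levels.append(level)  # e.g. 'a', 'b': keep as strings
        else:
            new_levels.append(pd.Index(values, name=level.name))
    # Codes stay the same; only the level values are replaced.
    return columns.set_levels(new_levels)


column_index = pd.MultiIndex.from_product(
    [['a', 'b'], [0, 1, 2]], names=['alphabetic', 'numeric'])
df = pd.DataFrame(np.random.rand(10, 6), columns=column_index)

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
read_df = pd.read_csv(buf, header=[0, 1], index_col=0)
read_df.columns = numeric_levels_to_numbers(read_df.columns)
assert df.columns.equals(read_df.columns)
```

Trying each level and falling back on failure keeps the helper generic, so it does not need to know in advance which levels were originally integers.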

duboism · 2015-12-02T23:29:09Z

Are you sure it's a dupe? I found #9435 when looking for a solution and tested the solution you proposed there, but it doesn't help. My problem is the dtype of the column levels.

jreback · 2015-12-03T00:05:10Z

Yes, I am sure. It's a dupe because the dtype kw doesn't apply to an index or column Index/MultiIndex.
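The point about the dtype keyword can be illustrated with a short sketch. Here dtype=float is passed as a single dtype for every parsed column, yet the header levels still come back as strings, because the column labels are taken from the header rows rather than from the parsed cell values. This reflects current pandas behavior as far as I know; the exact string dtype of the level (object or the newer str dtype) may vary between versions.

```python
import io

import numpy as np
import pandas as pd

column_index = pd.MultiIndex.from_product(
    [['a', 'b'], [0, 1, 2]], names=['alphabetic', 'numeric'])
df = pd.DataFrame(np.random.rand(10, 6), columns=column_index)

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# dtype controls how cell values are parsed, not the header labels.
read_df = pd.read_csv(buf, header=[0, 1], index_col=0, dtype=float)
print(read_df.dtypes.unique())          # every data column is float64
print(list(read_df.columns.levels[1]))  # still the strings '0', '1', '2'
```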

jreback closed this as completed Nov 30, 2015

jreback added the Dtype Conversions, Duplicate Report, IO CSV, and MultiIndex labels Nov 30, 2015

makmanalp mentioned this issue Nov 30, 2015

index_col in read_csv and read_table ignores dtype argument #9435 (Closed)
