BUG: fix reading multi-index data in python parser #7029

Merged May 6, 2014 (1 commit)

@mcwitt mcwitt commented May 3, 2014

Partial fix for #6893.

The python parser misreads data whose multi-index names are given in the row following the header: only the last data column survives. For example:

In [3]: text = """                      A       B       C       D        E
one two three   four
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20      4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30      3    -0.6662 -0.5243 -0.3580  0.89145  2.5838"""

In [4]: pd.read_table(StringIO(text), sep='\s+', engine='python')
Out[4]: 
                           E
one two three   four        
a   b   10.0032 5     0.3640
    q   20.0000 4     0.1744
x   q   30.0000 3     2.5838

[3 rows x 1 columns]

(The C parser fails before it gets this far; see #6893.)

This PR fixes the bug in the python parser:

In [4]: pd.read_table(StringIO(text), sep='\s+', engine='python')
Out[4]: 
                           A       B       C        D       E
one two three   four                                         
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
    q   20.0000 4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30.0000 3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

[3 rows x 5 columns]
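
To make the layout concrete, here is a minimal standard-library sketch of how such a file can be read: the header names only the data columns, the next row names the index levels, and each data row carries both sets of fields. The function name `parse_index_row_table` and its return shape are hypothetical, chosen for illustration; this is not the pandas implementation.

```python
def parse_index_row_table(text):
    """Toy reader for whitespace-separated text whose index names
    sit on the row after the header (hypothetical helper, not pandas)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 3:
        raise ValueError("need a header, an index-name row and data")
    header, index_names, first_data = rows[0], rows[1], rows[2]

    # The index-name row is recognised by field counts: a data row must
    # hold one field per index level plus one field per header column.
    if len(first_data) != len(index_names) + len(header):
        raise ValueError("second row does not look like index names")

    # Keep the combined list: index names first, then the data columns.
    columns = index_names + header

    records = {}
    for fields in rows[2:]:
        if len(fields) != len(columns):
            raise ValueError("ragged data row: %r" % (fields,))
        key = tuple(fields[:len(index_names)])
        records[key] = dict(zip(header, fields[len(index_names):]))
    return index_names, columns, records


text = """                      A       B       C       D        E
one two three   four
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20      4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30      3    -0.6662 -0.5243 -0.3580  0.89145  2.5838"""

index_names, columns, records = parse_index_row_table(text)
```

In this toy version, `columns` holds the four index names followed by `A` through `E`, and values stay as strings. Any later step that works from a stale column list instead of the combined one would no longer line up with the nine fields in each data row.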

@@ -1383,7 +1383,7 @@ def __init__(self, f, **kwds):
        # multiple date column thing turning into a real spaghetti factory
        if not self._has_complex_date_col:
            (index_names,
             self.orig_names, columns_) = self._get_index_name(self.columns)
mcwitt (Contributor Author) commented on this line:

Don't ignore the returned columns here; otherwise we lose the columns added from the row specifying the index.

jreback commented May 5, 2014

ok, pls add a release note (even if it doesn't completely solve the issue)

@jreback jreback added this to the 0.14.0 milestone May 5, 2014
mcwitt commented May 6, 2014

@jreback OK, release note added. OK to open another PR for the C parser fix or at least to detect the condition?

jreback commented May 6, 2014

go ahead and open a new issue for the c-parser (reference the original issue too and this fix)

put the release note with the other changes for the parser (just move right below)
are there any tests for this? or were the others just disabled?

mcwitt commented May 6, 2014

> go ahead and open a new issue for the c-parser (reference the original issue too and this fix)

would it make sense just to leave the original issue open, since this is only a partial fix?

> are there any tests for this? or were the others just disabled?

sort of... test_read_table_buglet_4x_multiindex was previously in ParserTests, but I copied it to TestPythonParser in #6889 since it had been falling back to python. For the C engine we currently just test that it produces a CParserError. However, the test only checks that the index is read correctly, so it didn't expose the bug fixed in this PR. Now I've added a bit to the bottom that does cover this (testing now, will commit soon).
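
The coverage gap described above can be illustrated without pandas. The sketch below hand-writes summaries of the two outputs shown in this PR (index names and column labels only) and compares an index-only check with one that also checks the columns. The names `before_fix`, `after_fix`, `index_only_check` and `index_and_columns_check` are hypothetical and are not part of the pandas test suite.

```python
# Hand-written summaries of the two outputs shown in this PR;
# nothing here is produced by running pandas.
EXPECTED_INDEX = ["one", "two", "three", "four"]
EXPECTED_COLUMNS = ["A", "B", "C", "D", "E"]

before_fix = {"index_names": ["one", "two", "three", "four"],
              "columns": ["E"]}
after_fix = {"index_names": ["one", "two", "three", "four"],
             "columns": ["A", "B", "C", "D", "E"]}


def index_only_check(result):
    # Mirrors a test that looks only at the index names.
    return result["index_names"] == EXPECTED_INDEX


def index_and_columns_check(result):
    # Also requires every data column to be present.
    return index_only_check(result) and result["columns"] == EXPECTED_COLUMNS


passes_index_only = index_only_check(before_fix)
passes_full_before = index_and_columns_check(before_fix)
passes_full_after = index_and_columns_check(after_fix)
```

The buggy output passes the index-only check, which is why a test of that shape could not catch the missing columns; only the stricter check separates the two outputs.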

jreback commented May 6, 2014

ok...we can leave the original issue open, no prob

let me know when you have rebased and green

mcwitt commented May 6, 2014

@jreback green!

jreback added a commit that referenced this pull request May 6, 2014
BUG: fix reading multi-index data in python parser
@jreback jreback merged commit 85809e8 into pandas-dev:master May 6, 2014
jreback commented May 6, 2014

thanks!

ok, so come back to the open issue at some point (it's in 0.14.1, the projected bug-fix release)
so you have some time (prob 1 mo after 0.14)

@mcwitt mcwitt deleted the csv_mi_bug branch May 6, 2014 18:05