read_table fails with MultiIndex input and delim_whitespace=True #6893
Comments
@mcwitt can you address for 0.14? |
Yep, I've made some progress on this. Should have a PR soon... |
OK, I've fixed the bug in PythonParser. The C parser seems to have its own code for this, and I'm not sure yet whether it had been designed to support specifying a MultiIndex in this way. |
hmm. it's only the header detection code, which is pretty straightforward (e.g. you have to read the header line-by-line based on the spec) |
I think this is a separate issue from header detection. The original title of this issue was wrong; this doesn't have to do with MI columns. This has to do with automatically taking the index column names from the 2nd row when the number of fields in the 1st and 2nd rows sums to the number of fields in the 3rd, i.e.
In [8]: text = """ A B C D E
one two three four
a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640
a q 20 4 0.4473 1.4152 0.2834 1.00661 0.1744
x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838"""
In [9]: pd.read_table(StringIO(text), sep='\s+', engine='python')
Out[9]:
A B C D E
one two three four
a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640
q 20.0000 4 0.4473 1.4152 0.2834 1.00661 0.1744
x q 30.0000 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838
[3 rows x 5 columns]
The code for dealing with this in the python parser is in |
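For readers following along, here is a rough pure-Python sketch of the field-count rule described in this comment. It is only an illustration, not the pandas implementation: the helper name `split_index_names` is hypothetical, and it assumes plain whitespace-delimited rows with no quoting or blank fields.

```python
def split_index_names(lines):
    """Return (index_names, column_names) for whitespace-delimited rows.

    Hypothetical helper, not part of pandas: if the first data row has as
    many fields as the two rows above it combined, the second row is
    treated as the index names.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) < 3:
        # Not enough rows to apply the rule; no index names inferred.
        return [], (rows[0] if rows else [])
    header, second, first_data = rows[0], rows[1], rows[2]
    if len(first_data) == len(header) + len(second):
        return second, header
    return [], header


text = """A B C D E
one two three four
a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640
a q 20 4 0.4473 1.4152 0.2834 1.00661 0.1744
x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838"""

index_names, column_names = split_index_names(text.splitlines())
print(index_names)   # names taken from the second row
print(column_names)  # names taken from the first row
```

With the sample text from this comment, the first row has 5 fields, the second has 4, and the first data row has 9, so the second row is picked up as the index names.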
Made a PR for the fix for the python parser. The fix for the C parser will be a bit more involved so I think it would be good to do this in a separate PR. Essentially I think we need to duplicate the functionality in |
can you detect the condition in c-parser and raise? |
moving this to 0.14.1 (though we'll merge your pythonparse fix after release notes). and if you can detect the condition in the c-parser would be great (and just raise for now) |
@mcwitt any thoughts on fixing the c-parser for this? |
I'm still thinking about this. I think I will try to implement the fix described above, but I will be too busy to look at it for the next two weeks (traveling to a conference). I should be able to get it done for 0.14.1 though. |
gr8! |
Update: I'm close to a fix for this! Hopefully can make a PR tomorrow. |
thanks.....easier to revert this PR, can address in 0.15.0 when you have time! |
Hey @jreback . Just received this bug for triage. It's still a reproducible issue as of '1.1.0.dev0+1247.g9c317322f' |
and it’s an open issue if you want to submit a PR |
So, I've spent the last 4 days trying to figure out how this tokenizer function works. I initially narrowed the test case down to a simpler one: import pandas as pd
@jreback it doesn't seem related to MultiIndex specifically. Any number of index columns included in the data (is it called "implicit indexes"?) hits the bug. Do you think it's reasonable to change the bug title to something more appropriate? |
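To make the "implicit index" idea concrete, here is a minimal sketch that assumes it means data rows carrying more whitespace-separated fields than the header, with the extra leading fields acting as index columns. The function name `count_implicit_index_cols` is made up for illustration and does not exist in pandas.

```python
def count_implicit_index_cols(header_line, data_line):
    """Count leading data fields that have no matching header name.

    Hypothetical helper: assumes both lines are whitespace-delimited and
    that any surplus fields in the data row are index columns.
    """
    n_header = len(header_line.split())
    n_data = len(data_line.split())
    return max(n_data - n_header, 0)


# One surplus field: a single implicit index column.
print(count_implicit_index_cols("A B C", "x 1 2 3"))
# Two surplus fields: a two-level implicit index.
print(count_implicit_index_cols("A B C", "x y 1 2 3"))
```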
Related #6889
Example:
This (partially) works if delim_whitespace=True is replaced with sep='\s+', engine='python' (although columns A-D are lost):