BUG: read_csv skipfooter fails with invalid quoted line #15910
Comments
@chris-b1 : Could you post the full stacktrace? I presume that that error message is coming from Python's |
yep, edited in the top comment |
Awesome. Yep, I think your diagnosis is correct. I can quickly patch that. |
Deeper analysis indicates that you can successfully parse this with the C engine on
pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''))
Date Value
0 1/1/2012 100.0
1 1/2/2012 102.0
2 a quoted junk rowmorejunk NaN
However, the Python engine cannot read this correctly (with or without the
@chris-b1 : What do you think? |
Here's a simpler example that we can use:
>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
a
0 1
1 ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"' |
This inconsistency notwithstanding, it would still be worthwhile to properly catch errors there at that |
Yeah, it does seem like that should parse. The builtin csv reader doesn't complain:
import csv
from io import StringIO
data = 'a\n1\n"a"b'
list(csv.reader(StringIO(data)))
Out[16]: [['a'], ['1'], ['ab']] |
Oh, interesting...does your original example work with |
It does using defaults, but not with |
Ah, that's the reason then. Hmmm...seems like we wouldn't consider that malformed though. Well, as we can't "fix" the Python parser, I think we can add the test at least though. |
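For reference, the difference between a lenient and a strict reader can be reproduced with the standard library alone. The sketch below uses the csv module's `strict` flag, which is one stdlib setting that produces the error message quoted earlier in this thread; whether it is the option referred to in the truncated comments above is an assumption.

```python
import csv
from io import StringIO

data = 'a\n1\n"a"b'  # same sample as in the comments above

# Default dialect (strict=False): the stray 'b' after the closing quote
# is appended to the field instead of being rejected.
lenient_rows = list(csv.reader(StringIO(data)))

# strict=True: the same input is treated as malformed and raises csv.Error.
try:
    list(csv.reader(StringIO(data), strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)

print(lenient_rows)
print(strict_error)
```

The lenient result matches the builtin reader output shown above, while the strict reader rejects the last line.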
Actually, here's a "fix" (it just goes to show how broken regex splitting in the Python engine is):
>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='python', sep='pandas')
a
0 1
1 ab |
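A rough stdlib illustration of why a multi-character `sep` sidesteps the error: pandas interprets separators longer than one character as regular expressions, and a plain regex split never runs a quote-aware state machine, so a stray character after a closing quote has nothing to trip over. This is only a sketch of the idea, not pandas' actual code path; in particular, how pandas ends up with `ab` rather than the raw quoted text is not modeled here.

```python
import re

lines = 'a\n1\n"a"b'.split('\n')

# A multi-character separator is treated as a regex; each line is split
# on it directly, with no quote-aware parsing involved.
pattern = re.compile('pandas')
rows = [pattern.split(line.strip()) for line in lines]

print(rows)  # the quote characters are still present in the raw split
```

Because the pattern never matches, every line comes back as a single field, which is why the "fix" works only by accident.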
engine='c' does the job for me. Finally got my task working after a huge but simple hurdle. Thank you! |
Code Sample, a copy-pastable example if possible
Problem description
This error only happens if the last row has quoting and is invalid, e.g. delete the morejunk above and it does not error.
Expected Output
successful parse
pandas 0.19.2
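Until the parser handles this case, one possible workaround is to drop the footer lines from the text before any CSV parsing happens, so the malformed row never reaches the reader. The sketch below uses only the standard library; `drop_footer` is a hypothetical helper introduced for illustration, not a pandas function, and the data is the sample from the comments above.

```python
import csv
from io import StringIO


def drop_footer(text, n):
    """Hypothetical helper: return `text` without its last `n` lines."""
    lines = text.splitlines()
    return '\n'.join(lines[:-n] if n else lines)


data = 'Date,Value\n1/1/2012,100.00\n1/2/2012,102.00\n"a quoted junk row"morejunk'

# With the footer removed first, even a strict reader parses cleanly.
rows = list(csv.reader(StringIO(drop_footer(data, 1)), strict=True))
print(rows)
```

The trimmed text could equally be wrapped in `StringIO` and handed to `read_csv` without `skipfooter`.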