BUG: read_csv skipfooter fails with invalid quoted line #15910

chris-b1 · 2017-04-05T20:09:10Z

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.compat import StringIO

pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''),  skipfooter=1)

Out[21]
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 20))

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-34-d8dff6b9f4a7> in <module>()
      2 1/1/2012,100.00
      3 1/2/2012,102.00
----> 4 "a quoted junk row" '''),  skipfooter=1)

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    651                     skip_blank_lines=skip_blank_lines)
    652 
--> 653         return _read(filepath_or_buffer, kwds)
    654 
    655     parser_f.__name__ = name

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    404 
    405     try:
--> 406         data = parser.read()
    407     finally:
    408         parser.close()

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, nrows)
    977                 raise ValueError('skipfooter not supported for iteration')
    978 
--> 979         ret = self._engine.read(nrows)
    980 
    981         if self.options.get('as_recarray'):

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, rows)
   2066     def read(self, rows=None):
   2067         try:
-> 2068             content = self._get_lines(rows)
   2069         except StopIteration:
   2070             if self._first_chunk:

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _get_lines(self, rows)
   2717                         while True:
   2718                             try:
-> 2719                                 new_rows.append(next(source))
   2720                                 rows += 1
   2721                             except csv.Error as inst:

Error: ',' expected after '"'

Problem description

This error only happens if the last row has quoting, and is invalid - e.g. delete the morejunk above and it does not error.

Expected Output

successful parse

pandas 0.19.2
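
For illustration, the expected behaviour can be sketched with the standard library alone: drop the footer lines before a strict parser ever sees them. This is only a sketch, not how pandas implements skipfooter. The helper name read_rows_skipping_footer is hypothetical, and splitting on line breaks assumes that no quoted field spans multiple lines.

```python
import csv


def read_rows_skipping_footer(text, skipfooter=0):
    """Parse CSV text strictly, ignoring the last `skipfooter` lines.

    Hypothetical helper for illustration; assumes one record per line.
    """
    lines = text.splitlines()
    # Discard the footer first so a malformed last line is never tokenized.
    kept = lines[:len(lines) - skipfooter] if skipfooter else lines
    # csv.reader accepts any iterable of strings, one line per item.
    return list(csv.reader(kept, strict=True))


text = '''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''

rows = read_rows_skipping_footer(text, skipfooter=1)
```

With skipfooter=1 the malformed last line is never handed to the parser, so rows holds only the header and the two data rows.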


chris-b1 · 2017-04-05T20:11:05Z

Hmm, I guess this is the same as #13879 - although the PR to improve the error message doesn't seem to have caught this case cc @gfyoung

gfyoung · 2017-04-05T20:13:22Z

@chris-b1 : Could you post the full stacktrace? I presume that error message is coming from Python's csv library, but I would like to double check (no access to a computer ATM).

chris-b1 · 2017-04-05T20:14:44Z

yep, edited in the top comment

gfyoung · 2017-04-05T20:17:17Z

Awesome. Yep, I think your diagnosis is correct. I can quickly patch that.

gfyoung · 2017-04-05T20:37:06Z

Deeper analysis indicates that you can successfully parse this with the C engine on master:

pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''))

                        Date  Value
0                   1/1/2012  100.0
1                   1/2/2012  102.0
2  a quoted junk rowmorejunk    NaN

However, the Python engine cannot read this correctly (with or without the skipfooter argument). I'm not sure why the Python engine would complain about this. The C engine's parsing seems correct.

@chris-b1 : What do you think?

gfyoung · 2017-04-05T20:38:53Z

Here's a simpler example that we can use:

>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
    a
0   1
1  ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"'

gfyoung · 2017-04-05T20:45:00Z

This inconsistency notwithstanding, it would still be worthwhile to properly catch errors in that try-except block. A PR can go up for that at the very least.

chris-b1 · 2017-04-05T21:09:31Z

Yeah, it does seem like that should parse. The builtin csv reader doesn't complain:

import csv
data = 'a\n1\n"a"b'
list(csv.reader(StringIO(data)))

Out[16]: [['a'], ['1'], ['ab']]

gfyoung · 2017-04-05T21:10:44Z

Oh, interesting... does your original example work with csv.reader(StringIO(...))? Maybe try passing strict=True to csv.reader as well?

chris-b1 · 2017-04-05T21:13:01Z

It does using defaults, but not with strict=True

gfyoung · 2017-04-05T21:17:32Z

Ah, that's the reason then. Hmmm... seems like we wouldn't consider that malformed though. Well, since we can't "fix" the Python parser, I think we can at least add a test.
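
The difference can be reproduced with the csv module alone. This is a minimal sketch, assuming (as this thread suggests, not verified here against the pandas source) that the Python engine runs csv.reader in strict mode:

```python
import csv
from io import StringIO

data = 'a\n1\n"a"b'

# Default dialect settings: text after a closing quote is appended to the field.
lenient_rows = list(csv.reader(StringIO(data)))

# strict=True: the same input is rejected as malformed.
try:
    list(csv.reader(StringIO(data), strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)
```

Here lenient_rows matches the output chris-b1 posted above, while strict_error holds the message from the csv.Error raised in strict mode.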


gfyoung · 2017-04-06T02:24:25Z

Actually, here's a "fix" (it just goes to show how broken regex splitting in the Python engine is):

>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='python', sep='pandas')
    a
0   1
1  ab


shivampatel16 · 2019-09-03T17:59:06Z

Here's a simpler example that we can use:

>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
    a
0   1
1  ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"'

engine='c' does the job for me. Finally got my task working after a huge but simple hurdle.

Thank you!

chris-b1 added Bug IO CSV read_csv, to_csv labels Apr 5, 2017

chris-b1 added this to the Next Major Release milestone Apr 5, 2017

chris-b1 added the Error Reporting Incorrect or improved errors from pandas label Apr 5, 2017

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 6, 2017

BUG: Standardize malformed row handling in Python engine

2a3b1fe

Closes pandas-devgh-15910.

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 6, 2017

BUG: Standardize malformed row handling in Python engine

e0157d6

Closes pandas-devgh-15910.

gfyoung mentioned this issue Apr 6, 2017

BUG: Standardize malformed row handling in Python engine #15913

Merged

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 6, 2017

BUG: Standardize malformed row handling in Python engine

884387e

Closes pandas-devgh-15910.

jreback modified the milestones: 0.20.0, Next Major Release Apr 6, 2017

jreback closed this as completed in #15913 Apr 6, 2017

jreback pushed a commit that referenced this issue Apr 6, 2017

BUG: Standardize malformed row handling in Python engine (#15913)

a0b089e

Closes gh-15910.

gfyoung mentioned this issue Apr 6, 2017

ENH: Support malformed row handling in Python engine #15925

Merged
