BUG: Skipfooter disables decimal parameter #6971

GHPS · 2014-04-26T12:29:18Z

I ran into a bug in the read_csv importer when trying to read a file with
European-style decimal encoding (e.g. 8.1 -> 8,1). Setting the decimal parameter
appropriately should make this easy, but in my case pandas refused to return any data type other than plain object.

After a few attempts with various files and code snippets I narrowed the problem down to the skipfooter parameter. As far as I can judge, skipfooter causes the decimal parameter to be ignored. Take the following example:

In [44]:
data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
data

Out[44]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'

In [45]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64)
df

Out[45]:
     a    b    c
0  1.1  2.2  3.3
1  4.0  5.0  6.0
2  7.0  8.0  9.0

3 rows × 3 columns

In [46]:
df.dtypes

Out[46]:
a    float64
b    float64
c    float64
dtype: object

Perfect - the behaviour I expected. Now let's add an arbitrary single-line footer and skip that line during the import.

In [47]:
data = data+'\nFooter'
data

Out[47]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'

In [48]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64,skipfooter=1)
df

Out[48]:
     a    b    c
0  1,1  2,2  3,3
1    4    5    6
2    7    8    9

3 rows × 3 columns

In [49]:
df.dtypes

Out[49]:
a    object
b    object
c    object
dtype: object

Now all data type information is lost, presumably because the conversion from comma-separated to dot-separated values failed. Adding a converter to the import (converters={'Rate': lambda x: float(x.replace('.','').replace(',','.'))}) works around the problem, which makes it more likely that the skipfooter routine is at fault.
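The converter workaround mentioned above can be sketched against the example data roughly as follows. The 'Rate' column from the original call does not exist in this example, so the sketch applies the same conversion to columns a, b and c instead. The helper name parse_decimal_comma is made up for illustration, and engine="python" is passed explicitly on the assumption that skipfooter is handled by the Python engine.

```python
import io

import pandas as pd


def parse_decimal_comma(text):
    """Convert a string such as '1.234,5' or '1,1' to a float.

    Hypothetical helper: mirrors the lambda from the report above.
    """
    # Drop '.' thousands separators first, then turn the decimal comma into a dot.
    return float(text.replace(".", "").replace(",", "."))


data = "a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter"

df = pd.read_csv(
    io.StringIO(data),
    sep=";",
    skipfooter=1,      # ignore the trailing "Footer" line
    engine="python",   # assumption: skipfooter is served by the python engine
    converters={name: parse_decimal_comma for name in ["a", "b", "c"]},
)
print(df.dtypes)
```

Because the converter does the comma handling itself, this sketch does not rely on the decimal parameter at all, which is why it sidesteps the problem described here.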

System: IPython 2.0.0, Python 3.3.5, pandas 0.13.0

jreback · 2014-04-26T12:55:01Z

cc @mcwitt

can u take a look?

mcwitt · 2014-04-26T15:54:51Z

Hmm, since #6889 specifying decimal with skip_footer should raise:

In [3]: data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'

In [4]: pd.read_csv(StringIO(data), sep=';', decimal=',', skip_footer=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
. . .
ValueError: Falling back to the 'python' engine because the 'c' engine does not support skip_footer, but this causes 'decimal' to be ignored as it is not supported by the 'python' engine.

Currently neither of the parser engines can handle this combination, since the C engine can't handle skip_footer and the python engine can't handle decimal.

@jreback maybe I can look at adding support for decimal to PythonParser? I'm not familiar enough with the C engine to say if implementing skip_footer there would be easy...

jreback · 2014-04-26T20:07:23Z

ok gr8 so will convert this to an issue adding decimal to python parser (unless that issue already exists?)

jreback · 2014-04-27T13:41:00Z

ok, so the error is propagated nicely in 0.14 / leaving this open as a bug for 0.15

@mcwitt if you have time would be great

GHPS · 2014-04-27T14:18:45Z

Currently neither of the parser engines can handle this combination, since the C engine can't handle
skip_footer and the python engine can't handle decimal.

To be precise: What is the official version of the parameter? skipfooter (as in 1) or skip_footer (as in 2)

Since no easy solution to the initial problem is at hand, I'd like to make a suggestion: Drop skipfooter in favour of an enhanced version of skiprows. As a novice to pandas I expected the skipfooter functionality to be implemented in skiprows because a) it's the more generic term and b) it already accepts a list of lines. Intuitively I searched for something pythonic like skiprows=-2 instead of skipfooter=2. skiprows=[6,-2] would then skip 6 lines at the top and 2 from the bottom.

1: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
2: http://pandas.pydata.org/pandas-docs/dev/io.html#io-read-csv-table

mcwitt · 2014-04-27T18:51:58Z

@jreback sure, I will look at this.

To be precise: What is the official version of the parameter? skipfooter (as in 1) or skip_footer (as in 2)

Looking back through some old issues (e.g. #1948) it looks like the alias skipfooter was added for consistency with skiprows. The docstring was updated to use skipfooter, but io.rst still needs to be updated.

Drop skipfooter in favour of an enhanced version of skiprows.

Hmm, this sounds like an elegant solution but I don't think it would cover all use cases: with the current convention we'd expect skiprows=[6,-2] to skip the 6th row and the 2nd row from the end (only 2 rows). I suppose we could make skiprows=-n skip the last n rows, but that would use up the skiprows argument so we couldn't easily skip a header as well...
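To make the ambiguity concrete, here is a small pure-Python sketch of the semantics being discussed. The function resolve_skiprows is hypothetical and is not part of pandas. It assumes that a list names individual 0-based line indices (negative entries counted from the end) and that a bare integer means "the first n lines", or "the last n lines" when negative.

```python
def resolve_skiprows(skiprows, n_lines):
    """Return the set of 0-based line indices to skip.

    Hypothetical semantics for illustration only, not pandas behaviour.
    """
    if isinstance(skiprows, int):
        if skiprows >= 0:
            # Non-negative int: skip the first n lines.
            return set(range(min(skiprows, n_lines)))
        # Negative int: skip the last n lines.
        return set(range(max(n_lines + skiprows, 0), n_lines))
    # List-like: each entry names exactly one line; negatives count from the end.
    return {i if i >= 0 else n_lines + i for i in skiprows}


# For a 10-line file, the list form skips only two individual lines ...
print(sorted(resolve_skiprows([6, -2], 10)))  # [6, 8]
# ... whereas skipping six header lines and two footer lines needs two calls.
print(sorted(resolve_skiprows(6, 10) | resolve_skiprows(-2, 10)))
```

Under these assumptions a single skiprows argument cannot express "6 from the top and 2 from the bottom" without changing what a list means, which is the conflict described in the comment.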

GHPS · 2014-05-07T20:11:11Z

Hmm, this sounds like an elegant solution but I don't think it would cover all use cases: with the current
convention we'd expect skiprows=[6,-2] to skip the 6th row and the 2nd row from the end (only 2 rows).

Sadly true. Since a construction with ranges gets ugly very quickly (e.g. range(1,7,1), range(-1,-3,-1)), I'm wondering whether a tuple could be a solution: skiprows=(6,-2)

gfyoung · 2016-07-26T04:35:50Z

@jreback : With the master tracker from @kawochen in place to keep an eye on these compatibility issues between the C and Python engines, this seems like a dupe to me now.

jorisvandenbossche · 2016-07-26T08:29:50Z

Is it possible this was actually fixed in master in the meantime? If I run the example from above, I get the correct float dtype.

gfyoung · 2016-07-26T08:38:58Z

Ah, good point! Seems like the C engine got smarter with parsing since then. 😄

I guess we can close this?

jorisvandenbossche · 2016-07-26T08:42:25Z

Any idea if there is already a test for this? Otherwise we can close this by adding a test.

gfyoung · 2016-07-26T08:54:15Z

There is one now!
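For readers who want to check the behaviour themselves, a regression test for this combination could look roughly like the sketch below. It reuses the data from the original report. The test name is made up and the code is not copied from the pandas test suite. It assumes a pandas version in which the Python engine honours decimal.

```python
import io

import pandas as pd


def test_skipfooter_with_decimal():
    # Same data as in the report, including the trailing footer line.
    data = "a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter"

    result = pd.read_csv(
        io.StringIO(data),
        sep=";",
        decimal=",",
        skipfooter=1,
        engine="python",  # assumption: skipfooter is served by the python engine
    )

    expected = pd.DataFrame(
        {"a": [1.1, 4.0, 7.0], "b": [2.2, 5.0, 8.0], "c": [3.3, 6.0, 9.0]}
    )
    # Fails if any column falls back to object dtype or keeps the comma strings.
    pd.testing.assert_frame_equal(result, expected)


test_skipfooter_with_decimal()
```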

jreback added Bug labels Apr 27, 2014

jreback added this to the 0.15.0 milestone Apr 27, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

gfyoung mentioned this issue Jul 26, 2016

TST: Add test for skipfooter + decimal in read_csv #13800

Merged

jorisvandenbossche modified the milestones: 0.19.0, Next Major Release Jul 26, 2016

jreback closed this as completed in #13800 Jul 26, 2016
