BUG: Skipfooter disables decimal parameter #6971
Comments
cc @mcwitt can you take a look?
Hmm, since #6889 specifying In [3]: data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
In [4]: pd.read_csv(StringIO(data), sep=';', decimal=',', skip_footer=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
. . .
ValueError: Falling back to the 'python' engine because the 'c' engine does not support skip_footer, but this causes 'decimal' to be ignored as it is not supported by the 'python' engine.
Currently neither of the parser engines can handle this combination, since the C engine can't handle
@jreback maybe I can look at adding support for
ok gr8 so will convert this to an issue adding decimal to python parser (unless that issue already exists?)
OK, so the error is propagated nicely in 0.14; leaving this open as a bug for 0.15. @mcwitt, if you have time that would be great.
To be precise: what is the official spelling of the parameter, skipfooter (as in 1) or skip_footer (as in 2)?
Since no easy solution to the initial problem is at hand, I'd like to make a suggestion: drop skipfooter in favour of an enhanced version of skiprows. As a novice to pandas I expected the skipfooter functionality to be part of skiprows because a) it's the more generic term and b) it already accepts a list of lines. Intuitively I searched for something pythonic like skiprows=-2 instead of skipfooter=2. skiprows=[6,-2] would then skip 6 lines at the top and 2 from the bottom.
1: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
@jreback sure, I will look at this.
Looking back through some old issues (e.g. #1948) it looks like the alias
Hmm, this sounds like an elegant solution but I don't think it would cover all use cases: with the current convention we'd expect
Sadly true. Since a construction with ranges gets ugly very quickly (e.g. range(1,7,1), range(-1,-3,-1)), I'm wondering whether a tuple could be a solution: skiprows=(6,-2)
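For illustration only, the "skip N lines at the top and M at the bottom" idea can be approximated today without any change to skiprows by trimming the text before it reaches the parser. The sketch below is not a pandas feature: `trim_lines` is a hypothetical helper, and it assumes the whole file fits in memory. Because no footer option is passed, the C engine does the parsing and `decimal` is honoured.

```python
import io

import pandas as pd


def trim_lines(text, skip_top=0, skip_bottom=0):
    """Hypothetical helper: drop lines from the top and bottom of a text blob."""
    lines = text.splitlines()
    end = len(lines) - skip_bottom
    return "\n".join(lines[skip_top:end])


data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'

# Remove the footer ourselves, then let the C engine handle decimal=','.
trimmed = trim_lines(data, skip_top=0, skip_bottom=1)
df = pd.read_csv(io.StringIO(trimmed), sep=';', decimal=',')
print(df.dtypes)
```

The trade-off is that the file is read into memory as a string first, which may not be acceptable for very large inputs.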
Is it possible this was actually fixed in master in the meantime? If I run the example from above, I get the correct float dtype
Ah, good point! Seems like the C engine got smarter with parsing since then. 😄 I guess we can close this?
Any idea if there is already a test for this? Otherwise we can close this by adding a test.
There is one now!
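A regression test for this combination could look roughly like the sketch below. This is not the test that was actually added to pandas; the function name is made up, and it assumes a recent pandas release in which the python engine honours `decimal`.

```python
import io

import pandas as pd


def test_skipfooter_with_decimal():
    # Hypothetical regression test; assumes the python engine supports decimal.
    data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'
    df = pd.read_csv(io.StringIO(data), sep=';', decimal=',',
                     skipfooter=1, engine='python')
    # The footer line is dropped and '1,1' is parsed as the float 1.1.
    assert df.shape == (3, 3)
    assert abs(df['a'][0] - 1.1) < 1e-9
    assert abs(df['b'][0] - 2.2) < 1e-9
```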
I ran into a bug in the read_csv importer when trying to read in a file with
a European style decimal encoding (e.g. 8.1 -> 8,1). Setting the decimal parameter
appropriately should make this easy, but in my case pandas refused to produce any dtype other than plain object.
After a few attempts with various files and snippets of code I narrowed the problem down to the skipfooter parameter. As far as I can judge, skipfooter causes the decimal parameter to be ignored. Take the following example:
In [44]:
data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
data
Out[44]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9'
In [45]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64)
df
Out[45]:
a b c
0 1.1 2.2 3.3
1 4.0 5.0 6.0
2 7.0 8.0 9.0
3 rows × 3 columns
In [46]:
df.dtypes
Out[46]:
a float64
b float64
c float64
dtype: object
Perfect - the behaviour I expected. Now let’s add a single arbitrary footer line and skip it on import.
In [47]:
data = data+'\nFooter'
data
Out[47]:
'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'
In [48]:
df = pd.read_csv(io.StringIO(data), sep=";",decimal=",",dtype=np.float64,skipfooter=1)
df
Out[48]:
a b c
0 1,1 2,2 3,3
1 4 5 6
2 7 8 9
3 rows × 3 columns
In [49]:
df.dtypes
Out[49]:
a object
b object
c object
dtype: object
Now all data type information is lost, presumably because the conversion from the comma-separated to the dot-separated values failed. Adding an additional converter to the import (converters={'Rate': lambda x: float(x.replace('.','').replace(',','.'))}) fixes the problem, which makes it more likely that the skipfooter routine is at fault.
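Applied to the example data, the converter workaround might look like the sketch below. The column names a, b and c come from the example above rather than from the 'Rate' column of the real file, and `parse_european_float` is just an illustrative name for the lambda. The python engine is requested explicitly because skipfooter needs it.

```python
import io

import pandas as pd


def parse_european_float(text):
    """Turn '1.234,5' style text into a float (dots are thousands separators)."""
    return float(text.replace('.', '').replace(',', '.'))


data = 'a;b;c\n1,1;2,2;3,3\n4;5;6\n7;8;9\nFooter'

# Convert every column by hand instead of relying on decimal=','.
df = pd.read_csv(
    io.StringIO(data),
    sep=';',
    skipfooter=1,
    engine='python',
    converters={col: parse_european_float for col in ('a', 'b', 'c')},
)
print(df)
```

Note that the converter assumes every cell in those columns is a valid number; an empty or malformed cell would raise a ValueError.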
System: IPython 2.0.0, Python 3.3.5, pandas 0.13.0