Change from 'c' to 'python' engine in read_csv gives unexpected side-effect #21131
Comments
Is the issue just that the Python engine returns a string representation of NaN whereas the C parser returns the actual NaN value? If so, can you update your example to reflect that?
Yes, that's the issue. The example has been updated.
This looks to work on master now. Could use a test.
@mroeschke Hey, this is my first time contributing to pandas. The only thing left to do in this is writing a test, right?
@aditya-hari correct.
@mroeschke Would it be enough for the test to assert that the output using both engines is the same? And where will this test go?
Best to construct the result manually. The test should go in a file in
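For illustration, a test along the lines suggested above might look roughly like the sketch below. The CSV text, the column names, the use of `dtype=str`, and the helper name `check_empty_field_is_nan` are all assumptions made for this sketch and are not taken from the issue or from the pandas test suite. It builds the expected frame by hand and checks each engine against it, rather than comparing the two engines with each other.

```python
from io import StringIO

import numpy as np
import pandas as pd


def check_empty_field_is_nan(engine):
    # Hypothetical input: the second data row has an empty field in column "b".
    data = "a,b\n1,2\n3,\n"
    result = pd.read_csv(StringIO(data), dtype=str, engine=engine)

    # Expected result constructed manually: the empty field should be a real
    # missing value, not the string 'nan'.
    expected = pd.DataFrame({"a": ["1", "3"], "b": ["2", np.nan]})

    # check_dtype=False compares values only, since the exact string dtype
    # can differ between pandas versions.
    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
    return result


# Run the same check for both parser engines.
for engine in ("c", "python"):
    check_empty_field_is_nan(engine)
```

In the pandas test suite this would more likely be a parametrized pytest test; the plain loop is used here only to keep the sketch self-contained.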
Code Sample
Problem description
Reading an empty field from a CSV file returns NaN (dtype float) if read with the 'c' engine and 'nan' (string representation) if read with the 'python' engine.
In the example above, the digits are to be treated as strings, and lines with empty data fields are to be dropped. This works fine with the above code snippet if the 'c' engine is used. However, if the 'python' engine is used, the 'nan' fields are not dropped, because the string 'nan' is not a missing value. This may break working code if one needs to switch from the 'c' engine to the 'python' engine.
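A minimal sketch of the kind of comparison described here, assuming the strings are requested with `dtype=str` and the rows are dropped with `dropna()`. The CSV content, the column names, and the helper `read_and_drop` are made up for illustration and are not the original code sample.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: digits that should stay strings, one row with an empty field.
CSV_TEXT = "id,value\n001,10\n002,\n003,30\n"


def read_and_drop(engine):
    # Read every column as a string, then drop rows that contain missing values.
    df = pd.read_csv(StringIO(CSV_TEXT), dtype=str, engine=engine)
    return df.dropna()


c_result = read_and_drop("c")
python_result = read_and_drop("python")

# On the affected version (pandas 0.22.0), the 'python' engine is reported to
# keep the row with the empty field, because the field holds the string 'nan'.
print(c_result)
print(python_result)
```

If both engines treat the empty field as a missing value, the two printed frames contain the same rows.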
Expected Output
It is expected that the two engines give consistent output.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.5.1
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None