API: add dtype= option to python parser #14295

chris-b1 · 2016-09-24T18:12:19Z

part of ENH/DOC/CLN: Document arguments and reconcile C and Python engines for read_csv #12686
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Ultimately I'm working towards #8212 (types in excel parser), which should be pretty straightforward after this.

Right now the tests are simply moved over from c_parser_only.py; I may need to add some more.
cc @gfyoung

codecov-io · 2016-09-25T14:10:34Z

Current coverage is 85.22% (diff: 100%)

Merging #14295 into master will increase coverage by 0.01%

@@             master     #14295   diff @@
==========================================
  Files           143        143          
  Lines         50804      50833    +29   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43292      43324    +32   
+ Misses         7512       7509     -3   
  Partials          0          0

Powered by Codecov. Last update 75b606a...3abb0bd

jorisvandenbossche · 2016-10-27T14:20:36Z

pandas/io/parsers.py

            result[c] = cvals
            if verbose and na_count:
                print('Filled %d NA values in column %s' % (na_count, str(c)))
        return result

-    def _convert_types(self, values, na_values, try_num_bool=True):
+    def _infer_types(self, values, na_values, try_num_bool=True):


While you are at it, can you add a docstring here?

jorisvandenbossche · 2016-10-27T14:22:57Z

pandas/io/parsers.py

-    not interpret dtype.
+    Use `str` or `object` to preserve and not interpret dtype.
+    If converters are specified, they will be applied AFTER
+    dtype conversion.


Is there a test that exercises this assumption?

I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.

jorisvandenbossche

Looks nice to me! (added two minor comments)

gfyoung · 2016-10-27T14:27:41Z

pandas/io/parsers.py

+                    values, conv_f, na_values,
+                    col_na_values, col_na_fvalues)
+            else:
+                try_num_bool = True


Why not just this:

try_num_bool = cast_type and is_object_dtype(cast_type)
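A small aside on the one-liner above: in Python, `x and f(x)` returns `x` itself when `x` is falsy, so with `cast_type=None` the expression evaluates to `None` rather than `False`. The sketch below shows this with a stand-in `is_object_dtype` (a hypothetical helper written for this example, not the pandas function) and a hypothetical wrapper `try_num_bool_for`.

```python
def is_object_dtype(cast_type):
    """Stand-in for illustration only; not the pandas helper."""
    return cast_type is object


def try_num_bool_for(cast_type):
    # The expression from the review comment, unchanged.
    return cast_type and is_object_dtype(cast_type)


no_dtype = try_num_bool_for(None)        # short-circuits on the falsy left operand
object_dtype = try_num_bool_for(object)  # left operand is truthy, so the right side is returned
float_dtype = try_num_bool_for(float)

print(repr(no_dtype), object_dtype, float_dtype)
```

If a strict boolean is wanted, wrapping the expression in `bool(...)` is one option.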

gfyoung · 2016-10-27T14:41:30Z

pandas/io/parsers.py

@@ -1355,6 +1392,23 @@ def _convert_types(self, values, na_values, try_num_bool=True):

        return result, na_count

+    def _cast_types(self, values, cast_type, column):
+        """ cast column to type specified in dtypes= param """


While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.

> While I know this isn't done too much for internal functions,

But we should do it more! (Hence my comment above.) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (e.g. things you have been working on in recent PRs and so know better) is always very welcome :-)
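As an illustration of the fuller documentation asked for above, here is a minimal sketch of a numpydoc-style docstring on a small casting helper. The function `cast_column`, its list-based inputs, and its error message are assumptions made for this example; it is not the `_cast_types` implementation from the PR.

```python
def cast_column(values, cast_type, column):
    """
    Cast the parsed values of one column to a user-requested type.

    Parameters
    ----------
    values : list of str
        Raw string tokens for the column.
    cast_type : callable
        Target type, called on each token (for example ``int`` or ``float``).
    column : str
        Column name, used only in the error message.

    Returns
    -------
    list
        The converted values.

    Raises
    ------
    ValueError
        If any token cannot be converted to ``cast_type``.
    """
    try:
        return [cast_type(v) for v in values]
    except ValueError as err:
        # Re-raise with the column name so the caller can locate the problem.
        raise ValueError(
            "Unable to convert column %s to type %s" % (column, cast_type.__name__)
        ) from err


print(cast_column(["1", "2"], int, "a"))
```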

jorisvandenbossche · 2016-11-02T09:48:54Z

@chris-b1 xref #14558. You say that dtype is ignored when converters is specified. In that case it would be nice to raise a warning about this if both are specified I think.

jorisvandenbossche

Two small comments, looks good to me!

jorisvandenbossche · 2016-11-07T21:04:37Z

doc/source/whatsnew/v0.20.0.txt


+.. ipython:: python
+
+   from io import StringIO


This will not work on Python 2. We should check how it is done in other places in the docs.

`from pandas.compat import StringIO` is imported in a suppressed ipython code block at the top of the file, so you can just use `StringIO`.

For the record, the io module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.

@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)
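The point about unicode input can be checked directly: `io.StringIO` accepts only text, not byte strings. The sketch below runs under Python 3, where the byte-string case has to be written with an explicit `b"..."` literal; under Python 2 a plain string literal is already a byte string, which is the situation the comment above describes.

```python
import io

text_buffer = io.StringIO(u"a,b\n1,2\n")  # text input is accepted
first_line = text_buffer.readline()

try:
    io.StringIO(b"a,b\n1,2\n")            # byte strings are rejected
    rejected = False
except TypeError:
    rejected = True

print(first_line.strip(), rejected)
```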

jorisvandenbossche · 2016-11-07T21:08:58Z

pandas/io/parsers.py

+    If converters are specified, they will be applied INSTEAD
+    of dtype conversion.
+
+  .. versionadded:: 0.20.0 support for the Python parser.


I personally would leave this out of the docstring (it is already so long...), but no strong feelings; your call.

chris-b1 · 2016-11-13T15:47:40Z

I think I've got all the feedback so far worked in - any more comments? @jorisvandenbossche, @gfyoung, @jreback

gfyoung

LGTM!

jorisvandenbossche

Looks good to me as well

jreback · 2016-11-15T22:23:07Z

doc/source/io.rst

-    Specifying ``dtype`` with ``engine`` other than 'c' raises a
-    ``ValueError``.
+  .. versionadded:: 0.20.0 support for the Python parser.
+     The ``dtype`` option is supported by the 'python' engine


I think you need a blank line here (to avoid warnings)

jreback · 2016-11-15T22:23:21Z

doc/source/whatsnew/v0.20.0.txt

@@ -32,6 +32,14 @@ Other enhancements

 - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)



I would make a sub-section

jreback · 2016-11-15T22:25:07Z

pandas/parser.pyx

            if conv:
+                if col_dtype is not None:
+                    warnings.warn(("Both a converter and dtype were specified "


Is there a test for this? (IOW, this was probably tested for the C engine, but now needs to be tested for all engines.)

Yes, this was added here: e0d1606
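The precedence rule discussed in this thread (a converter, when given, is used instead of `dtype`, and a warning is raised if both are supplied) can be modelled in a few lines. This is a toy sketch using only the standard library: `convert_column` is a hypothetical function, and the warning text after its first clause is invented for the example rather than taken from the PR.

```python
import warnings


def convert_column(name, tokens, dtype=None, converter=None):
    """Toy model: a converter, if given, is used instead of dtype."""
    if converter is not None:
        if dtype is not None:
            # Both were supplied: warn, then ignore dtype.
            warnings.warn(
                "Both a converter and dtype were specified for column %s; "
                "only the converter will be used" % name
            )
        return [converter(tok) for tok in tokens]
    if dtype is not None:
        return [dtype(tok) for tok in tokens]
    return list(tokens)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = convert_column("a", ["1", "2"], dtype=float, converter=int)

print(result, len(caught))
```

A test along these lines, asserting both the converted values and the presence of the warning, is the kind of check the review comment asks for across engines.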

gfyoung · 2016-11-22T20:22:47Z

@chris-b1 : I'm putting together a PR to patch #14712. As this directly relates to your PR, I would suggest waiting for mine to make sure the bug isn't an issue for you.

jorisvandenbossche · 2016-11-22T21:12:39Z

@chris-b1 can you rebase this? Then this can be merged I think (depending on the status of @gfyoung's new PR)

jorisvandenbossche · 2016-11-24T21:27:31Z

@chris-b1 I merged #14717. This added a dtype test to c_parser_only, so I added a commit to move this to the general dtype tests as well. Feel free to merge if you're ok with this and the tests are green.

jreback · 2016-11-25T13:28:08Z

doc/source/whatsnew/v0.20.0.txt


+The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns
+ is now supported with the ``'python'`` engine.  See the :ref:`io docs <io.dtypes>` for more information.


add the issue references here
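For readers of the whatsnew entry, a short usage sketch of the feature may help. It assumes a pandas version that includes this change (0.20.0 or later); the sample data and column names are made up for the example.

```python
from io import StringIO

import pandas as pd

data = "a,b\n001,1.5\n002,2.5\n"

# Request explicit column types while using the Python engine.
# Reading column "a" as str keeps the leading zeros intact.
df = pd.read_csv(StringIO(data), engine="python", dtype={"a": str, "b": "float64"})

print(df["a"].tolist())
print(df["b"].dtype)
```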


jorisvandenbossche · 2016-11-26T09:14:54Z

@chris-b1 Thanks!

jreback added the "IO CSV" (read_csv, to_csv) and "Dtype Conversions" (unexpected or buggy dtype conversions) labels Sep 26, 2016

jorisvandenbossche reviewed Oct 27, 2016


gfyoung reviewed Oct 27, 2016


chris-b1 force-pushed the textreader-dtype branch from de4660d to ab7e1e8 on October 30, 2016 21:40

jorisvandenbossche mentioned this pull request Nov 2, 2016

read_table with dtype=object and an int converter still returns float64 if NaN present #14558

Closed

jorisvandenbossche reviewed Nov 7, 2016


jorisvandenbossche added this to the 0.20.0 milestone Nov 11, 2016

chris-b1 force-pushed the textreader-dtype branch from 8153feb to 035c921 on November 13, 2016 15:21

chris-b1 changed the title ~~API: add dtype= option to python parser (WIP)~~ API: add dtype= option to python parser Nov 13, 2016

gfyoung approved these changes Nov 14, 2016


jorisvandenbossche approved these changes Nov 15, 2016


jreback reviewed Nov 15, 2016


gfyoung mentioned this pull request Nov 22, 2016

BUG: Respect the dtype parameter for empty CSV #14717

Merged

chris-b1 added 6 commits November 23, 2016 08:32

API: add dtype= option to python parser

960441a

remove unsupported test

7be7b42

add test/fix for dtype=object

65a94ae

float precision...

6853587

float precision fix

3024177

add docs; test for conv cast

f9ff10e

lint

7c703fe

jorisvandenbossche modified the milestones: 0.19.2, 0.20.0 Nov 24, 2016

chris-b1 and others added 11 commits November 24, 2016 22:21

API: add dtype= option to python parser

d790bdf

remove unsupported test

5462774

add test/fix for dtype=object

64c7214

float precision...

26f42c2

float precision fix

7fbe0a3

add docs; test for conv cast

08315b8

Add warning if both converter and dtype specified

810e750

doc comments

10f5be3

doc updates

b2f7b94

lint

be2b43b

TST: move empty dtype tests from c_parser_only to dtype tests

47669d3

jorisvandenbossche force-pushed the textreader-dtype branch from 7c703fe to 47669d3 on November 24, 2016 21:25

jreback reviewed Nov 25, 2016


chris-b1 and others added 3 commits November 25, 2016 09:14

Merge branch 'textreader-dtype' of https://github.com/chris-b1/pandas …

34e3a96

…into textreader-dtype Conflicts: pandas/io/tests/parser/dtypes.py

issue ref

1706b39

fix merge conflict leftover

3abb0bd

jorisvandenbossche merged commit 75bb530 into pandas-dev:master Nov 26, 2016

kawochen mentioned this pull request Nov 26, 2016

ENH/DOC/CLN: Document arguments and reconcile C and Python engines for read_csv #12686

Open

22 tasks

gfyoung mentioned this pull request Nov 26, 2016

read_fwf does not support dtype argument #7141

Closed

chris-b1 deleted the textreader-dtype branch November 30, 2016 01:00

chris-b1 mentioned this pull request Nov 30, 2016

DOC/TST: dtype param in read_fwf #14768

Merged

4 tasks

jorisvandenbossche mentioned this pull request Oct 16, 2017

No way to force read numerics as string in read_html #10534

Open

SandroCasagrande mentioned this pull request Jul 27, 2022

DOC: Minor fixes in the IO user guide #47875

Merged

1 task



