BUG: read_csv not converting to float for python engine with decimal sep, usecols and parse_dates #38334

phofl · 2020-12-06T20:52:47Z

closes BUG: PythonParser does not use decimal separator when usecols and parse_date are specified #35873
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

self.columns is already restricted to usecols, while data is not restricted yet, so we have to use the initial column indices.
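
The mismatch described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pandas implementation: the names `all_names`, `col_indices`, `position_in_restricted`, and `position_in_raw` are hypothetical, and the sample row is made up.

```python
# Hypothetical illustration; none of these names come from pandas itself.
all_names = ["col", "col1", "col2", "col3"]  # every column in the raw rows
usecols = ["col1", "col2", "col3"]           # the subset the caller asked for

# Positions of the requested columns in the raw (unrestricted) data.
col_indices = sorted(all_names.index(name) for name in usecols)

# The restricted column list, analogous to self.columns once usecols is applied.
columns = [all_names[i] for i in col_indices]


def position_in_restricted(name: str) -> int:
    """Index of `name` in the restricted column list."""
    return columns.index(name)


def position_in_raw(name: str) -> int:
    """Index of `name` in the raw data, recovered via the initial indices."""
    return col_indices[columns.index(name)]


raw_row = ["dump", "-9,1", "-9,1", "20101010"]
wrong = raw_row[position_in_restricted("col3")]  # lands on the "col2" value
right = raw_row[position_in_raw("col3")]         # lands on the "col3" value
```

Applying the restricted index to unrestricted data points one column to the left here, which is consistent with the wrong column being excluded from numeric conversion.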

jreback · 2020-12-07T13:41:21Z

pandas/io/parsers.py

@@ -2354,12 +2354,16 @@ def _set_no_thousands_columns(self):
        # Create a set of column ids that are not to be stripped of thousands
        # operators.
        noconvert_columns = set()
+        if self._col_indices is not None:


Sort _col_indices when it's created. I would actually try just setting it to list(range(len(self.columns))) otherwise, at the point where it's set (rather than doing it here).

Moved it up and sorted it immediately when it is set.
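
A rough sketch of the suggestion in this thread: sort the indices at the point where they are set, and fall back to every position when no subset is given. The helper `resolve_col_indices` and its parameters are hypothetical names for illustration, not part of pandas.

```python
from typing import List, Optional, Sequence, Union


def resolve_col_indices(
    all_names: Sequence[str],
    usecols: Optional[Sequence[Union[str, int]]] = None,
) -> List[int]:
    """Return sorted positions of the selected columns (hypothetical helper)."""
    if usecols is None:
        # No subset requested: every column position is in use.
        return list(range(len(all_names)))
    # Names are translated to positions; integers are taken as positions.
    positions = [
        all_names.index(col) if isinstance(col, str) else col for col in usecols
    ]
    # Sort once, when the value is created, so later code can rely on the order.
    return sorted(positions)
```

Sorting at creation time means later consumers do not each need their own `None` check or re-sort.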

jreback · 2020-12-08T21:14:34Z

ci/checks is failing

phofl · 2020-12-08T22:29:01Z

Fixed the mypy issues. If we do it this way, we are more type safe than before

ivanovmg

I suggest removing mypy-related comments.

ivanovmg · 2020-12-09T15:43:46Z

pandas/io/parsers.py

                        # pandas\io\parsers.py:3159: error: Unsupported right
                        # operand type for in ("Optional[Any]")  [operator]
-                        or i - len(self.index_col)  # type: ignore[operator]
-                        in self._col_indices
+                        or i - len(self.index_col) in self._col_indices


Since you do not ignore mypy checking anymore, then the comments above are irrelevant.
Maybe xref #37715?

Thx, that is a good point.

ivanovmg · 2020-12-09T15:44:32Z

pandas/io/parsers.py

@@ -3203,7 +3203,7 @@ def _rows_to_cols(self, content):
                    # operand type for in ("Optional[Any]")  [operator]
                    a
                    for i, a in enumerate(zipped_content)
-                    if i in self._col_indices  # type: ignore[operator]
+                    if i in self._col_indices


Same here (comment above about mypy error).
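
For context, the filter in the two hunks above keeps a transposed column when it is either an index column or one of the selected positions. A simplified sketch of that condition, assuming rows of equal length; `select_columns` and `n_index_cols` are hypothetical names, not the pandas method:

```python
from typing import List, Sequence, Tuple


def select_columns(
    rows: Sequence[Sequence[str]],
    col_indices: Sequence[int],
    n_index_cols: int = 0,
) -> List[Tuple[str, ...]]:
    """Transpose rows into columns and keep only the wanted ones."""
    zipped_content = list(zip(*rows))  # one tuple per column
    return [
        a
        for i, a in enumerate(zipped_content)
        # Leading index columns are always kept; the remaining positions are
        # shifted by the number of index columns before the membership test.
        if i < n_index_cols or i - n_index_cols in col_indices
    ]
```

The membership test is the expression mypy flags when `col_indices` may be `None`, which is why the type of `_col_indices` matters in this review.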

mroeschke · 2020-12-10T00:08:45Z

doc/source/whatsnew/v1.2.0.rst

@@ -740,6 +740,7 @@ I/O
 - Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
 - Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)
 - :meth:`DataFrame.to_excel`, :meth:`Series.to_excel`, :meth:`DataFrame.to_markdown`, and :meth:`Series.to_markdown` now support writing to fsspec URLs such as S3 and Google Cloud Storage (:issue:`33987`)
+- Bug in :meth:`read_csv` returning object dtype when ``delimiter=","`` with ``usecols`` and ``parse_dates`` specified for ``engine="python"`` (:issue:`35873`)


This will need to be removed off of 1.2

Hm weird, must have missed that- Thx

jreback · 2020-12-14T14:03:35Z

pandas/io/parsers.py

@@ -2336,6 +2335,9 @@ def __init__(self, f: Union[FilePathOrBuffer, List], **kwds):
            if self.index_names is None:
                self.index_names = index_names

+        if not hasattr(self, "_col_indices"):


Why can't we always define this way on L2310 or L2297?

We do not have access to self.columns in L2297. Sometimes we set this in _infer_columns in L2302, so we have to check whether it was already set to avoid overriding it. I could move it up a bit, but we would have to check whether it exists nevertheless.

can you instead define _col_list = None then as the default

This would work of course, but we would get the mypy problems back. I could do that nevertheless if this is preferable.

yes i think its important that this is always defined. you can use Optional[List[int]], then you assert its not None

Optional[List[int]] unfortunately raises the mypy error too. Asserting that it is not None does not help, so I added the ignores back in.
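
The two options discussed in this thread can be compared side by side. This is a minimal sketch with hypothetical classes (`WithHasattr`, `AlwaysDefined`), not the actual parser code; it only shows the difference between probing with `hasattr` and always defining the attribute with a `None` default.

```python
from typing import List, Optional


class WithHasattr:
    """The attribute may or may not exist, so callers must probe for it."""

    def __init__(self, columns: List[str], usecols: Optional[List[int]] = None):
        self.columns = columns
        if usecols is not None:
            self._col_indices = sorted(usecols)
        if not hasattr(self, "_col_indices"):
            self._col_indices = list(range(len(columns)))


class AlwaysDefined:
    """The attribute always exists; None means 'not set yet'."""

    def __init__(self, columns: List[str], usecols: Optional[List[int]] = None):
        self.columns = columns
        self._col_indices: Optional[List[int]] = None
        if usecols is not None:
            self._col_indices = sorted(usecols)
        if self._col_indices is None:
            # Default: every column position is in use.
            self._col_indices = list(range(len(columns)))
```

Both variants end up with the same value; the second makes the attribute visible to readers and type checkers at the cost of an `Optional` type that later code has to narrow.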

jreback · 2020-12-14T14:05:14Z

pandas/tests/io/parser/test_python_parser_only.py

@@ -314,3 +314,19 @@ def test_malformed_skipfooter(python_parser_only):
    msg = "Expected 3 fields in line 4, saw 5"
    with pytest.raises(ParserError, match=msg):
        parser.read_csv(StringIO(data), header=1, comment="#", skipfooter=1)
+
+
+def test_delimiter_with_usecols_and_parse_dates(python_parser_only):


doesn't this not work in c-parser? why?

Works for c too, not quite sure why I tested only for python. Moved it

jreback · 2020-12-29T17:23:25Z

this looked good last time, can you merge master

phofl · 2020-12-29T20:53:09Z

Done

jreback · 2020-12-29T20:55:15Z

pandas/io/parsers.py

@@ -2336,6 +2335,9 @@ def __init__(self, f: Union[FilePathOrBuffer, List], **kwds):
            if self.index_names is None:
                self.index_names = index_names

+        if not hasattr(self, "_col_indices"):


can you instead define _col_list = None then as the default

jreback · 2020-12-29T23:24:37Z

pandas/io/parsers.py

@@ -2336,6 +2335,9 @@ def __init__(self, f: Union[FilePathOrBuffer, List], **kwds):
            if self.index_names is None:
                self.index_names = index_names

+        if not hasattr(self, "_col_indices"):


yes i think its important that this is always defined. you can use Optional[List[int]], then you assert its not None

Conflicts: pandas/tests/io/parser/test_dtypes.py

jreback · 2021-01-03T16:38:17Z

pandas/io/parsers.py

@@ -2358,7 +2361,11 @@ def _set(x):
            if is_integer(x):
                noconvert_columns.add(x)
            else:
-                noconvert_columns.add(self.columns.index(x))
+                # pandas\io\parsers.py:2366: error: Unsupported right
+                # operand type for in ("Optional[List[int]")  [index]


you need to assert that the resulting value is not None (assign it to a new variable first).

Thanks very much. That helped me a lot.
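
The pattern suggested here, as a hedged sketch: bind the `Optional` attribute to a local variable, assert on the local, and use the local afterwards. `ParserSketch` and its methods are hypothetical names for illustration and do not mirror the real parser class.

```python
from typing import List, Optional


class ParserSketch:
    """Hypothetical stand-in for a parser; not the pandas class."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self._col_indices: Optional[List[int]] = None

    def set_col_indices(self, indices: List[int]) -> None:
        # Sort once when the value is set.
        self._col_indices = sorted(indices)

    def raw_positions(self, names: List[str]) -> List[int]:
        # Assign the Optional attribute to a new variable first, then assert
        # on that variable; the code below only touches the local.
        col_indices = self._col_indices
        assert col_indices is not None
        return [col_indices[self.columns.index(name)] for name in names]
```

Used like this, the `type: ignore` comments on the later membership and indexing expressions should no longer be needed, since a type checker can treat the local as a plain `List[int]` after the assert.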

jreback · 2021-01-03T16:38:27Z

pandas/io/parsers.py

@@ -3186,16 +3192,16 @@ def _rows_to_cols(self, content):
                    for i, a in enumerate(zipped_content)
                    if (
                        i < len(self.index_col)
-                        # pandas\io\parsers.py:3159: error: Unsupported right
-                        # operand type for in ("Optional[Any]")  [operator]
+                        # pandas\io\parsers.py:3198: error: Unsupported right


jreback · 2021-01-03T21:14:26Z

pandas/tests/io/parser/usecols/__init__.py

seems to have gotten committed?

phofl · 2021-01-03T21:35:34Z

These folders were missing init files, so I have added them. Shall I remove them?

jreback · 2021-01-03T23:25:44Z

These folders were missing init files, so I have added them. Shall I remove them?

nope this is fine, didn't realize they weren't there

BUG: read_csv not converting to float for python engine with decimal …

8c2e1ca

…sep, usecols and parse_dates

phofl added the IO CSV read_csv, to_csv label Dec 6, 2020

mroeschke approved these changes Dec 6, 2020

mroeschke added this to the 1.2 milestone Dec 6, 2020

jreback requested changes Dec 7, 2020

jreback removed this from the 1.2 milestone Dec 7, 2020

Move sorted

c1b9a7b

Fix mypy issues

76b91bf

phofl added 3 commits December 8, 2020 23:29

Run black

9de5059

Merge branch 'master' of https://github.com/pandas-dev/pandas into 35873

ca76832

Move whatsnew

85a3d22

ivanovmg reviewed Dec 9, 2020

Remove comment

2958d2a

phofl mentioned this pull request Dec 9, 2020

TYP: investigate/fix ignored mypy errors #37715

Closed

mroeschke reviewed Dec 10, 2020

phofl added 2 commits December 11, 2020 18:31

Merge branch 'master' of https://github.com/pandas-dev/pandas into 35873

1d740e0

Remove from 1.2

88bf395

mroeschke approved these changes Dec 11, 2020

phofl added 2 commits December 13, 2020 18:18

Merge branch 'master' into 35873

6ad5385

Merge branch 'master' into 35873

da4b602

jreback added this to the 1.3 milestone Dec 14, 2020

jreback added the Bug label Dec 14, 2020

jreback requested changes Dec 14, 2020

phofl added 3 commits December 15, 2020 21:32

Move test

384c114

Merge branch '35873' of https://github.com/phofl/pandas into 35873

070f67d

Remove import

70780da

Merge branch 'master' of https://github.com/pandas-dev/pandas into 35873

9d6205a

jreback requested changes Dec 29, 2020

phofl added 5 commits January 2, 2021 00:13

Always define self._col_indices

4afa2c8

Move comments

5bee24a

Merge branch 'master' of https://github.com/pandas-dev/pandas into 35873

d96a256

Conflicts: pandas/tests/io/parser/test_dtypes.py

Add init files

c6a226b

Fix line numbers

3ece01b

jreback requested changes Jan 3, 2021

Remove mypy ignores

6cca960

jreback approved these changes Jan 3, 2021

jreback merged commit fb47c75 into pandas-dev:master Jan 3, 2021

phofl deleted the 35873 branch January 3, 2021 23:28

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

BUG: read_csv not converting to float for python engine with decimal …

9eccde2

…sep, usecols and parse_dates (pandas-dev#38334)
