usecols keyword argument of pd.read_csv says it expects list[str] but the documentation says otherwise #605

JasonMendoza2008 · 2023-03-30T09:24:47Z

Describe the bug
usecols keyword argument of pd.read_csv says it expects list[str] but the documentation says otherwise:

Documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html):

usecols : list-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
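As a rough illustration of the documented behaviour quoted above, the following sketch passes usecols as positions, as names, and as a callable. The inline CSV text and the variable names are made up for this example and are not part of the original report.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV so the example does not depend on a file on disk.
CSV_TEXT = "foo,bar,baz\n1,2,3\n4,5,6\n"

# Positional (integer) selection, the form used in the reproduction below.
by_position = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[0, 2])

# Selection by column name.
by_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["foo", "baz"])

# Selection by a callable evaluated against each column name.
by_callable = pd.read_csv(
    io.StringIO(CSV_TEXT),
    usecols=lambda name: name.upper() in ["FOO", "BAZ"],
)

# Element order in usecols is ignored, so reorder explicitly afterwards.
reordered = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["baz", "foo"])[["baz", "foo"]]

print(list(by_position.columns), list(reordered.columns))
```

All three forms select the same two columns at runtime, which is why a stub that only accepts list[str] is narrower than the documented behaviour.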

To Reproduce

Provide a minimal runnable pandas example that is not properly checked by the stubs.

    import pandas as pd
    path_to_csv = "mycsv.csv"
    df_db = pd.read_csv(path_to_csv, usecols=[0])

Indicate which type checker you are using (mypy or pyright). PyCharm's default type checker; I haven't checked mypy.
Show the error message received from that type checker while checking your example.

Please complete the following information:

OS: Windows
OS Version 11
python version 3.11.1
version of type checker mypy 1.1.1
version of installed pandas-stubs pandas-stubs 1.5.3.230321

Additional context
Related to this SO post.


Dr-Irv · 2023-03-30T14:58:59Z

Need to change

pandas-stubs/pandas-stubs/io/parsers/readers.pyi

Lines 45 to 52 in 2e3bbe8

    
    usecols: list[str]
    | tuple[str, ...]
    | Sequence[int]
    | Series
    | Index
    | npt.NDArray
    | Callable[[str], bool]
    | None = ...,

to change list[str] to list[HashableT] and Callable[[str], bool] to Callable[[Hashable], bool]

Also change other places in that file that have the usecols argument.
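A minimal sketch of what the broadened annotation could look like, using only the standard library. The function describe_usecols is hypothetical and exists only to carry the annotation; it omits the Series, Index and npt.NDArray members of the real union, and it is not the signature that was eventually merged.

```python
from collections.abc import Callable, Hashable, Sequence
from typing import Optional, TypeVar, Union

# Assumption: a TypeVar bound to Hashable, named after the HashableT mentioned above.
HashableT = TypeVar("HashableT", bound=Hashable)


def describe_usecols(
    usecols: Optional[
        Union[
            list[HashableT],             # was list[str]
            tuple[HashableT, ...],
            Sequence[int],
            Callable[[Hashable], bool],  # was Callable[[str], bool]
        ]
    ] = None,
) -> str:
    """Hypothetical stand-in that only reports which form of usecols it received."""
    if usecols is None:
        return "all columns"
    if callable(usecols):
        return "columns selected by predicate"
    return f"{len(usecols)} explicit column(s)"


print(describe_usecols([0]))
print(describe_usecols(lambda name: name == "foo"))
```

With an annotation along these lines, a checker should accept usecols=[0] through the list[HashableT] member rather than relying on Sequence[int] alone.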

PR with tests welcome. Tests should be added near here:

pandas-stubs/tests/test_io.py

Line 518 in 2e3bbe8

df13: pd.DataFrame = pd.read_csv(path, usecols=pd.Series(data=["col1"]))

* gh-623: broaden 'names' param of read_csv
  Broaden the type hint for the 'names' param of read_csv (and read_table, which behaves similarly) from the previous list[str], so that other valid types are accepted by mypy.
* allow None as names param of read_clipboard
  Noticed as I found clipboard after the changes to read_csv and read_table, and it calls it, so should match, but it was missing None as an option.
* broaden 'names' param of read_clipboard
  Match prior change to read_csv, since read_clipboard calls read_csv.
* broaden 'names' param of read_excel
  Match prior change to read_csv, read_table, read_clipboard.
* gh-605: broader usecols param type hint
  This fixes the PyCharm tooltip problem in gh-605, as well as allowing more list-like types of strings (tuples of strings, as well as mutable sequences of strings other than list), and callables that accept hashables, not just strings.
* test that read_excel accepts string for usecols
* test names and usecols correctly exclude strings
  Strings aren't valid arguments here (except for read_excel, where we have a test now to check that this is accepted). Adding tests to make sure the type hints aren't overly wide and accept string arguments by mistake.

JasonMendoza2008 · 2023-04-07T13:31:12Z

When will the changes be made public, meaning I can get them with pip install -U pandas-stubs?

Dr-Irv · 2023-04-07T13:51:03Z

> When will the changes be made public, meaning I can get them with pip install -U pandas-stubs?

Unsure at the moment. I would like the next release to support the 2.0 features, but there is work described in #624 that needs to get done.

If you can't wait for that work to get done, I believe that you could just clone the repo, switch to the main branch, set up the dev environment, and do a poetry build. You will then get a wheel file in dist, which you can install via pip install -U dist/name_of_wheel_file.whl.

Dr-Irv added the labels Bug, "IO CSV read_csv, to_csv", and good first issue on Mar 30, 2023

This was referenced Apr 3, 2023

names parameter of pd.read_csv restricted to list[str] even though the valid types are much broader #623

Closed

Broaden read csv param types #630

Merged

Dr-Irv closed this as completed in #630 Apr 6, 2023
