Support separate null_values per column in pyarrow.csv.ConvertOptions #34637

paiforsyth · 2023-03-19T19:51:41Z

Describe the enhancement requested

I have a CSV dataset in which nulls are encoded differently in different columns. When reading CSV data with pyarrow, it appears that the same list of null_values must be used for all columns (see ConvertOptions). This is a problem because a value used as a null code in one column ("9999", for example) may be a valid non-null value in another column. In pandas's read_csv, it is possible to pass a dictionary as na_values to specify different null codes for different columns. Could this functionality be added to pyarrow?

Component(s)

Other

paiforsyth added the Type: enhancement label Mar 19, 2023

github-actions bot added the Component: Other label Mar 19, 2023

jorisvandenbossche added Component: C++ and removed Component: Other labels Apr 5, 2023

jorisvandenbossche added the Component: Python label Apr 20, 2023

roll mentioned this issue Jan 4, 2024

Promote the Missing Values per Field pattern to the Table Schema spec? frictionlessdata/datapackage#861

Closed
