[FEA] na_values should also support string in read_csv #6677
Comments
@galipremsagar |
Yep, only lists seem to work.
>>> new_df = pd.read_csv(io.StringIO(csv_buf), na_values=["_NA_"])
>>> new_df
   Unnamed: 0       a
0           0  0.4334
1           1     NaN
2           2  1.0000
3           3  2.0000
>>> new_gdf = cudf.read_csv(io.StringIO(csv_buf), na_values=["_NA_"])
>>> new_gdf
   Unnamed: 0       a
0           0  0.4334
1           1    <NA>
2           2     1.0
3           3     2.0
I'll make a fix to support strings in |
na_values are being preserved while parsing a csv file
na_values should also support string in read_csv
@galipremsagar Thank you! Please include the fix for #6678 in the PR, it's a very small change. |
Sure |
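The string-versus-list behaviour discussed in the comments above could, in principle, be handled by normalizing the argument before it reaches the parser. The following is only a minimal sketch of that idea in plain Python; `normalize_na_values` is a hypothetical helper name, not part of cuDF or pandas, and it is not the fix that was actually merged.

```python
from typing import Iterable, List, Union


def normalize_na_values(na_values: Union[str, Iterable[str], None]) -> List[str]:
    """Return na_values as a list (hypothetical helper, for illustration only)."""
    if na_values is None:
        return []
    if isinstance(na_values, str):
        # A bare string is a single marker, not a sequence of characters.
        return [na_values]
    return [str(value) for value in na_values]


print(normalize_na_values("_NA_"))    # ['_NA_']
print(normalize_na_values(["_NA_"]))  # ['_NA_']
```

The explicit `isinstance(na_values, str)` check matters because a Python string is itself iterable; without it, "_NA_" would be split into individual characters.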
Describe the bug
The na_values parameter of the CSV reader provides the flexibility to indicate which values should be treated as null. If these values exist in the data, they should not be carried over into the actual data in the dataframe.
Steps/Code to reproduce bug
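As an illustration of the behaviour described above (this is not the issue's original reproduction code), here is a minimal sketch using only Python's standard csv module. The sample data in `csv_buf` and the function `read_with_nulls` are assumptions made for this example; they only mimic what a reader that honours na_values is expected to do.

```python
import csv
import io

# Hypothetical sample data, modelled on the output shown in the comments.
csv_buf = ",a\n0,0.4334\n1,_NA_\n2,1.0\n3,2.0\n"


def read_with_nulls(text, na_values):
    """Parse CSV text, replacing any cell listed in na_values with None."""
    markers = set(na_values)
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[None if cell in markers else cell for cell in row] for row in reader]
    return header, rows


header, rows = read_with_nulls(csv_buf, ["_NA_"])
column_a = [row[1] for row in rows]
print(column_a)  # ['0.4334', None, '1.0', '2.0']
```

The marker "_NA_" does not survive into the parsed column; it is replaced by a null, which is the behaviour the report asks for.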
Expected behavior
Notice how, in new_df['a'], the na_values are removed during parsing and those occurrences are replaced with null/NaN values. We should do the same instead of continuing to preserve the na_values, as in new_gdf['a'].
Environment overview (please complete the following information)
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
Additional context
Surfaced while running fuzz tests #6001