Allow users to specify data types for a subset of columns in read_csv
#10484
Looks great overall! Only one minor change request.
Is this intended for
I think it's a bit late for 22.04; I would prefer to leave it for 22.06 unless the issue is critical. I did not get a comment from @shwina about the severity.
Co-authored-by: Bradley Dice <[email protected]>
Apologies for missing the pings about this one. 22.06 should be fine.
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10484 +/- ##
===============================================
Coverage ? 86.31%
===============================================
Files ? 140
Lines ? 22312
Branches ? 0
===============================================
Hits ? 19259
Misses ? 3053
Partials ? 0
Continue to review full report at Codecov.
@gpucibot merge
Fixes #10254
The CSV reader previously assumed that the user specifies either the data types of all columns or none of them.
This PR changes the logic so that users can pass a map/dictionary to specify types for any subset of columns, and the reader infers the types of the remaining columns.
When passing data types as an array, users still need to specify the types of all columns, because a positional array becomes ambiguous when only a subset of the columns in the file is read.
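The resolution rule described above can be sketched in plain Python. This is a hypothetical model, not cuDF's implementation (the actual reader lives in libcudf's C++ code): `resolve_dtypes`, its parameters, and the modeling of dtypes as plain strings are all assumptions made for illustration. A mapping may name any subset of the selected columns and the rest fall back to inferred types, while a positional list must cover every selected column.

```python
from collections.abc import Mapping


def resolve_dtypes(column_names, inferred, dtype=None, usecols=None):
    """Hypothetical sketch: pick a final dtype for each column that will be read.

    column_names: all column names in the file header, in file order
    inferred:     dtype inferred from the data for each column (name -> dtype)
    dtype:        user spec; a mapping (any subset) or a list (all selected columns)
    usecols:      optional subset of column names to read
    """
    selected = list(usecols) if usecols is not None else list(column_names)

    # No user spec: every selected column uses its inferred type.
    if dtype is None:
        return {name: inferred[name] for name in selected}

    # Mapping form: user types where given, inferred types everywhere else.
    if isinstance(dtype, Mapping):
        return {name: dtype.get(name, inferred[name]) for name in selected}

    # A single dtype for all columns is not modeled in this sketch.
    if isinstance(dtype, str):
        raise TypeError("single-dtype form is not modeled in this sketch")

    # List form: positional, so a partial list would be ambiguous
    # (do positions refer to file columns or selected columns?).
    types = list(dtype)
    if len(types) != len(selected):
        raise ValueError(
            f"expected {len(selected)} dtypes in list form, got {len(types)}"
        )
    return dict(zip(selected, types))


cols = ["a", "b", "c"]
inferred = {"a": "int64", "b": "float64", "c": "str"}
print(resolve_dtypes(cols, inferred, {"b": "int32"}))
```

Requiring the list form to match the number of selected columns is the sketch's way of refusing to guess; the mapping form avoids the ambiguity because each entry is keyed by column name.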