Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to specify data types for a subset of columns in read_csv #10484

Merged
merged 11 commits into from
Mar 24, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Mar 22, 2022

Fixes #10254

CSV reader previously assumed that all data types are specified by the user, or none.
This PR changes the logic so that user can pass a map/dictionary to specify type for any subset of columns, and reader infers the type for the remaining columns.
When passing columns as an array, users still need to specify all columns' types, because the array become ambiguous when reading a subset of columns in the file.

@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Mar 22, 2022
@vuule vuule requested a review from a team as a code owner March 22, 2022 19:10
@vuule vuule self-assigned this Mar 22, 2022
@vuule vuule requested review from bdice and ttnghia March 22, 2022 19:10
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 22, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall! Only one minor change request.

cpp/src/io/csv/reader_impl.cu Outdated Show resolved Hide resolved
@galipremsagar
Copy link
Contributor

is this intended for 22.04 or 22.06? - Asking this because the PR and issue are on two different boards.

@vuule
Copy link
Contributor Author

vuule commented Mar 22, 2022

is this intended for 22.04 or 22.06? - Asking this because the PR and issue are on two different boards.

I think it's a bit late for 22.04, would prefer to leave for 22.06 unless the issue is critical. Did not get a comment from @shwina about the severity.

@shwina
Copy link
Contributor

shwina commented Mar 23, 2022

Apologies for missing the pings about this one. 22.06 should be fine.

@codecov
Copy link

codecov bot commented Mar 24, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.06@8d86ae8). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.06   #10484   +/-   ##
===============================================
  Coverage                ?   86.31%           
===============================================
  Files                   ?      140           
  Lines                   ?    22312           
  Branches                ?        0           
===============================================
  Hits                    ?    19259           
  Misses                  ?     3053           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d86ae8...3d0809d. Read the comment docs.

@vuule
Copy link
Contributor Author

vuule commented Mar 24, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 3a16a7f into rapidsai:branch-22.06 Mar 24, 2022
@vuule vuule deleted the fea-csv-allow-partial-dtype branch March 24, 2022 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] read_csv fails when not all columns specified in the dtype argument
5 participants