-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create an int8
column in read_csv
when all elements are missing
#12110
Create an int8
column in read_csv
when all elements are missing
#12110
Conversation
Codecov ReportBase: 87.47% // Head: 88.12% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-22.12 #12110 +/- ##
================================================
+ Coverage 87.47% 88.12% +0.64%
================================================
Files 133 135 +2
Lines 21826 22133 +307
================================================
+ Hits 19093 19504 +411
+ Misses 2733 2629 -104
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice small fix. Thanks!
…bug-read_csv-empty=column-type
…uule/cudf into bug-read_csv-empty=column-type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@@ -33,6 +33,11 @@ struct column_type_histogram { | |||
cudf::size_type positive_small_int_count{}; | |||
cudf::size_type big_int_count{}; | |||
cudf::size_type bool_count{}; | |||
auto total_count() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps for the future -- seems like this struct would be more versatile and more easily extendible (and totals easier to calculate in a bug-free way) if you store a std::array
(or similar) of counts and index it using constant names for indices. Then this line could just be a std::accumulate
(or similar).
@gpucibot merge |
Description
CSV reader creates int8 columns when all elements are null. However, when all elements in a
column are missing (e.g.
names
option includes more columns than the CSV file), CSV reader creates anint64
column. Such columns take up a lot more device memory.This PR changes the behavior so that all columns with no valid elements are created as
int8
.Checklist