-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set the null count in output columns in the CSV reader #13221
Conversation
@@ -859,7 +867,7 @@ table_with_metadata read_csv(cudf::io::datasource* source, | |||
const std::string dblquotechar(2, parse_opts.quotechar); | |||
std::unique_ptr<column> col = cudf::make_strings_column(*out_buffers[i]._strings, stream); | |||
out_columns.emplace_back( | |||
cudf::strings::replace(col->view(), dblquotechar, quotechar, -1, mr)); | |||
cudf::strings::detail::replace(col->view(), dblquotechar, quotechar, -1, stream, mr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change, just noticed that the version that the detail API needs to be used here to ensure that the right stream is used.
CUDF_EXPECTS(col_lhs.null_count() == 0 and col_rhs.null_count() == 0, | ||
"All elements should be valid"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one test that was (kind of) checking the null count. With this additional check, it fails when the null count is incorrect.
/merge |
Description
CSV reader used to keep the null count of the output columns as
UNKNOWN_NULL_COUNT
. This has performance implications on further use of these columns.This PR adds a validity counter to convert_csv_to_cudf kernel. This counter is then used to set the correct null count in the columns.
Also added a test that explicitly checks the null counts, as it is never explicitly used in the CSV reader tests.
Checklist