-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand CSV and JSON reader APIs to accept dtypes
as a vector or map of data_type
objects
#8856
Conversation
…fea-csv-dtypes-api
…fea-csv-dtypes-api
…fea-csv-dtypes-api
@@ -464,47 +475,71 @@ void reader::impl::set_column_names(device_span<uint64_t const> rec_starts, | |||
} | |||
} | |||
|
|||
void reader::impl::set_data_types(device_span<uint64_t const> rec_starts, | |||
rmm::cuda_stream_view stream) | |||
std::vector<data_type> reader::impl::parse_data_types( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the diff is very confusing here. Did not rename the function, but added parse_data_types
to handle the deprecated API that takes strings. This will make the future removal cleaner.
Co-authored-by: Ram (Ramakrishna Prabhu) <[email protected]>
…to fea-csv-dtypes-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First high-level round of review. Overall looks good. Just found a few minors so far.
Co-authored-by: Elias Stehle <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest look good, a small suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libcudf changes LGTM 👍
@gpucibot merge |
Found an compile error in `csv_test.cpp` that only occurs with a debug build since it exists in an `assert()` statement. Looks like this was introduced in PR #8856 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Mike Wilson (https://github.com/hyperbolic2346) URL: #8981
Found an compile error in `csv_test.cpp` that only occurs with a debug build since it exists in an `assert()` statement. Looks like this was introduced in PR rapidsai#8856 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Mike Wilson (https://github.com/hyperbolic2346) URL: rapidsai#8981
Goal of the PR is to enable CSV to read columns as decimal, and to replace the string-based
dtype
part of the API.data_type
based API is needed because we need to specify scale for decimal columns, and doing this via a string that describes the type is 💩Changes in the PR:
dtype
related getters/setters to also take a vector or a map ofdata_type
objects.In case of CSV, vector of
data_type
s was already supported. Reworked the implementation to support different use cases that the "dtype-as-string" code path supports.parse_dates
option to make up for the special strings that CSV supported to denote that a column needs to be parsed as hexadecimal (the option to pass strings is to be removed).infer_date
option toparse_dates
.Breaking because API to specify date columns has been renamed to match the new
parse_hex
API; renamed frominfer_date
toparse_dates
Depends on #8843