Cleanup common parsing code in JSON, CSV reader #12022
  }
  if (exponent != 0) { value *= exp10(double(exponent * exponent_sign)); }
  }
  }
- if (!all_digits_valid) { return error_result; }
+ if (!all_digits_valid) { return {}; }
Let's be more explicit here. Current code could be mistaken for returning a zero.
- if (!all_digits_valid) { return {}; }
+ if (!all_digits_valid) { return std::nullopt; }
This compiles locally, but the CI build gives this error:
04:37:13 $SRC_DIR/cpp/src/io/utilities/parsing_utils.cuh(240): error #20014-D: calling a __host__ function from a __host__ __device__ function is not allowed
The constructor taking a nullopt_t argument might be a host-only function rather than a constexpr one.
@karthikeyann Hi, I also ran into this problem. Did you solve it?
Yes, by calling the default constructor of std::optional:
if (!all_digits_valid) { return std::optional<T>{}; }
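The pattern settled on in this thread can be sketched in plain host C++. This is a minimal illustration, not the cuDF implementation: `parse_unsigned_digits` is a hypothetical name, it handles only unsigned decimal digits, and it carries no `__host__ __device__` annotations. In host-only code `return {};`, `return std::nullopt;` and `return std::optional<T>{};` all yield an empty optional; the difference reported above concerned only the CUDA build.

```cpp
#include <optional>
#include <string_view>

// Hypothetical host-only stand-in for the parser discussed above.
// It is NOT cuDF's parse_numeric; it only illustrates the return convention.
template <typename T>
std::optional<T> parse_unsigned_digits(std::string_view text)
{
  // Treat an empty field as a parsing failure (an assumption of this sketch).
  if (text.empty()) { return std::optional<T>{}; }

  T value{0};
  bool all_digits_valid = true;
  for (char c : text) {
    if (c < '0' || c > '9') {
      all_digits_valid = false;
      break;
    }
    value = value * 10 + static_cast<T>(c - '0');
  }

  // Default-construct the optional explicitly to signal failure,
  // so the result cannot be mistaken for a parsed zero.
  if (!all_digits_valid) { return std::optional<T>{}; }
  return value;
}
```

With this convention a caller can tell a parsed zero from a failure: `parse_unsigned_digits<int>("0")` holds the value 0, while `parse_unsigned_digits<int>("12a3")` is empty.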
CC @galipremsagar for viz and Python perspective on this behavior change
A few small suggestions.
It's not obvious at first glance what the desired behavior is, especially since Pandas uses NaN (for its own reasons). The main benefit of using null here is that we used to return zeroes for invalid integers :( Not a great stand-in for NaN.
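To make the point about zeroes concrete, here is a small host-side sketch. The `int_column` struct and the `make_column` and `null_count` helpers are hypothetical and far simpler than a real cuDF column; the sketch only shows why a separate validity flag keeps a failed parse distinguishable from a genuine 0.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified column: one value and one validity flag per row.
struct int_column {
  std::vector<int> values;
  std::vector<bool> valid;
};

// Build a column from per-row parse results. An empty optional becomes a null row.
int_column make_column(std::vector<std::optional<int>> const& parsed)
{
  int_column col;
  col.values.reserve(parsed.size());
  col.valid.reserve(parsed.size());
  for (auto const& p : parsed) {
    col.values.push_back(p.value_or(0));  // placeholder payload for null rows
    col.valid.push_back(p.has_value());   // the flag, not the payload, marks failure
  }
  return col;
}

// Count the rows whose parse failed.
std::size_t null_count(int_column const& col)
{
  std::size_t count = 0;
  for (bool v : col.valid) {
    if (!v) { ++count; }
  }
  return count;
}
```

In this sketch a row that parsed to 0 and a row that failed to parse both store 0 in `values`, and only the `valid` flag separates them. Returning a bare zero, as the old integer path did, loses that distinction.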
0e45d13
Pushing this breaking change to the next release.
Nice improvement.
@@ -670,6 +597,7 @@ struct ConvertFunctor {
  parse_options_view const& opts,
  bool as_hex)
  {
+ // TODO what's invalid input
Can you please make these TODOs clearer if we're planning to merge them?
I looked into the code and don't see invalid input being handled in a robust way in parse_decimal and to_timestamp. Is that what this TODO needs to remind us of?
Yes. This TODO is a reminder for the future, and also a prompt for reviewers to discuss and decide what counts as invalid input for this type.
@gpucibot merge
…12272)
This PR changes behaviour of nested json reader to return null element on parsing error of numeric types.
Breaking behaviour in read_csv, and read_json:
- old behaviour: parsing error returns quiet NAN for that type as result for that row.
- new behaviour: parsing error returns null for that row.
Moved commit 0e45d13 #12022 (comment) to 23.02
Authors:
- Karthikeyan (https://github.com/karthikeyann)
Approvers:
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #12272
Description
This PR cleans up the common parsing code shared by the nested JSON reader and the CSV reader.
- std::optional for indicating parsing failure in parse_numeric
- decode_value, as it only provides specializations for timestamp and duration types; the rest of the types are passthrough
- decode_digit
Depends on #11898 and #12021