-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds option to take explicit nested schema for nested JSON reader #11682
Adds option to take explicit nested schema for nested JSON reader #11682
Conversation
…fea-json-col-cast
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## branch-22.10 #11682 +/- ##
===============================================
Coverage ? 85.90%
===============================================
Files ? 151
Lines ? 23601
Branches ? 0
===============================================
Hits ? 20275
Misses ? 3326
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
…n-explicit-schema
<< (ret.has_value() ? std::to_string(static_cast<int>(ret->type.id())) : "n/a") | ||
<< ", with " << (ret.has_value() ? ret->child_types.size() : 0) << " children" | ||
<< "\n"; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the debug prints inconvenient while reading the code?
These debug prints help a lot during development and debugging.
any suggestions to mitigate this and still keep the debug prints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find them in the way as a reader/reviewer. And most code is read more than written.
Does this code need to be maintained? Should it be reviewed for correctness?
If yes to both then we should leave it.
We should probably have rules for this since I'm seeing this show up in other code reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I also find them hindering the reading flow but, as @karthikeyann pointed out, they also proofed to be really helpful for debugging. My plan and suggestion would be to keep them for a couple more weeks until we gained confidence that everything is working reliably and to remove them afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
only comment on std::nullopt
. not need to address it in this PR. will address this when I implement this same feature in full gpu json parser path.
-> std::optional<schema_element> { | ||
auto ret = (user_dtypes.find(col_name) != std::end(user_dtypes)) | ||
? user_dtypes.find(col_name)->second | ||
: std::optional<schema_element>{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: std::nullopt
is preferred.
rerun tests |
@gpucibot merge |
Description
This PR adds the option to take an explicit nested schema, allowing users to specify the target data types of the leave columns in the nested JSON reader. This PR adds the corresponding interface and implementation to libcudf.
In addition, the PR makes existing JSON reader tests parametrised tests and enables those tests for dual execution of (1) the existing JSON reader and (2) the new nested JSON reader.
Checklist