Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Enable
schema_element
&keep_quotes
support in json reader #11746[REVIEW] Enable
schema_element
&keep_quotes
support in json reader #11746Changes from 13 commits
3425dd7
4621855
dd05c2c
7235cbd
323c5b2
9594fc0
c5dbb1c
4ac82da
cad50cf
43dc243
66be512
6e6a55c
daea9b2
e5bdc74
af32f3e
289cac2
55e3b32
3b5bacc
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With PR #11714,
cudf_experimental
will support both JSON lines, and also records.Also, old
cudf_experimental
already supports records format.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karthikeyann, is #11714 ready to merge for 22.10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if tests pass, it should be. Addressed most of review comments, left some for next PRs.
I merged this PR too locally and tested with above change. Tests pass.
experimental engine could use more python tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
"[1, 2, 3]\n[4, 5, 6]\n[7, 8, 9]\n"
a valid JSON lines?This is not supported by
cudf_experimental
engine. Record orient with lines=True/False is only supported.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the
dtype
argument going to be a breaking change for theengine='cudf'
reader?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope,
dtype
is going to continue to work forengine='cudf'
reader. But the reason I've deprecated supporting the list of dtypes indtype
param is there is no way we can give the column a name withschema_info
being introduced going forward.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype=list
is going to a breaking change once we drop it. But this PR is just deprecating. Pandas don't have it and hence we won't be needing it either.