Fix Null literals to be not parsed as string when mixed types as string is enabled in JSON reader #14939
A couple of nitpicks. This looks good to me. I'm still digesting the test cases.
Looks great overall! A few points are unclear to me:
test_fn(R"(
{ "a": { "b": 1 } }
Can you help me understand why we have a STRUCT + STR test case when we already have the STR + STRUCT test case in line 2112? This also applies to the other <STRUCT, STR, LIST> permutations above. Does the ordering of the records in the JSON Lines input matter to the parser?
Yes, it can. One of the examples provided by @andygrove caught a segfault that occurred only when the records appeared in a different order. After fixing the bug, I added unit test cases covering every ordering.
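To illustrate the ordering point, here is a minimal C++ sketch (not the actual cuDF test code) that builds one JSON Lines input for every ordering of a STRUCT, a STR, and a LIST record for column "a". Only the STRUCT record comes from the quoted test input; the STR and LIST records and the all_orderings helper are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sample records for column "a". The STRUCT record mirrors the quoted test
// input; the STR and LIST records are hypothetical.
const std::vector<std::string> kRecords = {
    R"({ "a": { "b": 1 } })",  // STRUCT
    R"({ "a": "fox" })",       // STR
    R"({ "a": [1, 2] })",      // LIST
};

// Hypothetical helper: returns one JSON Lines string per ordering of the
// records, so an order-dependent bug is hit by at least one input.
std::vector<std::string> all_orderings(const std::vector<std::string>& records) {
  assert(!records.empty() && "need at least one record");
  std::vector<std::size_t> order(records.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;  // sorted start
  std::vector<std::string> inputs;
  do {
    std::string jsonl;
    for (std::size_t idx : order) jsonl += records[idx] + "\n";
    inputs.push_back(jsonl);
  } while (std::next_permutation(order.begin(), order.end()));
  return inputs;
}
```

Starting from the sorted index order guarantees std::next_permutation visits all 3! = 6 orderings. Each generated string can then be handed to the reader under test; how test_fn consumes it is not shown in the quoted snippet.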
Looks good to me, thank you!
/merge
Addresses part of #14288
Depends on #14939 (mixed type ignore nulls fix)

In the input schema, if a struct column is given as STRING type, it is forced to be a STRING column. This could be used to support the map type in the Spark JSON reader: force a map type to be a STRING, then use a different parser to extract key and value columns from that string column. To enable this forcing, mixed types as string should be enabled in json_reader_options.

Authors:
- Karthikeyan (https://github.com/karthikeyann)
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Andy Grove (https://github.com/andygrove)
- Mike Wilson (https://github.com/hyperbolic2346)
- Shruti Shivakumar (https://github.com/shrshi)
- Bradley Dice (https://github.com/bdice)

URL: #14936
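The schema-forcing rule can be modeled as a per-column type decision. This is a simplified, hypothetical C++ sketch, not cuDF's implementation: the Kind enum and resolve_column_kind are invented for illustration, only a STRING schema entry is modeled, and outcomes the description does not cover (STRING requested without the flag, mixed types without the flag, all-null columns) return std::nullopt rather than guessing.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical token kinds for the values of one JSON column.
enum class Kind { Null, Struct, List, String, Number };

// Hypothetical type decision for one column.
//   row_kinds:   kind of each row's value (Kind::Null for a null literal)
//   schema_kind: type requested in the input schema, if any; only a STRING
//                request is modeled, any other entry is ignored
// Returns std::nullopt when the outcome is not modeled by this sketch.
std::optional<Kind> resolve_column_kind(const std::vector<Kind>& row_kinds,
                                        std::optional<Kind> schema_kind,
                                        bool mixed_types_as_string) {
  assert(!row_kinds.empty() && "a column needs at least one row");

  if (schema_kind == Kind::String) {
    // Forcing (e.g. a struct column requested as STRING) requires mixed
    // types as string; without the flag the outcome is not modeled here.
    if (mixed_types_as_string) return Kind::String;
    return std::nullopt;
  }

  std::set<Kind> seen;
  for (Kind k : row_kinds) {
    if (k != Kind::Null) seen.insert(k);  // null literals never decide the type
  }
  if (seen.size() == 1) return *seen.begin();
  if (seen.size() > 1 && mixed_types_as_string) return Kind::String;
  return std::nullopt;  // all-null column, or mixed types without the flag
}
```

Skipping null literals when collecting kinds mirrors the dependency on #14939: a struct column that also contains null rows stays a struct column instead of being treated as mixed.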
Description
Fixes #14864
The null literal should be ignored (treated as null) during parsing when handling mixed types. Unit tests covering complex scenarios have been added to test this as well.
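A simplified model of the intended behavior: null literals are skipped when deciding whether a column has mixed types, and they become null rows rather than the string "null". The Kind, Value, is_mixed, and as_strings names are hypothetical. The sketch does not reflect how cuDF's GPU reader tokenizes input or what text it emits for string values; the caller supplies that text.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical token kinds for the values of one JSON column.
enum class Kind { Null, Struct, List, String, Number };

// Hypothetical row value: its kind plus the text to emit if the column is
// converted to strings (e.g. {"b": 1} for a struct value).
struct Value {
  Kind kind;
  std::string text;
};

// A column is mixed only if its non-null values have more than one kind;
// null literals are ignored so they cannot make a column "mixed".
bool is_mixed(const std::vector<Value>& rows) {
  std::set<Kind> kinds;
  for (const auto& v : rows) {
    if (v.kind != Kind::Null) kinds.insert(v.kind);
  }
  return kinds.size() > 1;
}

// Converts a mixed column to strings. Null literals stay null
// (std::nullopt) instead of becoming the string "null".
std::vector<std::optional<std::string>> as_strings(const std::vector<Value>& rows) {
  assert(is_mixed(rows) && "only mixed columns are converted here");
  std::vector<std::optional<std::string>> out;
  out.reserve(rows.size());
  for (const auto& v : rows) {
    if (v.kind == Kind::Null) {
      out.push_back(std::nullopt);  // null literal -> null row
    } else {
      out.push_back(v.text);  // any other value keeps its text
    }
  }
  return out;
}
```

For example, rows holding a struct, a null literal, and a string form a mixed column whose middle row is null, while a column with only strings and null literals is not mixed at all.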