[FEA] JSON reader: support unquoted JSON field names. #10266
Comments
This issue has been labeled
Setting this to P1, as it is off by default in Spark.
Should we support whitespace in the unquoted field names?
No. See below for details.
White space at the beginning and after the name but before the
That would only show up in a non-JSON-lines use case. In those cases a newline is treated like any other whitespace: it is stripped from the beginning and end of the name, but it is an error if it appears in the middle.
Not for unquoted names. Escape characters in an unquoted field name are considered an error. This is true even if escaping any character is allowed by a second config. Here are the test files and code that I used.
As in the other examples, you can ignore the "_corrupt_record" field. It is generally not used and we don't support it on the GPU, but it shows which lines had errors in them.
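The whitespace and escape rules described in this comment can be sketched in plain Python. This is only an illustration of the stated behavior, not the Spark or cuDF implementation: the function name `normalize_unquoted_name` is hypothetical, and rejecting an empty name is an assumption that the thread does not discuss.

```python
def normalize_unquoted_name(raw: str) -> str:
    """Apply the rules from this thread to the raw text of an unquoted field name.

    `raw` is the text between the opening '{' (or ',') and the ':'.
    Hypothetical helper for illustration only.
    """
    # Whitespace (including newlines) before and after the name is stripped.
    name = raw.strip()

    # Assumption: an empty name is rejected. The thread does not cover this case.
    if not name:
        raise ValueError("empty unquoted field name")

    # Whitespace in the middle of an unquoted name is an error.
    if any(ch.isspace() for ch in name):
        raise ValueError(f"whitespace inside unquoted field name: {raw!r}")

    # Escape characters are an error in unquoted names, regardless of any
    # separate setting that relaxes escaping elsewhere.
    if "\\" in name:
        raise ValueError(f"escape character in unquoted field name: {raw!r}")

    return name
```

With this sketch, `normalize_unquoted_name("  name \n")` returns `"name"`, while `"first name"` and `"na\\me"` both raise `ValueError`.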
This is part of FEA of NVIDIA/spark-rapids#9
We have a JSON file
{name: "Reynold Xin"}
Spark can parse it when allowUnquotedFieldNames is enabled.
cuDF parsing throws an exception.
We expect a configuration option,
allowUnquotedFieldNames
to control this behavior.
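To make the requested behavior concrete, here is a toy sketch of a reader whose handling of unquoted names is gated by a flag. It uses only the Python standard library and is not the cuDF or Spark API: `read_json_line` and its `allow_unquoted_field_names` parameter are hypothetical names, and the identifier pattern for what counts as an unquoted name is an assumption. The regex rewrite is deliberately naive and would also alter string values that happen to contain text like `, key:`.

```python
import json
import re

# Assumption: an unquoted name looks like an identifier and appears
# right after '{' or ',' and before ':'.
_UNQUOTED_NAME = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def read_json_line(line: str, allow_unquoted_field_names: bool = False) -> dict:
    """Parse one JSON object, optionally tolerating unquoted field names."""
    if allow_unquoted_field_names:
        # Wrap each unquoted name in double quotes, then parse as strict JSON.
        line = _UNQUOTED_NAME.sub(r'\1"\2"\3', line)
    # With the flag off, unquoted names are rejected by the strict parser.
    return json.loads(line)


if __name__ == "__main__":
    record = '{name: "Reynold Xin"}'
    print(read_json_line(record, allow_unquoted_field_names=True))
    try:
        read_json_line(record)
    except json.JSONDecodeError as err:
        print("rejected by default:", err.msg)
```

Keeping the flag off by default mirrors the note in the thread that this option is off by default in Spark.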