[BUG] JSON mixed_types_as_strings feature incorrectly returns some structs as strings #14864
Comments
@karthikeyann Would you please take a look?
In this example,
Thanks @karthikeyann. I have either misunderstood something or my repro case is not correct. I will take a look.
I found another issue while working on a better repro:

```json
{"teacher": "Ntflgt","student": {"name": "Odhut", "age": 10}}
{"teacher": "Pnjugo","student": {"name": "Xqnqpg", "age": 13}}
{"student":null}
```
I now have an accurate repro:

```java
@Test
void testMixedTypes() throws IOException {
    MultiBufferDataSource source = sourceFrom(
        TestUtils.getResourceAsFile("structs.json"));
    // Read newline-delimited JSON with mixed_types_as_strings enabled
    JSONOptions opts = JSONOptions.builder()
        .withLines(true)
        .withMixedTypesAsStrings(true)
        .build();
    TableWithMeta table = Table.readJSON(opts, source.getHostBuffers()[0],
        0, source.size());
    Table t = table.releaseTable();
    // Print the inferred type of each top-level column
    for (int i = 0; i < t.getNumberOfColumns(); i++) {
        System.out.println("TYPE " + i + ": " + t.getColumn(i).getType());
    }
    TableDebug.builder().build().debug("TABLE", t);
}
```

Note that this requires adding a method that exposes the host buffers, since the test calls `source.getHostBuffers()`:

```java
public HostMemoryBuffer[] getHostBuffers() {
    return hostBuffers;
}
```

Test data:

```json
{"teacher": "Ntflgt","student": {"name": "Odhut", "age": 10}}
{"teacher": "Pnjugo","student": {"name": "Xqnqpg", "age": 13}}
{"student":null}
{"teacher": "Ntflgt","student": {"name": "Odhut", "age": 10}}
{"teacher": "Pnjugo","student": {"name": "Xqnqpg", "age": 13}}
```

Output:

The
Fix available with PR #14939.

```
In [5]: df
Out[5]:
   teacher                        student
0   Ntflgt   {'name': 'Odhut', 'age': 10}
1   Pnjugo  {'name': 'Xqnqpg', 'age': 13}
2     <NA>                           None
3   Ntflgt   {'name': 'Odhut', 'age': 10}
4   Pnjugo  {'name': 'Xqnqpg', 'age': 13}

In [6]: df.info()
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   teacher  4 non-null      object
 1   student  4 non-null      struct
dtypes: object(1), struct(1)
memory usage: 390.0+ bytes
```
…ng is enabled in JSON reader (#14939)

Fixes #14864

The `null` literal should be ignored (treated as null) during parsing when handling mixed types. Unit tests covering complex scenarios are added as well.

Authors:
- Karthikeyan (https://github.com/karthikeyann)

Approvers:
- MithunR (https://github.com/mythrocks)
- Andy Grove (https://github.com/andygrove)
- Shruti Shivakumar (https://github.com/shrshi)
- https://github.com/nvdbaranec

URL: #14939
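The rule in the fix (a `null` literal counts as null, not as a second, conflicting type) can be illustrated with a small type-merge function. This is a hypothetical sketch, not cuDF's implementation: `MixedTypeInference`, `JsonKind`, and `ColumnKind` are made-up names, and each row is reduced to the kind of value it holds at a single JSON path.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of column type inference; illustrative only.
final class MixedTypeInference {
    // Kind of value one row holds at a given JSON path.
    enum JsonKind { NULL, STRING, NUMBER, BOOLEAN, STRUCT, LIST }

    // Inferred column kind; MIXED_AS_STRING models mixed_types_as_strings.
    enum ColumnKind { ALL_NULL, STRING, NUMBER, BOOLEAN, STRUCT, LIST, MIXED_AS_STRING }

    private MixedTypeInference() {}

    static ColumnKind infer(List<JsonKind> observed) {
        // Collect only the non-null kinds: a null literal is compatible
        // with every kind, so it must not make the column "mixed".
        Set<JsonKind> nonNull = EnumSet.noneOf(JsonKind.class);
        for (JsonKind k : observed) {
            if (k != JsonKind.NULL) {
                nonNull.add(k);
            }
        }
        if (nonNull.isEmpty()) {
            return ColumnKind.ALL_NULL;
        }
        if (nonNull.size() > 1) {
            // Two different non-null kinds: genuinely mixed, fall back to strings.
            return ColumnKind.MIXED_AS_STRING;
        }
        // Exactly one non-null kind; it has a same-named ColumnKind constant.
        return ColumnKind.valueOf(nonNull.iterator().next().name());
    }

    public static void main(String[] args) {
        // The "student" column of the test data: four structs and one null.
        List<JsonKind> student = List.of(
            JsonKind.STRUCT, JsonKind.STRUCT, JsonKind.NULL,
            JsonKind.STRUCT, JsonKind.STRUCT);
        System.out.println(infer(student)); // STRUCT
    }
}
```

Under this model only a column that mixes two different non-null kinds, for example a struct in one row and a string in another, is read back as strings.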
Describe the bug
When reading JSON that contains structs with some optional fields, and the `mixed_types_as_strings` feature is enabled, the data is returned as a string column instead of a struct column.

Steps/Code to reproduce bug
Input file:
Java test code:
This outputs `STRING` instead of `STRUCT`.

Expected behavior
I would not consider this input to be "mixed types". We expect this to be returned as a struct column.
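One way to see why optional fields need not imply mixed types: a reader can merge the field names seen across rows into a single struct schema and fill a field that a row lacks with null. The sketch below assumes that approach; `StructFieldUnion`, `unionFields`, and `project` are hypothetical names, and this is not a description of how cuDF actually implements struct inference.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: struct values with different (optional) fields are
// merged into one struct schema instead of being treated as mixed types.
final class StructFieldUnion {
    private StructFieldUnion() {}

    // Union of field names across rows, kept in first-seen order.
    static List<String> unionFields(List<List<String>> fieldNamesPerRow) {
        Set<String> fields = new LinkedHashSet<>();
        for (List<String> names : fieldNamesPerRow) {
            fields.addAll(names);
        }
        return new ArrayList<>(fields);
    }

    // Lays out one row against the merged schema; a missing field
    // becomes null rather than changing the column's type.
    static List<Object> project(Map<String, Object> row, List<String> schema) {
        List<Object> values = new ArrayList<>(schema.size());
        for (String field : schema) {
            values.add(row.get(field)); // null when the row lacks the field
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> schema = unionFields(List.of(
            List.of("name", "age"),
            List.of("name"))); // "age" is optional in the second row
        System.out.println(schema); // [name, age]
        System.out.println(project(Map.of("name", "Odhut"), schema)); // [Odhut, null]
    }
}
```

Both rows still map to the same struct column; the only difference is a null child value.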
Environment overview (please complete the following information)
N/A
Environment details
N/A
Additional context