[FEA] White space normalization in JSON arrays #15251

Closed
revans2 opened this issue Mar 7, 2024 · 3 comments
Labels
0 - Waiting on Author Waiting for author to respond to review cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

Contributor

revans2 commented Mar 7, 2024

Is your feature request related to a problem? Please describe.
I am filing this as a new feature because I think it is a requirement that was missed as part of #15033.

I would really love it if we could also normalize the whitespace within arrays, not just objects.

{"data": [0,5,6,7 ,8 , 9 ]}

should become

{"data":[0,5,6,7,8,9]}

Similarly, we should be able to handle strings: remove all of the whitespace around quoted strings, but none of the whitespace inside them.

{"data": ["0"," 5","6 ","7" ,"8" , "9" ]}

should become

{"data":["0"," 5","6 ","7","8","9"]}

And ideally this should also apply to nested arrays and objects

{"data": [{"a": "A"}, {"b": "B"}]}

would become

{"data":[{"a":"A"},{"b":"B"}]}
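
For reference, the behavior expected in the three examples above can be sketched as a small state machine that drops whitespace outside quoted strings and copies everything inside them unchanged. This is only an illustration in Python, not the libcudf normalizer: the function name `normalize_whitespace` is hypothetical, and the sketch assumes a single JSON record with double-quoted strings and backslash escapes.

```python
def normalize_whitespace(record: str) -> str:
    """Remove whitespace outside quoted strings; keep string contents intact.

    Hypothetical reference sketch, not the libcudf implementation.
    Assumes one JSON record that uses double-quoted strings.
    """
    out = []
    in_string = False  # True while between an opening and closing quote
    escaped = False    # True right after a backslash inside a string
    for ch in record:
        if in_string:
            out.append(ch)  # everything inside a string is preserved
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch not in " \t\n\r":
            out.append(ch)  # structural characters, digits, literals
    return "".join(out)


print(normalize_whitespace('{"data": ["0"," 5","6 ","7" ,"8" , "9" ]}'))
```

The escape flag is needed so that an escaped quote inside a string does not end the string early, which would cause whitespace later in the same string to be stripped.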
@revans2 revans2 added the feature request New feature or request label Mar 7, 2024
@GregoryKimball GregoryKimball added cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS labels Mar 7, 2024
@GregoryKimball GregoryKimball added this to the Nested JSON reader milestone Mar 7, 2024
Contributor

shrshi commented Mar 8, 2024

Thank you for the explanation and the examples, @revans2.
All of the cases mentioned are already handled correctly by the current whitespace normalizer, i.e., it removes whitespace in nested lists and around quoted strings.

@GregoryKimball GregoryKimball added the 0 - Waiting on Author Waiting for author to respond to review label Mar 8, 2024
Contributor Author

revans2 commented Mar 11, 2024

Okay, then I will do some more debugging to see why I am not getting the expected results back.

Contributor Author

revans2 commented Mar 12, 2024

You are correct; it was a bug in my test.

@revans2 revans2 closed this as completed Mar 12, 2024