[BUG] get_json_object and json_tuple do not re-escape escaped characters properly in nested string output #11387

Open
revans2 opened this issue Aug 22, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Collaborator

revans2 commented Aug 22, 2024

Describe the bug
get_json_object and json_tuple may output results in one of two ways. The first outputs the data with all escaped characters processed into their unescaped equivalents; for example, "\u0000" becomes the NUL character. This happens when the only thing matched is a single quoted string value.

It can also output the escaped/normalized data. This happens if a string is a part of a nested object that matches. In these cases Spark is outputting normalized JSON for the section that matched.
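To make the two output modes concrete, here is a minimal sketch that uses Python's standard `json` module as a stand-in. It is not Spark or our plugin, and the sample document and variable names are made up for illustration; it only shows the difference between returning an unescaped string and returning re-escaped JSON text for a nested match.

```python
import json

# Sample document: the string value contains an escaped NUL and an escaped newline.
doc = '{"a": {"b": "x\\u0000y\\nz"}}'
parsed = json.loads(doc)

# Mode 1: the path matches a single string value, so the result is the
# unescaped text (it contains a real NUL and a real newline character).
unescaped = parsed["a"]["b"]

# Mode 2: the path matches a nested object, so the result is JSON text and
# the control characters inside the string have to be escaped again.
normalized = json.dumps(parsed["a"], separators=(",", ":"))

print(repr(unescaped))
print(normalized)
```

The second result is still valid JSON because the control characters were re-escaped; emitting them raw inside the quoted string would not be.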

Our implementation is not outputting the normalized escaped strings correctly in all cases.

Currently, \u-encoded control characters (code points below decimal 32) are not re-escaped at all; they are output as the raw, unescaped character. Spark outputs such a value as a \u escape sequence. Where a short escape sequence exists (\b, \f, \r, \n, etc.), it should be favored over the \u version, but I would be happy to live with the \u versions, as they are still technically valid JSON.
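The re-escaping rule described above could be sketched roughly as follows. This is an illustration in Python, not the plugin's actual code: `escape_json_string` and `SHORT_ESCAPES` are hypothetical names, and the lowercase hex digits in the `\u` fallback are an arbitrary choice, since JSON accepts either case.

```python
# Short escapes that the JSON grammar defines for control characters.
SHORT_ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_json_string(value: str) -> str:
    """Hypothetical helper: re-escape a string for embedding in JSON output."""
    out = []
    for ch in value:
        if ch == '"' or ch == "\\":
            # Quote and backslash must always be escaped.
            out.append("\\" + ch)
        elif ch in SHORT_ESCAPES:
            # Prefer the short form when one exists.
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Any other control character falls back to a \u escape.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(escape_json_string("a\x00b"))  # NUL has no short form
print(escape_json_string("line1\nline2"))  # newline uses the short form
```

Always using the `\u` branch for every control character would also produce valid JSON, which is the fallback I said I could live with.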

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Aug 22, 2024
@revans2 revans2 removed the ? - Needs Triage Need team to review and classify label Sep 10, 2024