[BUG] get_json_object and json_tuple do not re-escape escaped characters properly in nested string output #11387

revans2 · 2024-08-22T16:44:14Z

Describe the bug
get_json_object and json_tuple may output results in one of two ways. The first is with all escaped characters processed into their unescaped equivalents, e.g. "\u0000" becomes the NUL character. This happens when the only thing matched is a single quoted string value.

It can also output the escaped/normalized data. This happens if a string is a part of a nested object that matches. In these cases Spark is outputting normalized JSON for the section that matched.
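To make the two output modes concrete, here is a minimal sketch that imitates them with Python's standard `json` module. The `extract` helper is hypothetical and only walks a list of object keys; it is not Spark's or our implementation, and the compact separators are an assumption about what the normalized output looks like.

```python
import json

def extract(doc: str, *keys: str) -> str:
    """Hypothetical stand-in for get_json_object with a simple key path."""
    value = json.loads(doc)
    for key in keys:
        value = value[key]
    if isinstance(value, str):
        # Mode 1: a bare string match is returned with escapes processed.
        return value
    # Mode 2: a nested match is re-serialized as normalized JSON,
    # so control characters inside strings must be escaped again.
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)

doc = '{"a": {"b": "x\\u0000y"}}'
scalar = extract(doc, "a", "b")  # string containing a raw NUL character
nested = extract(doc, "a")       # JSON text with the NUL written as \u0000
```

Here `scalar` holds the three characters `x`, NUL, `y`, while `nested` is the JSON text `{"b":"x\u0000y"}` with the escape restored.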

Our implementation is not outputting the normalized escaped strings correctly in all cases.

Currently, \u encoded control characters (code points below decimal 32) are not escaped at all in the normalized output; they are emitted as the raw, unescaped character. Spark outputs them as \u escape sequences. Where a special escape sequence exists (\b, \f, \r, \n, etc.), it should be favored over the \u version, but I would be happy to live with the \u versions, as they are still technically valid JSON.
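The escaping rule described above can be sketched as follows. This is an illustration in Python, not the plugin's code: `escape_json_string` and `SHORT_ESCAPES` are hypothetical names, and the lowercase hex digits in the \u form are an arbitrary choice that may not match Spark's output byte for byte.

```python
import json

# Characters that have a dedicated short escape in JSON.
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}

def escape_json_string(s: str) -> str:
    """Return s as a quoted JSON string with control characters escaped."""
    out = []
    for ch in s:
        if ch in SHORT_ESCAPES:
            # Prefer the short form when one exists.
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters fall back to the \u form.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

raw = "a\nb\x00c"
escaped = escape_json_string(raw)
# The escaped text must parse back to the original string.
round_trip = json.loads(escaped)
```

Emitting the raw control character instead, as we do today, produces text that a strict JSON parser rejects, which is why the fallback to \u matters even if the short forms are not used.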

revans2 added the bug and ? - Needs Triage labels Aug 22, 2024

revans2 removed the ? - Needs Triage label Sep 10, 2024
