JSON becomes malformed when trying to write string containing \\u0000 #475

Closed
susisu opened this issue Mar 17, 2020 · 13 comments

@susisu

susisu commented Mar 17, 2020

If the input JSON has a string containing the sequence \\u0000, then due to #348 the \u0000 part is removed and the leading escape character \ is left behind, which can cause an error.

For example, when trying to write the JSON { "xxx": "***\\u0000***" }, I got the following error:

org.postgresql.util.PSQLException: ERROR: invalid input syntax for type json
  Detail: Escape sequence "\*" is invalid.
  Where: JSON data, line 1: {"xxx":"***\*...
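
A minimal sketch of how this failure can arise, assuming the removal behaves like a plain substring replace on the stringified JSON (the actual code from #348 is not shown in this thread). The names `NaiveStrip` and `naiveStrip` are hypothetical. The backslash is built in a separate string so that the source never contains a literal unicode escape, which some Scala versions would process even inside string literals.

```scala
object NaiveStrip {
  // A single backslash, kept apart from the "u0000" text on purpose.
  val bs: String = "\\"
  // The six-character JSON escape for U+0000: backslash, 'u', '0', '0', '0', '0'.
  val nullEscape: String = bs + "u0000"

  // Assumed behaviour: drop every occurrence of the escape, ignoring context.
  def naiveStrip(json: String): String = json.replace(nullEscape, "")

  // The JSON from the report: an escaped backslash followed by the text u0000.
  val input: String = "{\"xxx\":\"***" + bs + nullEscape + "***\"}"

  // The second backslash and "u0000" are removed, so a lone backslash
  // now precedes '*': {"xxx":"***\***"}
  val output: String = naiveStrip(input)
}
```

The lone backslash followed by `*` is the invalid escape sequence that PostgreSQL rejects in the error above.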
@susisu
Author

susisu commented Mar 17, 2020

It would be nice if slick-pg could carefully remove only the null characters in strings; otherwise, IMO, it should not touch the JSON after it is stringified.

@tminglei
Owner

@susisu Fixed, please help check.

BTW, \\u0000 must be removed from the input JSON string; otherwise it will hit an exception like:

[info]   Cause: org.postgresql.util.PSQLException: ERROR: unsupported Unicode escape sequence
[info]   details: \u0000 cannot be converted to text.
[info]   position: JSON data, line 1: {"d":...

@susisu
Author

susisu commented Mar 17, 2020

@tminglei Thank you for the quick response.

BTW, \u0000 must be removed from the input JSON string; otherwise it will hit an exception

I mean the JSON string "\\u0000" should be saved as-is, because it is a normal sequence that does not contain a null character. Your fix does not seem to solve this...

@susisu
Author

susisu commented Mar 17, 2020

In addition, a string like "\\\u0000" will result in malformed JSON.

tminglei added a commit that referenced this issue Mar 18, 2020
@tminglei
Owner

@susisu I tried another fix, can you help check again?

@susisu
Author

susisu commented Mar 18, 2020

@tminglei Thanks! It works well for "\\u0000".
However, it looks like the \u0000 in "\\\u0000" is no longer removed.

How about using a regex like """\\+u0000""".r and removing \u0000 only if the number of backslashes is odd (2n+1)?
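
A rough sketch of this idea, not the fix that was eventually committed: match every run of backslashes followed by u0000, then decide per match by the parity of the run. The names `NullEscapeStripper`, `strip`, and `esc` are hypothetical. Test inputs are built with `esc` rather than written literally, so the source never contains a unicode escape that the Scala compiler might process.

```scala
import scala.util.matching.Regex

object NullEscapeStripper {
  // A run of one or more backslashes followed by the literal text "u0000".
  private val Candidate: Regex = """(\\+)u0000""".r

  def strip(json: String): String =
    Candidate.replaceAllIn(json, m => {
      val slashes = m.group(1)
      val kept =
        if (slashes.length % 2 == 1)
          // Odd run: the last backslash starts a real null escape.
          // Keep the paired backslashes before it and drop the escape.
          slashes.dropRight(1)
        else
          // Even run: every backslash is itself escaped, so "u0000" is plain text.
          m.matched
      // Backslashes are special in replacement strings, so quote the result.
      Regex.quoteReplacement(kept)
    })

  // Helper for building inputs: n backslashes followed by "u0000".
  def esc(n: Int): String = ("\\" * n) + "u0000"
}
```

With this sketch, a run of one backslash is removed entirely, a run of two is left untouched, and a run of three is reduced to two backslashes.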

@tminglei
Owner

Regex will fail, I didn't find a usable regex.

@susisu
Author

susisu commented Mar 18, 2020

Regex will fail, I didn't find a usable regex.

Yes, an exact regex that matches \u0000 in "\u0000" and "\\\u0000" but does not match it in "\\u0000" does not exist (or is non-standard).
But it is possible to find candidates that contain a \u0000 to be removed, isn't it?

@susisu
Author

susisu commented Mar 18, 2020

It may get a bit complex with regex etc., and I think not touching the JSON after it is stringified would also be a reasonable approach.
But I don't know which is better...

@tminglei
Owner

Now, 123\u000045\\u00006\\\u00007 will be converted to 12345\\u00006\\7.
I didn't find a better way.

@susisu
Author

susisu commented Mar 18, 2020

Now, 123\u000045\\u00006\\\u00007 will be converted to 12345\\u00006\\7.

That is the result I expected, but d2731f4 will yield 12345\\u00006\\\u00007, which still contains the null character \u0000.
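
One way to check outputs like the two quoted above is to scan for a u0000 that follows an odd-length run of backslashes. This is only an illustrative sketch; `NullEscapeCheck`, `hasNullEscape`, and `esc` are hypothetical names, not part of slick-pg.

```scala
object NullEscapeCheck {
  // True if the text contains "u0000" preceded by an odd number of backslashes,
  // i.e. a real escape for U+0000 rather than an escaped backslash plus plain text.
  def hasNullEscape(json: String): Boolean = {
    var i = 0
    var slashes = 0 // length of the backslash run ending just before position i
    var found = false
    while (i < json.length && !found) {
      if (json.charAt(i) == '\\') {
        slashes += 1
      } else {
        if (slashes % 2 == 1 && json.startsWith("u0000", i)) found = true
        slashes = 0
      }
      i += 1
    }
    found
  }

  // Helper for building inputs without literal escapes in source:
  // n backslashes followed by "u0000".
  def esc(n: Int): String = ("\\" * n) + "u0000"

  // The expected result from the thread: no real null escape remains.
  val expected: String = "12345" + esc(2) + "6" + "\\\\" + "7"
  // The result reported for d2731f4: the three-backslash run still hides one.
  val reported: String = "12345" + esc(2) + "6" + esc(3) + "7"
}
```

Under these assumptions, `hasNullEscape(expected)` is false and `hasNullEscape(reported)` is true.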

tminglei added a commit that referenced this issue Mar 19, 2020
@tminglei
Owner

@susisu Fixed again. Your question should be answered by the test case.
Please help check again.

@susisu
Author

susisu commented Mar 19, 2020

@tminglei Looks good enough. Thanks!
