JSON Resultset UTF-8 encoding issues when escaped with \u #303
Comments
Spot on… JSON only supports 4-digit Unicode escape sequences. Unicode characters outside the BMP must be emitted directly as a UTF-8 sequence (allowed by JSON production char) or encoded as surrogate pairs. This is a serious bug as browser-provided JSON.parse() doesn't support lenient parsing and breaks on illegal escape sequences, as in
May be reproduced by the following query on the DBpedia endpoint:
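The following should work as a stopgap measure:
JSON.parse(text.replace(/\\U([0-9A-Fa-f]{8})/g, function ($0, $1) {
  var cp = parseInt($1, 16);
  // BMP code points need no surrogate pair: keep them as a 4-digit \u escape
  if (cp < 0x10000) return "\\u" + $1.slice(4);
  var c = cp - 0x010000;
  var h = (c >> 10) + 0xD800;   // high surrogate
  var l = (c & 0x3FF) + 0xDC00; // low surrogate
  return String.fromCharCode(h, l);
}));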
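To make the failure and the workaround concrete, here is a minimal sketch. The sample payload and the helper name `repairLongEscapes` are made up for illustration and are not taken from the endpoint's actual output; the helper rewrites each `\UXXXXXXXX` into standard `\uXXXX` escapes rather than raw characters, which is one possible variation on the stopgap above.

```javascript
// Hypothetical response fragment with an 8-digit \U escape (not valid JSON).
const text = '{"value": "\\U0001B000"}';

// A strict parser such as the built-in JSON.parse rejects the \U escape.
let strictParseFailed = false;
try {
  JSON.parse(text);
} catch (e) {
  strictParseFailed = e instanceof SyntaxError;
}

// Rewrite every \UXXXXXXXX as one or two standard \uXXXX escapes.
// Assumes the input holds valid code points (at most 0x10FFFF).
function repairLongEscapes(s) {
  return s.replace(/\\U([0-9A-Fa-f]{8})/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    if (cp < 0x10000) return '\\u' + hex.slice(4); // BMP: single escape
    const c = cp - 0x10000;
    const hi = (c >> 10) + 0xD800;   // high surrogate
    const lo = (c & 0x3FF) + 0xDC00; // low surrogate
    return '\\u' + hi.toString(16) + '\\u' + lo.toString(16);
  });
}

const parsed = JSON.parse(repairLongEscapes(text));
console.log(strictParseFailed, parsed.value.codePointAt(0).toString(16));
```

Emitting escapes instead of raw characters keeps the repaired text valid JSON even when the escaped code point is a quote, backslash, or control character.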
This issue was fixed a few days ago and will be making its way to the commercial and open source archives, DBpedia included, in the coming days.
The fix for this issue has been pushed to the open source develop/7 branch: http://sourceforge.net/p/virtuoso/virtuoso-opensource/ci/e0f65ec67f980251579fbd614be1fb0ac6b18786
Thanks!
Hi,
It appears that UTF-8 characters returned in SPARQL JSON resultsets are not properly encoded with \u. Here is a DBpedia query that fails:
Encoded characters such as "\U0001B000" should probably be encoded as "\uD82C\uDC00" instead.
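The suggested surrogate pair can be cross-checked against the JavaScript engine's own UTF-16 encoding. This is only a sketch; `toJsonEscapes` is a hypothetical helper name, not part of any library.

```javascript
// Build the \uXXXX escapes for a code point from its UTF-16 code units.
function toJsonEscapes(codePoint) {
  const s = String.fromCodePoint(codePoint); // one or two UTF-16 code units
  let out = '';
  for (let i = 0; i < s.length; i++) {
    out += '\\u' + s.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0');
  }
  return out;
}

const escapes = toJsonEscapes(0x1B000);
console.log(escapes); // prints \uD82C\uDC00

// Round trip: a strict JSON parser accepts the pair and yields U+1B000 again.
const roundTrip = JSON.parse('"' + escapes + '"').codePointAt(0);
console.log(roundTrip === 0x1B000); // prints true
```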