Combine surrogate pairs into one escape sequence when dumping. #369
Please don't forget to provide proof that such encoding is valid.
@puzrin I'm sorry, are you wanting proof that it is valid according to the spec, or that this code produces valid output?
That it is valid according to the YAML spec.
An alternative could be to leave surrogates unencoded. Then, once the string is saved as UTF-8 (in a file, for example), the characters become "native". But this also requires investigating whether such a thing is allowed in YAML.
Encoding astral characters as 32-bit escape sequences (like
Is that what you were wanting as proof, or something else?
Exactly. The references are solid. Thank you. One more question: could this be a problem with JSON in JS? (I think not.)
Partially, yes. There are two related things here that are causing this in JS and JSON. The first is that JavaScript internally uses either UCS-2 or UTF-16 (https://mathiasbynens.be/notes/javascript-encoding). So whenever you use

The second issue is that in JSON, astral characters are supposed to be encoded as surrogate pairs (see JSON spec section "9 String"), as JSON does not have long escape sequences. This does make it confusing to convert between JSON and YAML.
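To make the two points above concrete, here is a small sketch using only standard JavaScript built-ins. The variable names are illustrative and nothing here is taken from js-yaml itself. It shows that a single astral character occupies two UTF-16 code units in a JS string, and that JSON's `\uXXXX` escapes describe the same pair.

```javascript
// U+1F600 written as its UTF-16 surrogate pair.
const grin = '\uD83D\uDE00';

// A JS string is a sequence of 16-bit code units, so one astral
// character counts as two units.
const unitCount = grin.length;

// charCodeAt sees the individual surrogates...
const highUnit = grin.charCodeAt(0).toString(16);
const lowUnit = grin.charCodeAt(1).toString(16);

// ...while codePointAt combines the pair into the real code point.
const codePoint = grin.codePointAt(0).toString(16);

// JSON has no 8-digit escape, so an escaped astral character is
// written as two \uXXXX escapes that decode to the same string.
const fromJson = JSON.parse('"\\uD83D\\uDE00"');

console.log(unitCount, highUnit, lowUnit, codePoint, fromJson === grin);
```

Running this under Node.js should print `2 d83d de00 1f600 true`.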
@puzrin Also, do you think this should be an option, or the default? If an option, do you have an idea what it should be named (maybe
I think the default will be good enough.
@puzrin Also, the loader should already be able to parse these 32-bit sequences, correct? https://github.com/tech4him1/js-yaml/blob/1393e2539556776899d274cce2cc972b5b9ae69f/lib/js-yaml/loader.js#L114-L119 It works for me, I just wanted to check with you as well.
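For readers following along, the reverse direction can be sketched roughly as below. This is an illustration only: `codePointToString` and `unescapeLong` are hypothetical names, and I have not checked that the linked loader lines are written this way. The idea is that a code point above 0xFFFF has to be split back into a surrogate pair before it can be stored in a JS string.

```javascript
// Hypothetical helper: turn a Unicode code point into a JS string.
// Assumes the input is a valid code point (at most 0x10FFFF).
function codePointToString(c) {
  if (c <= 0xFFFF) {
    return String.fromCharCode(c);
  }
  // Split the 20 bits above 0x10000 into a high and a low surrogate.
  const offset = c - 0x10000;
  return String.fromCharCode(
    (offset >> 10) + 0xD800,
    (offset & 0x3FF) + 0xDC00
  );
}

// Hypothetical helper: replace every \UXXXXXXXX escape in a piece of
// text with the character it names. A real loader also handles the
// other escape forms and validates its input; this one does not.
function unescapeLong(text) {
  return text.replace(/\\U([0-9A-Fa-f]{8})/g, function (match, hex) {
    return codePointToString(parseInt(hex, 16));
  });
}

console.log(unescapeLong('\\U0001F600') === '\uD83D\uDE00');
```

The final line should log `true`, since `\U0001F600` and the pair `\uD83D\uDE00` name the same character.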
This PR should be ready to merge, unless you have any changes that you want made.
published
Thank you!
* master: (58 commits)
  Check for leading newlines when determining if block indentation indicator is needed (nodeca#404)
  Add property based tests to assess load reverses dump (nodeca#398)
  3.11.0 released
  Browser files rebuild
  Dumper: fix negative integers in bin/octal/hex formats, close nodeca#399
  support es6 arrow functions, fixes nodeca#389 (nodeca#393)
  Fix typo in README.md (nodeca#373)
  3.10.0 released
  Browser files rebuild
  Add test for astrals dump
  Combine surrogate pairs into one escape sequence when encoding. (nodeca#369)
  Fix condenseFlow for objects (nodeca#371)
  correct spelling mistake (nodeca#367)
  More meaningful error for loader (nodeca#361)
  Fix typo and format code. (nodeca#365)
  3.9.1 released
  Browser files rebuild
  Ensure stack is present for custom errors (fixes nodeca#351) (nodeca#360)
  3.9.0 released
  Browser files rebuild
  ...
When an astral character is encountered while dumping to a double-quoted string, save it as a 32-bit escaped Unicode sequence instead of as a surrogate pair (\U0001F600 instead of \uD83D\uDE00).

Closes #368.
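As a rough illustration of the behaviour described above, the pairing step might look like the following. This is a minimal sketch and not the code from this PR: `escapeAstral` is a hypothetical name, and it passes every non-astral character through unchanged, whereas a real dumper also escapes other characters.

```javascript
// Hypothetical helper: rewrite each surrogate pair in a string as a
// single 8-digit \U escape. Everything else is copied as-is.
function escapeAstral(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const high = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one astral character.
    if (high >= 0xD800 && high <= 0xDBFF && i + 1 < str.length) {
      const low = str.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        // Standard UTF-16 decoding of a surrogate pair.
        const codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        out += '\\U' + codePoint.toString(16).toUpperCase().padStart(8, '0');
        i++; // skip the low surrogate, it has been consumed
        continue;
      }
    }
    out += str[i];
  }
  return out;
}

console.log(escapeAstral('a\uD83D\uDE00b'));
```

With the input above, this should print `a\U0001F600b`. A lone surrogate with no partner is left untouched by this sketch; how the actual dumper treats that case is not covered here.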