解析字符串时，代理对不是针对 utf-16 才有的吗，这里是解析utf-8 的编解码，也需要处理代理对？ #249

jacklin-hxl · 2023-10-26T09:01:43Z

如题

jacklin-hxl · 2023-10-26T16:36:26Z

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

通过查阅rfc4627，json 规范中明确说明如果要转义超过bmp的字符，通过 UTF-16 surrogate pair 来转义。编码的时候会再转化为unicode code point，再将其通过utf-8 规则编码为二进制。并不是以utf-16 来编码。

jacklin-hxl · 2023-10-26T16:47:06Z

我理解这应该是一种解析规则，在python中直接打印‘\uD834\uDD1E‘ 是会报错的，但是直接打印\U1D11E 是可以的。bash shell 中同样前者显示乱码，后者正常显示字符。

jacklin-hxl closed this as completed Oct 26, 2023

jacklin-hxl reopened this Oct 26, 2023

jacklin-hxl closed this as completed Oct 26, 2023

jacklin-hxl reopened this Oct 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

解析字符串时，代理对不是针对 utf-16 才有的吗，这里是解析utf-8 的编解码，也需要处理代理对？ #249

解析字符串时，代理对不是针对 utf-16 才有的吗，这里是解析utf-8 的编解码，也需要处理代理对？ #249

jacklin-hxl commented Oct 26, 2023

jacklin-hxl commented Oct 26, 2023

jacklin-hxl commented Oct 26, 2023

解析字符串时，代理对不是针对 utf-16 才有的吗，这里是解析utf-8 的编解码，也需要处理代理对？ #249

解析字符串时，代理对不是针对 utf-16 才有的吗，这里是解析utf-8 的编解码，也需要处理代理对？ #249

Comments

jacklin-hxl commented Oct 26, 2023

jacklin-hxl commented Oct 26, 2023

jacklin-hxl commented Oct 26, 2023