Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

解析字符串时,代理对不是针对 utf-16 才有的吗,这里是解析utf-8 的编解码,也需要处理代理对? #249

Open
jacklin-hxl opened this issue Oct 26, 2023 · 2 comments

Comments

@jacklin-hxl
Copy link

如题

@jacklin-hxl
Copy link
Author

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

通过查阅rfc4627,json 规范中明确说明如果要转义超过bmp的字符,通过 UTF-16 surrogate pair 来转义。编码的时候会再转化为unicode code point,再将其通过utf-8 规则编码为二进制。 并不是以utf-16 来编码。

@jacklin-hxl
Copy link
Author

我理解这应该是一种解析规则,在python中直接打印‘\uD834\uDD1E‘ 是会报错的,但是直接打印\U1D11E 是可以的。bash shell 中同样前者显示乱码,后者正常显示字符。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant