Emoji support #53
Comments
flutter_emoji is a package that can parse emojis so that they can be stored as strings for later reference. |
I just hit this, too! |
It seems that the yaml package restricts the INPUT parsing to the same restrictions as the YAML spec defines for OUTPUT, which may be the problem. |
Any plan to fix this bug? |
PR welcome! |
Hello 😊 Thank you for this package! Any updates on that? It would be great to load emojis! **Edit:** I just figured out that everything works fine if I double quote my emojis ( |
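Just want to add that this is not just an emoji issue, but a more general one. The yaml package uses string_scanner for string parsing, and it mostly utilizes the package's peekChar method, which returns the *code unit* of the string at a specific offset. We all know Dart uses UTF-16 in its internal representation, but the implementation of _isStandardCharacter() in scanner.dart obviously assumes that a UTF-32 code point is supplied: Lines 1594 to 1598 in e598443
The last line checks for a range between 0x10000 and 0x10FFFF. If my observation is correct, this will never be true because char will never be larger than 0xFFFF. So, the actual issue is that the YAML parser doesn't handle surrogate pairs correctly for plain (unquoted) style. This does not only affect emojis. For instance, the following line is fine: char: 兼 But the following would yield an "Unexpected character" exception: char: 𠔥 Both are CJK characters; parsing fails for the second case because 𠔥 (U+20525) requires two code units. This is what I have encountered. It requires the YAML file author to remember to quote certain characters, or to quote all characters to play it safe. |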
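To make the code-unit point concrete, here is a minimal sketch in TypeScript, whose strings are sequences of UTF-16 code units just like Dart's. The function `inSupplementaryRange` is a hypothetical stand-in for the range check described in the comment, not the actual code from scanner.dart; it only shows why a check for 0x10000 to 0x10FFFF cannot succeed when it is fed one code unit at a time.

```typescript
// Hypothetical stand-in for the range check described above.
// This is NOT the real _isStandardCharacter() implementation.
function inSupplementaryRange(char: number): boolean {
  return char >= 0x10000 && char <= 0x10ffff;
}

const bmp = "兼"; // fits in a single UTF-16 code unit
const supplementary = "𠔥"; // U+20525, stored as a surrogate pair

// Reading one code unit at a time, as a peekChar-style API does.
const firstUnit = supplementary.charCodeAt(0);
const secondUnit = supplementary.charCodeAt(1);

console.log(bmp.length, supplementary.length);
// Each half of the pair is a 16-bit value, so neither can reach 0x10000.
console.log(inSupplementaryRange(firstUnit));
console.log(inSupplementaryRange(secondUnit));
```

Both range checks come out false, because a single code unit is always at most 0xFFFF.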
Maybe a PR to `string_scanner` is the place to start?
…On Sat, Mar 23, 2024 at 8:46 PM Tamcy ***@***.***> wrote:
Just want to add that this is not just an emoji issue, but a more general
one.
The yaml package uses string_scanner for string parsing, and it mostly
utilizes the package's peekChar method, which returns the *code unit* of
the string in a specific offset. We all know Dart uses UTF-16 in its
internal representation, but the implementation of _isStandardCharacter()
in scanner.dart obviously assumes a UTF-32 codepoint be supplied:
https://github.com/dart-lang/yaml/blob/e5984433a2803d5c67ed0abac5891a55040381ee/lib/src/scanner.dart#L1594-L1598
The last line checks for a range between 0x10000 and 0x10FFFF. If my
observation is correct, this will never be true because char will never
be larger than 0xFFFF. So, the actual issue is that the YAML parser doesn't
handle surrogate pairs correctly for plain (unquoted) style. This does not
only affect emojis. For instance, the following line is fine:
char: 兼
But the following would yield an "Unexpected character" exception:
char: 𠔥
Both are CJK characters, parsing would fail for the second case because 𠔥
(U+20525) requires two code units. This is what I have encountered. This
requires YAML file author to remember to quote certain characters, or to
quote all characters to play safe.
|
(Edit: please see my comment below) string_scanner seems fine, and the latest version (1.2.0) also added code point scanning, which yaml can take advantage of to fix the problem I described. Perhaps after dart-lang/string_scanner#69 is accepted, which make On the other hand, while there seems to be no harm in migrating the yaml package to use code points as the scanning unit (although it may cause performance issues), it also seems to me that changes like #125 are already good enough or even inevitable. Here are my points:
|
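As an illustration of what scanning by code point means, the sketch below combines a surrogate pair into a single code point before any range check is applied. It is written in TypeScript (UTF-16 strings, as in Dart), and `readCodePoint` is a hypothetical helper invented for this example; it does not reproduce the string_scanner API.

```typescript
// Hypothetical helper (not the string_scanner API): read the code point that
// starts at `index`, combining a surrogate pair when one is present.
function readCodePoint(source: string, index: number): number {
  const high = source.charCodeAt(index);
  const isHigh = high >= 0xd800 && high <= 0xdbff;
  if (!isHigh || index + 1 >= source.length) return high;

  const low = source.charCodeAt(index + 1);
  const isLow = low >= 0xdc00 && low <= 0xdfff;
  if (!isLow) return high; // unpaired surrogate: return it unchanged

  // Standard UTF-16 decoding of a high/low surrogate pair.
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const value = "\uD841\uDD25"; // the surrogate pair that encodes U+20525
const decoded = readCodePoint(value, 0);

console.log(decoded.toString(16)); // prints "20525"
console.log(decoded >= 0x10000 && decoded <= 0x10ffff); // prints true
```

With the pair combined first, the 0x10000 to 0x10FFFF check becomes reachable, at the cost of an extra lookahead for each high surrogate.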
Oh, I saw that @kevmoo points to https://yaml.org/spec/1.2-old/spec.html#id2770814 in #125. I've glanced through the document. It looks like the pull request wasn't accepted because the spec clearly excludes the surrogate block from the allowed character range, so accepting |
The current parser crashes when trying to load a YAML document with emojis in it.
Error: