-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding: make TextDecoder handle BOM correctly #30132
Conversation
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: nodejs#25315
b6414f5
to
a30272b
Compare
Landed in 237be2e |
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: #25315 PR-URL: #30132 Reviewed-By: Gus Caplan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this discussion. Code looks OK, I'm not familiar with this spec. But shouldn't ICU do this in ucnv_toUnicode?
@srl295 I don’t know, should it drop a BOM at the beginning of the text automatically? The current behaviour is that it doesn’t. Unless maybe there’s a flag in there that we could use and that I don’t know about? (And we do also at least need one way to make it not drop a BOM at the beginning of the text.) |
the converter names can take parameters such as "ISO_2022,locale=ja,version=4” - if ICU should support these options for whatwg compliance, please open an issue on https://unicode-org.atlassian.net under the ICU project and mention nodejs
|
@srl295 I really have no idea whether ICU should or should not support automatic BOM-dropping or where to look up if it already does (other than that the implementation doesn’t look to me like it does), so I don’t feel qualified to open an issue and explain Node’s needs. All I know is that it’s relatively easy for Node.js to work with what ICU currently provides. |
OK. I'll file an issue then. |
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: #25315 PR-URL: #30132 Reviewed-By: Gus Caplan <[email protected]>
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: #25315 PR-URL: #30132 Reviewed-By: Gus Caplan <[email protected]>
Do not accept the BOM if it comes from a different encoding, and only discard the BOM after it has actually been read (including when it is spread over multiple chunks in streaming mode). Fixes: #25315 PR-URL: #30132 Reviewed-By: Gus Caplan <[email protected]>
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).
Fixes: #25315
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes