Support document.write #6
Comments
As much as I’d like to, I don’t know that we can convince other implementations to replace lone surrogates with U+FFFD. For those that use UCS-2 internally (every one but us), this is pure overhead and has a performance cost. And it’s not just
Another solution could be WTF-8: rust-lang/rust#12056 (comment). It’s a superset of UTF-8 (like UTF-8 is a superset of ASCII) that allows surrogates, but only if they’re unpaired. (Concatenating two WTF-8 strings is not just concatenating the bytes: it also needs to check for newly paired surrogates at the boundary and convert them to the UTF-8 representation of a single code point.)
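To make the boundary check concrete, here is a minimal sketch of WTF-8 concatenation over raw byte slices. The names `decode_surrogate` and `wtf8_concat` are hypothetical and not taken from any existing crate, and the sketch assumes both inputs are already well-formed WTF-8.

```rust
/// Decode a 3-byte WTF-8 sequence if it encodes a surrogate (U+D800..=U+DFFF).
/// Hypothetical helper for illustration only.
fn decode_surrogate(b: &[u8]) -> Option<u16> {
    if b.len() == 3
        && b[0] == 0xED
        && (0xA0..=0xBF).contains(&b[1])
        && (0x80..=0xBF).contains(&b[2])
    {
        Some(0xD000 | (((b[1] & 0x3F) as u16) << 6) | ((b[2] & 0x3F) as u16))
    } else {
        None
    }
}

/// Concatenate two byte strings assumed to be well-formed WTF-8.
/// If `left` ends with a lead surrogate and `right` starts with a trail
/// surrogate, the pair is replaced by the 4-byte UTF-8 encoding of the
/// supplementary code point it represents.
pub fn wtf8_concat(left: &[u8], right: &[u8]) -> Vec<u8> {
    let lead = left
        .len()
        .checked_sub(3)
        .and_then(|i| decode_surrogate(&left[i..]));
    let trail = right.get(..3).and_then(decode_surrogate);

    if let (Some(hi @ 0xD800..=0xDBFF), Some(lo @ 0xDC00..=0xDFFF)) = (lead, trail) {
        let cp = 0x10000 + (((hi as u32) - 0xD800) << 10) + ((lo as u32) - 0xDC00);
        let c = std::char::from_u32(cp).expect("a surrogate pair always maps to a valid scalar value");
        let mut out = left[..left.len() - 3].to_vec();
        let mut buf = [0u8; 4];
        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        out.extend_from_slice(&right[3..]);
        out
    } else {
        // No newly formed pair at the boundary: plain byte concatenation is fine.
        [left, right].concat()
    }
}
```

For example, joining a string that ends in the bytes for U+D83D with one that starts with the bytes for U+DE00 yields the ordinary UTF-8 encoding of U+1F600 at the boundary, so the result is valid UTF-8 again.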
Is it out of the question that the spec would allow but not mandate U+FFFD replacement? When I brought this up before, people seemed to think it was enough of a corner case that we could get away with it (spec wording changes or no).
“Allow but not mandate” sounds bad for interop on principle, though I don’t know how much it really matters here. But even if we replace in
When this was brought up in the CSS WG to replace in CSSOM, the conclusion was "no change". (Though it’s not clear to me that the arguments for change were well represented then. I was in the meeting remotely, audio only, with very bad sound quality.)
WTF-8 is a thing now: http://www.mail-archive.com/[email protected]/msg00921.html
I’ve changed my mind on the above. I’d like Servo to try UTF-8 everywhere in the DOM and what you first suggested here for
https://github.com/kmcallister/tendril encompasses my latest proposal.
document.write landed.
See servo/servo#3704.
The argument to document.write is a sequence of UCS-2 code units and we need a way to interface this with the UTF-8 parser. My plan is: (Edit: Largely superseded by this proposal)
- If document.write input ends with a leading surrogate, we can't convert it yet, so save this single u16 in the BufferQueue alongside the UTF-8 buffers.
- If document.write input starts with a trailing surrogate, and there's a saved leading surrogate in the BufferQueue, then replace both with the appropriate Unicode character as UTF-8.
- document.write calls, or wrote a lone leading surrogate and then finished.)
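The plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SurrogateQueue` is a hypothetical stand-in and not the parser's actual BufferQueue, and replacing surrogates that never get paired with U+FFFD is an assumption borrowed from the discussion in the comments rather than something this issue text confirms.

```rust
/// Hypothetical stand-in for the parser's input queue (not the real BufferQueue).
#[derive(Default)]
pub struct SurrogateQueue {
    /// UTF-8 buffers ready to hand to the tokenizer.
    pub buffers: Vec<String>,
    /// A leading surrogate saved from the end of the previous write, if any.
    pub pending_lead: Option<u16>,
}

fn is_lead(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

fn is_trail(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

impl SurrogateQueue {
    /// Accept one document.write argument as UTF-16 code units.
    pub fn write(&mut self, input: &[u16]) {
        if input.is_empty() {
            return; // Assumption: an empty write leaves a saved surrogate pending.
        }
        let mut units = input;
        let mut out = String::new();

        // Resolve a leading surrogate saved by the previous write.
        if let Some(lead) = self.pending_lead.take() {
            match units.first() {
                Some(&trail) if is_trail(trail) => {
                    let cp = 0x10000
                        + (((lead as u32) - 0xD800) << 10)
                        + ((trail as u32) - 0xDC00);
                    out.push(std::char::from_u32(cp).expect("valid surrogate pair"));
                    units = &units[1..];
                }
                // Assumption: a lead surrogate that never gets its partner becomes U+FFFD.
                _ => out.push('\u{FFFD}'),
            }
        }

        // A trailing lead surrogate cannot be converted yet, so save it.
        if let Some((&last, rest)) = units.split_last() {
            if is_lead(last) {
                self.pending_lead = Some(last);
                units = rest;
            }
        }

        // Convert the rest; any other lone surrogate becomes U+FFFD (assumption).
        for r in std::char::decode_utf16(units.iter().copied()) {
            out.push(r.unwrap_or('\u{FFFD}'));
        }
        if !out.is_empty() {
            self.buffers.push(out);
        }
    }

    /// Called when no more input will arrive.
    pub fn finish(&mut self) {
        if self.pending_lead.take().is_some() {
            self.buffers.push("\u{FFFD}".to_string());
        }
    }
}
```

Keeping the saved surrogate as a separate `Option<u16>` means the buffers themselves are always valid UTF-8, at the cost of one extra check at the start of every write.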