nbsp char in xml name allowed #578
Comments
Hello @wangyoutian, Is it possible for you to reproduce this issue in .NET Fiddle? I currently get 2 "constituent" elements on my side: https://dotnetfiddle.net/Yb9nBG The end tag with the Best Regards, Jon
https://dotnetfiddle.net/5WSaR2 reproduces the issue (see "test 3" there). An extra notable phenomenon: (the above code can also be found at: )
Thank you ;)
Hello @wangyoutian, What kind of behavior are you expecting? We currently have the same behavior as browsers like Firefox and Chrome. Since this is an "EndTag" that doesn't have any corresponding "BeginTag", we simply ignore it and continue the logic. A But I'm not expecting any kind of error to be thrown. Let me know more; at this moment, I believe it works as intended.
In some text input fields, such as a "textarea" on some webpages, a space (0x20) that you type is converted to nbsp (0xA0). So if a user intends to enter some XML code in such a field and inadvertently appends a space (0x20) to the end tag name, that space is converted into nbsp (0xA0). The user would think it is still a space, as the nbsp is visually indiscernible. And if it is indeed a space (0x20), per the specification: https://dev.w3.org/html5/spec-LC/syntax.html#:~:text=HTML%20elements%20all%20have%20names,005A%20LATIN%20CAPITAL%20LETTER%20Z 8.1.2.2 End tags: The first character of an end tag must be a U+003C LESS-THAN SIGN character (<).
1. Description
If we append an nbsp character (0xA0) to an element name, it is parsed normally and no exception is thrown.
e.g.:
will be parsed as one "constituent" element, not two.
The problem is thus silently suppressed (which is not good), and it is hard to debug, as 0xA0 is visually indiscernible from 0x20.
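Because 0xA0 cannot be told apart from 0x20 by eye, a quick scan of the raw markup can help pin it down. The following is only a minimal, library-independent sketch in Python; the function name `find_suspicious_spaces` and the sample markup are hypothetical and are not the snippet from the fiddle.

```python
import unicodedata


def find_suspicious_spaces(text):
    """Return (index, code point, Unicode name) for every space-like
    character in `text` that is not the plain ASCII space U+0020."""
    hits = []
    for i, ch in enumerate(text):
        # General category "Zs" (space separator) includes U+00A0 NO-BREAK SPACE.
        if ch != " " and unicodedata.category(ch) == "Zs":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical input: an end tag whose name is followed by nbsp instead of a space.
sample = "<constituent></constituent\u00a0>"
for index, code_point, name in find_suspicious_spaces(sample):
    print(index, code_point, name)
```

Running such a check on user input before parsing makes the otherwise invisible character show up with its position and code point.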
2. Expectation
https://dev.w3.org/html5/spec-LC/syntax.html#:~:text=HTML%20elements%20all%20have%20names,005A%20LATIN%20CAPITAL%20LETTER%20Z.
does not allow such characters in element names,
nor does XML, as stipulated in:
http://w3.org/TR/REC-xml/#NT-NameStartChar
Without an error, it's hard to pin down the issue.
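To make the XML side of this concrete, here is a minimal sketch in Python that checks a candidate name against the `NameStartChar` and `NameChar` productions of XML 1.0 linked above. It does not reflect how the library parses anything; the helper name `is_xml_name` is hypothetical.

```python
import re

# Character ranges transcribed from the XML 1.0 NameStartChar production.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, U+00B7 and two further ranges.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(name):
    """Return True if `name` matches the XML 1.0 Name production."""
    return XML_NAME.fullmatch(name) is not None


print(is_xml_name("constituent"))        # plain name
print(is_xml_name("constituent\u00a0"))  # same name with a trailing nbsp
```

U+00A0 falls outside every range in both productions, so a name ending in nbsp does not match, while the same name without it does.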
3. Solution?
Should the documentation explicitly state that such characters are allowed, or should an exception be thrown?