Escape apostrophes by default? #167
I would be happy with an optional setting/flag argument for XML loader calls that requests stricter XML 1.0 semantics instead of lax XML/HTML4-compatible semantics. The lax behavior would stay the default, since that is what happens today. Our usage is XML only, with no HTML involved, so we would always pass the strict setting.
In `scala-xml/shared/src/main/scala/scala/xml/Utility.scala`, lines 98 to 112 in 7f36f80
And then in
So it always handles `apos` when parsing. The additional flag would be passed to `scala-xml/shared/src/main/scala/scala/xml/Utility.scala`, lines 202 to 209 in 7f36f80.
Considering how many optional arguments the function now takes, maybe there should be a …

Also, should `apos` escaping default to `true` to match what is expected these days, or remain `false` to preserve the current behavior?
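A minimal sketch of what such a flag could look like, assuming a mode parameter rather than a bare boolean so it can carry the strict/lax distinction raised above. Everything here is hypothetical and self-contained: `EscapeSketch`, `EscapeMode`, `Lax`, `StrictXml10` and `escapeText` are illustrative names, not scala-xml API, and only the XML predefined entities are considered.

```scala
// Hypothetical sketch: none of these names are part of scala-xml.
object EscapeSketch {
  sealed trait EscapeMode
  // Lax: today's HTML4-compatible output, the apostrophe is left as-is.
  case object Lax extends EscapeMode
  // StrictXml10: proposed XML 1.0 output, the apostrophe becomes &apos;.
  case object StrictXml10 extends EscapeMode

  // Characters escaped in both modes.
  private val common: Map[Char, String] = Map(
    '<' -> "&lt;",
    '>' -> "&gt;",
    '&' -> "&amp;",
    '"' -> "&quot;"
  )

  private def table(mode: EscapeMode): Map[Char, String] = mode match {
    case Lax         => common
    case StrictXml10 => common + ('\'' -> "&apos;")
  }

  // Escapes text; the default mode keeps existing (lax) output unchanged.
  def escapeText(text: String, mode: EscapeMode = Lax): String = {
    val entities = table(mode)
    val sb = new StringBuilder
    for (c <- text) entities.get(c) match {
      case Some(entity) => sb.append(entity)
      case None         => sb.append(c)
    }
    sb.toString
  }
}
```

With the default `Lax`, `escapeText("The ' character")` leaves the apostrophe alone, while `escapeText("The ' character", EscapeSketch.StrictXml10)` produces `The &apos; character`. Defaulting to `Lax` is the backward-compatible choice; flipping the default is exactly the open question above.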
Removing this from the 2.0 milestone since we're nearing release (#432) and it doesn't seem like a blocker.
There is a fix related to apostrophes and `XMLEventReader` in #72 that is slated to appear in version 1.1.0. However, that fix did not change how apostrophes are handled across the library as a whole. Fundamentally, this comes down to the behavior of the `Utility.escape` method. Users are curious why the apostrophe isn't generally escaped to become `&apos;`.
The reason the apostrophe isn't escaped is that the scala-xml code has been around for a long time, and although it is primarily an XML library, it was intended for mixed use with HTML. The HTML 4 standard still does not define an `apos` entity, see https://www.w3.org/TR/html401/sgml/entities.html. The `apos` entity has been in XML 1.0 since the beginning, see https://www.w3.org/TR/1998/REC-xml-19980210#sec-predefined-ent.
Most HTML these days is XHTML, and the `apos` entity was defined in XHTML 1.0 because XHTML needed to conform to the XML standard, see https://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html#a_dtd_Special_characters.
It seems like `apos` was explicitly commented out for some historical reason involving HTML4 and a web browser called Internet Explorer; see the enigmatic comment added in e1fadeb. Ten years later, can we bring back XML escape support for `apos`?
There is some commentary on the entities representing special characters in XHTML at Wikipedia, but I couldn't find an analysis of browser support for `apos`. Presumably, all browsers support `apos` if the document is properly declared as XHTML. So, it seems that fixing this would require accepting that scala-xml no longer supports HTML4 or earlier? If that's the case, then the only other issue is finding out the second-order consequences of "fixing" this and how it would affect users. Byte-for-byte, their XML would suddenly look a little different.
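To make the byte-for-byte concern concrete, the sketch below compares a paragraph such as `<p>The ' character</p>` with the same paragraph serialized using `&apos;`. It is self-contained and hypothetical (`ByteDiffSketch`, `decodePredefined` and `textOf` are illustrative names, not scala-xml API). The escaped form is five bytes longer, yet decoding the XML 1.0 predefined entities yields the same character data, so the change would show up for anything comparing raw output (golden files, checksums), while an XML 1.0 parser would read both the same way.

```scala
// Hypothetical sketch, no scala-xml involved: the same paragraph serialized
// with and without apostrophe escaping.
object ByteDiffSketch {
  val current: String  = "<p>The ' character</p>"      // apostrophe left as-is
  val proposed: String = "<p>The &apos; character</p>" // apostrophe escaped

  // Decodes the five XML 1.0 predefined entities. &amp; is replaced last so
  // that text such as "&amp;lt;" correctly becomes "&lt;" rather than "<".
  def decodePredefined(s: String): String =
    s.replace("&lt;", "<")
      .replace("&gt;", ">")
      .replace("&quot;", "\"")
      .replace("&apos;", "'")
      .replace("&amp;", "&")

  // Crude character-data extraction, valid for this one-element example only.
  def textOf(p: String): String =
    decodePredefined(p.stripPrefix("<p>").stripSuffix("</p>"))
}
```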
There are at least 4 different contexts for an apostrophe:

- element text: `<p>The ' character</p>`
- CDATA section: `<![CDATA[the ' character]]>`
- attribute value: `<a href="#the-'-character">`
- attribute delimiter: `<a href='foo.html'>`
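Under the XML 1.0 grammar these contexts behave differently: a bare apostrophe is legal in element text and in a double-quoted attribute value, entity references are not recognized inside a CDATA section (so `&apos;` there would come out literally), and inside a single-quoted attribute value the apostrophe must be escaped. A minimal sketch of context-aware serialization, with hypothetical names (`ApostropheContexts`, `attribute`, `cdata`) that are not part of scala-xml:

```scala
// Hypothetical sketch (not scala-xml API): apostrophe handling that depends
// on where the text ends up in the document.
object ApostropheContexts {
  // Serializes name="value" or name='value'. XML 1.0 forbids <, & and the
  // chosen delimiter inside an attribute value, so the apostrophe is escaped
  // only when it is the delimiter.
  def attribute(name: String, value: String, quote: Char): String = {
    require(quote == '"' || quote == '\'', "quote must be ' or \"")
    val sb = new StringBuilder
    sb.append(name).append('=').append(quote)
    for (c <- value) c match {
      case '<'                => sb.append("&lt;")
      case '&'                => sb.append("&amp;")
      case ch if ch == quote  => sb.append(if (quote == '"') "&quot;" else "&apos;")
      case other              => sb.append(other)
    }
    sb.append(quote).toString
  }

  // Inside CDATA no entity references are recognized, so the apostrophe is
  // written verbatim. Text containing the terminator "]]>" is rejected here.
  def cdata(s: String): String = {
    require(!s.contains("]]>"), "CDATA content must not contain ]]>")
    "<![CDATA[" + s + "]]>"
  }
}
```

Usage: `attribute("title", "it's", '\'')` yields `title='it&apos;s'`, whereas the double-quoted `href` from the list keeps its apostrophe untouched.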
And then there are at least 3 different programming modes, for example:

- `Elem(null, "p", Null, TopScope, Text("The ' character"))`

And then there are at least 3 different types of parsing and reading:

- `scala.xml.factory.XMLLoader`
- `scala.xml.pull.XMLEventReader`
- `scala.xml.parsing.ConstructingParser`