Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape end of line characters based on 'XML End-Of-Line Encoding' SEP. #349

Merged
merged 2 commits into from
May 14, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,54 @@ class XmlStreamWriterTest {

assertEquals(expected, writer.toString())
}

/**
* The set of EOL characters and their corresponding escaped form are:
*
* | Name| Unicode code point | Escape Sequences |
* |-----|-------------|-----------------|
* | `CiAK` | `'\n \n'` | `'
 
'` |
* | `YQ0KIGIKIGMN` | `'a\r\n b\n c\r'` | `'a
 b
 c
'` |
* | `YQ3ChSBiwoU=` | `'a\r\u0085 b\u0085'` | `'a
… b…'` |
* | `YQ3igKggYsKFIGPigKg=` | `'a\r\u2028 b\u0085 c\u2028'` | `'a

 b… c
'` |
*/
// FIXME ~ Unable to add "org.junit.jupiter:junit-jupiter-params" as a test dependency...classes do not resolve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

or you could just write your own quick loop, something like:

// list of (raw input, expected) pairs
val tests = listOf(
    "\n \n" to "
 
",
    ...
)

tests.forEachIndexed { idx, test -> {
    val writer = xmlStreamWriter(pretty = false)
    writer.startTag("a")
    writer.text(test.first)
    writer.endTag("a")

    val expected = "<a>${test.second}</a>"
    assertEquals(expected, writer.toString(), "test case $idx failed")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's better I'll update before push.

// Once that's fixed this should be changed to use parameters
@Test
fun itEncodesEndOfLine() {
var writer = xmlStreamWriter(pretty = false)

writer.startTag("a")
writer.text("\n \n")
writer.endTag("a")

var expected = """<a>&#xA; &#xA;</a>"""
assertEquals(expected, writer.toString())

writer = xmlStreamWriter(pretty = false)
writer.startTag("a")
writer.text("a\r\n b\n c\r")
writer.endTag("a")

expected = """<a>a&#xD;&#xA; b&#xA; c&#xD;</a>"""
assertEquals(expected, writer.toString())

writer = xmlStreamWriter(pretty = false)
writer.startTag("a")
writer.text("a\r\u0085 b\u0085")
writer.endTag("a")

expected = """<a>a&#xD;&#x85; b&#x85;</a>"""
assertEquals(expected, writer.toString())

writer = xmlStreamWriter(pretty = false)
writer.startTag("a")
writer.text("a\r\u2028 b\u0085 c\u2028")
writer.endTag("a")

expected = """<a>a&#xD;&#x2028; b&#x85; c&#x2028;</a>"""
assertEquals(expected, writer.toString())
}
}

const val expectedIdempotent = """<?xml version="1.0"?><id>912345678901</id>"""
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,15 @@ class XmlPullSerializer(pretty: Boolean, private val serializer: XmlSerializer =
}

override fun text(text: String): XmlStreamWriter {
serializer.text(text)
text.forEach { character ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question

  1. It looks as though serializer.text(...) works as "append" if called multiple times? (own clarification)
  2. Is there any performance consideration here?

Copy link
Contributor Author

@kggilmer kggilmer May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes this is my understanding as well. I would expect unit tests to fail if this were not the case after the addition. Here is the library function that appends to the string.
  2. Yes. Optimistically perhaps not much of an impact as we're simply pulling the linear iteration on each character in the string from inside the xpp library into our code, but there may be things I'm missing. In any case I was unable to think of a cleaner or more performant way of implementing this due to the restrictions in the XPP interface itself. I think the way forward here is to move from XPP to our own implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure sounds good. I sort of figured as much

when (character) {
'\n' -> serializer.entityRef("#xA")
'\r' -> serializer.entityRef("#xD")
'\u0085' -> serializer.entityRef("#x85")
'\u2028' -> serializer.entityRef("#x2028")
else -> serializer.text(character.toString())
}
}
return this
}

Expand Down