Utility.trim collapse whitespace for adjacent text nodes #73 #113
@EdgeCaseBerg Thanks for the PR. Let me go through the request. Good day.
What I did here was insert a node in between the text nodes. Do we have to look into cases like this?
That seems very much related to #77 (whitespace trimming = aggressive). If I look at the original ticket for this issue I see the comment
I was only thinking about adjacent text nodes with this because it seems like something that can be solved. When there's another tag in their midst, with just another text node inside of it, I almost want to say that it too should be treated like a text node (so in this case, keep both sides of WS). However, I think that's also letting the trimmer try to read the semantics of the XML, which isn't something a general utility should do in my opinion. In your example I think we could do a look ahead, by examining the first child of the next non-text node, but then what about in cases like:
or
If we were to treat
And we'd be able to handle it. But I still think that's trying to read meaning during trimming, which will result in cases like this being messed up:
Sorry about the stream of consciousness, just thinking out loud. So in regards to
I think my answer is that we shouldn't, because that's trying to guess at the intent of the XML itself and not just trying to collapse text nodes and their whitespace together.
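To make the position above concrete, here is a minimal sketch. It assumes a hypothetical two-case `Node` model rather than the real scala.xml types, and `mergeAdjacent` is an illustrative name, not the PR's method. It only joins text nodes that are direct neighbours in the same child list and never looks inside an element.

```scala
object AdjacentOnlySketch {
  // Hypothetical stand-ins for XML nodes; not the scala.xml classes.
  sealed trait Node
  final case class Text(data: String) extends Node
  final case class Elem(label: String, child: List[Node]) extends Node

  // Join neighbouring Text nodes; any other node is passed through untouched.
  def mergeAdjacent(nodes: List[Node]): List[Node] = nodes match {
    case Text(a) :: Text(b) :: rest => mergeAdjacent(Text(a + b) :: rest)
    case n :: rest                  => n :: mergeAdjacent(rest)
    case Nil                        => Nil
  }

  def main(args: Array[String]): Unit = {
    // An element sits between the two text nodes, so nothing is merged.
    val mixed = List(Text("Hello "), Elem("b", List(Text("big"))), Text(" World"))
    println(mergeAdjacent(mixed) == mixed) // true

    // Two direct neighbours are merged into one text node.
    println(mergeAdjacent(List(Text("Hello "), Text("World"))))
  }
}
```

Because the function never inspects `Elem.child`, it makes no guess about what the markup means, which is the behaviour argued for above.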
needs a rebase now. other than that, where do we stand here...? is there any consensus on whether changes are needed before merging?
@SethTisue
3019f09 to 88df70c
ran
I don't know why GitHub's interface had to run the merge conflict commit twice (3b86f51 and b34f5ae), but that's fixed again. I haven't had the chance to compile things locally, though, since I just noticed the merge conflict randomly today.
b34f5ae to 6578a8e
K, rebased again to resolve that merge conflict that popped up. @SethTisue / @ashawley do you know if this might make it into a release soon?
Hey @EdgeCaseBerg! Thanks for rebasing and getting the test suite to work. I'm leaning towards including your patch with version 1.1.1, which is the milestone version after the next planned release. This is for no particular reason other than wanting to study the consequences further. I'll be in touch soon after the release is made.
@EdgeCaseBerg rebase against current master...?
@SethTisue sorry about the wait, GitHub's "bell" notifications never catch my eye and my email is too full of work emails in my filtered GitHub folder to catch everything. I'm rebasing now and will push things up when I'm done sorting things out.
@SethTisue / @ashawley I've rebased again and all the tests pass once more.
.travis.yml
@@ -22,6 +22,7 @@ addons:
jdk:
  - openjdk6
  - oraclejdk8
  - oraclejdk9

notifications:
  email: [email protected]
Hey Ethan,
Something went awry with your rebase: 55 commits, 100 files changed.
I don't have your branch locally, but according to the most recent Travis build last fall the commit id of your branch's head was 6578a8e back then. Presumably, you could try doing the rebase again with:
git checkout -B 'utility-collapse-ws-#73' 6578a8e
git rebase master
I'll give that a try. I'm on a new computer so maybe something went wrong from that.
I've checked out the commit you mentioned and rebased, then force pushed the branch on my repository to get that up there. None of the tests I added are failing, though locally I do see
[error] Failed tests:
[error] scala.xml.CompilerErrors
[error] scala.xml.PrintEmptyElementsTest
[error] scala.xml.XMLTestJVM
[error] scala.xml.XMLTest
Though I don't really know anything about those tests or why they would be failing on my machine... I'll wait for travis to confirm though
Looks much better. Thanks for rebasing.
I'll wait for travis to confirm though
Travis seems happy.
5827f23 to 1e3b288
acc.dropRight(1) :+ Text(l + r)
}
case _ => acc :+ n
}
Hey Ethan,
This pattern match is kind of hairy.
Couldn't you drop the lastOption business, use foldRight, and have it just be the following?
case (Text(l), Text(r) +: tt) => Text(l + r) +: tt
case (t, tt) => t +: tt
Will it have the same result? Does that improve comprehensibility, as well?
Right, foldRight. I always use foldLeft so it never comes to mind. That would work:
import scala.xml._
val node = <A><One>1</One>{Text("Hello ")}{Text("World")}{Text(" I like pie.")}<Two>2</Two><Three><Nope>no</Nope></Three></A>
node.child.foldRight(Seq.empty[Node]) {
case (Text(left), Text(right) +: accMinusLast) => Text(left + right) +: accMinusLast
case (n, acc) => n +: acc
}
// Seq[scala.xml.Node] = List(<One>1</One>, Hello World I like pie., <Two>2</Two>, <Three><Nope>no</Nope></Three>)
Looks like it would. I wasn't aware you can pattern match the last element in a list, but I think that code is more elegant, so I can update the PR with that and the tests will inform us if we're getting the same result (the answer is probably yes).
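The equivalence discussed here can be checked side by side. The sketch below assumes a hypothetical `Node` model instead of scala.xml, and `mergeLeft` is only a guess at the shape of the earlier `lastOption` version, reconstructed from the visible diff fragment rather than copied from the PR. `mergeRight` follows the `foldRight` form suggested in the review.

```scala
object FoldDirectionSketch {
  // Hypothetical stand-ins for XML nodes; not the scala.xml classes.
  sealed trait Node
  final case class Text(data: String) extends Node
  final case class Elem(label: String) extends Node

  // Left to right: look at the last accumulated node, then append.
  // Assumed shape of the original approach, not the PR's exact code.
  def mergeLeft(children: Seq[Node]): Seq[Node] =
    children.foldLeft(Seq.empty[Node]) { (acc, n) =>
      (acc.lastOption, n) match {
        case (Some(Text(l)), Text(r)) => acc.dropRight(1) :+ Text(l + r)
        case _                        => acc :+ n
      }
    }

  // Right to left: look at the head of the accumulator, then prepend.
  def mergeRight(children: Seq[Node]): Seq[Node] =
    children.foldRight(Seq.empty[Node]) {
      case (Text(l), Text(r) +: rest) => Text(l + r) +: rest
      case (n, acc)                   => n +: acc
    }

  def main(args: Array[String]): Unit = {
    val children = Seq(
      Elem("One"), Text("Hello "), Text("World"), Text(" I like pie."), Elem("Two")
    )
    // Both directions produce the same merged sequence.
    println(mergeLeft(children) == mergeRight(children)) // true
  }
}
```

The two versions differ only in which end of the accumulator they inspect and extend, so the result is the same as long as each stays consistent with its own traversal direction.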
Sorry it took so long to get back to this for a review. Unfortunately, this library has a long life cycle. When time passes after I write code, I know I have to get reacquainted and relearn it, so I appreciate your effort to make your brain travel back in time.
Remove lastOption and Option matching in favor of a more elegant foldRight solution proposed by ashawley.
Time travel successful. I updated the helper method you put a comment on and ran the tests locally (they passed), now just waiting for travis to confirm.
}
children.foldRight(Seq.empty[Node]) {
  case (Text(left), Text(right) +: accMinusLast) => Text(left + right) +: accMinusLast
  case (n, acc) => n +: acc
}
I wasn't aware you can pattern match the last element in a list
It is pattern matching the front of the list with foldRight ("fold right starts from the left" is how I remember it). Is it important to start from the end? I presume it's not. You're just trying to merge adjacent Texts I think.
Oops sorry, I think I wrote that when I was tired this morning and reading +: as :+ and thinking out loud about
scala> val (last :+ list) = Seq(1,2,3,4)
last: Seq[Int] = List(1, 2, 3)
list: Int = 4
Correct that the order doesn't matter so long as the accumulated list is properly prepended/appended according to the direction it was traversed.
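Since the two operators are easy to confuse, here is a small sketch of both used as patterns on a plain `Seq[Int]`. The object and method names are illustrative only. `+:` and `:+` are the standard library extractors: the colon sits on the side of the sequence, so `+:` splits off the first element and `:+` splits off the last.

```scala
object SeqExtractorSketch {
  // `+:` matches the first element and the remainder.
  def headAndTail(xs: Seq[Int]): Option[(Int, Seq[Int])] = xs match {
    case head +: tail => Some((head, tail))
    case _            => None
  }

  // `:+` matches everything except the last element, then the last element.
  def initAndLast(xs: Seq[Int]): Option[(Seq[Int], Int)] = xs match {
    case init :+ last => Some((init, last))
    case _            => None
  }

  def main(args: Array[String]): Unit = {
    println(headAndTail(Seq(1, 2, 3, 4)))
    println(initAndLast(Seq(1, 2, 3, 4)))
  }
}
```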
Ah, ok. Yeah, it's a lot for a person to keep in their head... between the colons, plus signs and foldings.
Yeah it's best to keep this new internal-only method private as you did.
I'm not especially worried about performance, since it is just an additional iterative pass over the sequence of nodes. Also, this method isn't used generally by the library, so it's even less of a priority.
Thanks for fixing this!
This fixes #73. I've got a few unit tests that are passing that verify the text is trimmed.
I have a couple questions regarding the new combineAdjacentTextNodes method though: in the foldLeft within that function, I'm accumulating with an immutable Seq. If we're going over a lot of nodes, that seems like a lot of wasted time copying over arrays. Would it be alright to use a mutable Seq here for performance, or is there some other buffer I should use?
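For illustration, one possible shape of a buffer-based variant is sketched below. It is not what the PR ended up using: it assumes a hypothetical `Node` model instead of scala.xml, and `mergeBuffered` is an invented name. It accumulates into a `scala.collection.mutable.ArrayBuffer` and converts to an immutable list at the end.

```scala
import scala.collection.mutable.ArrayBuffer

object BufferedMergeSketch {
  // Hypothetical stand-ins for XML nodes; not the scala.xml classes.
  sealed trait Node
  final case class Text(data: String) extends Node
  final case class Elem(label: String) extends Node

  def mergeBuffered(children: Seq[Node]): Seq[Node] = {
    val out = ArrayBuffer.empty[Node]
    children.foreach {
      case Text(r) =>
        out.lastOption match {
          // Previous node is text: overwrite it in place with the joined text.
          case Some(Text(l)) => out(out.length - 1) = Text(l + r)
          case _             => out += Text(r)
        }
      case other => out += other
    }
    // Hand back an immutable result so callers never see the buffer.
    out.toList
  }

  def main(args: Array[String]): Unit = {
    println(mergeBuffered(Seq(Elem("One"), Text("Hello "), Text("World"), Elem("Two"))))
  }
}
```

Whether this is worth the extra mutability depends on how many children a node typically has; for short child lists the immutable fold is likely fine.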