-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve whitespace handling #869
Improve whitespace handling #869
Conversation
65180c3
to
47b9904
Compare
Is this a new issue uncovered by the change or introduced by the change? |
@@ -451,6 +451,12 @@ private bool MoveToNextElement() | |||
case ElementState.MiscNode: | |||
// cursor is end element, pop stack | |||
_elementStack.Pop(); | |||
if (_elementStack.Count == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it was a bug that just wasn't caught before :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the whitespace change this was handeled by
if (_xmlReader.EOF || !_xmlReader.Read())
{
_elementState = ElementState.EOF;
return false;
}
further down in the function. With the whitespace change the stack can be empty but the reader is not at the end of file as it still has whitespace to read
</w:body> | ||
</w:document> "; | ||
|
||
UTF8Encoding utf8Encoding = new UTF8Encoding(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use Encoding.UTF8
to simplify it a little
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
reader.Close(); | ||
}); | ||
Assert.Null(exception); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of asserting null, I'd rather see checks that the reader is output things we'd expect. ie after each Assert.True(reader.Read())
add an assert for the current state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the test cases and added one test case that does not depend on the whitespace setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up with this. A few things to fix up.
I think it is an issue intorduced by the change. With the IgnoreWhitespace set to false new nodes of type Whitespace are returned by the underlying xml reader which wasn't the case before |
That makes sense, but was there any way to expose it before? Just trying to understand if it's only an issue that exists now due to the option to ignore whitespace. |
I am not sure. <w:document xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
</w:p>
</w:body>
</w:document> <!--Comment--> |
47b9904
to
4340d1f
Compare
4340d1f
to
73aab53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for following up with this!
Added more test cases for the ignoreWhitespace=false option and fixes to handle these cases.
@twsouthwick don't know what you prefer. If you want to remove the changes done in #857 or if you would like to fix the introduced issues. I wasn't aware that changing the xml reader settings will break the internal state handling of the OpenXMLReader