Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve whitespace handling #869

Merged
merged 3 commits into from
Jan 27, 2021

Conversation

lklein53
Copy link
Contributor

Added more test cases for the ignoreWhitespace=false option and fixes to handle these cases.

@twsouthwick don't know what you prefer. If you want to remove the changes done in #857 or if you would like to fix the introduced issues. I wasn't aware that changing the xml reader settings will break the internal state handling of the OpenXMLReader

@lklein53 lklein53 force-pushed the improve-whitespace-handling branch from 65180c3 to 47b9904 Compare January 27, 2021 16:59
@twsouthwick
Copy link
Member

Is this a new issue uncovered by the change or introduced by the change?

@@ -451,6 +451,12 @@ private bool MoveToNextElement()
case ElementState.MiscNode:
// cursor is end element, pop stack
_elementStack.Pop();
if (_elementStack.Count == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was a bug that just wasn't caught before :)

Copy link
Contributor Author

@lklein53 lklein53 Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the whitespace change this was handeled by

if (_xmlReader.EOF || !_xmlReader.Read())
{
  _elementState = ElementState.EOF;
  return false;
}

further down in the function. With the whitespace change the stack can be empty but the reader is not at the end of file as it still has whitespace to read

</w:body>
</w:document> ";

UTF8Encoding utf8Encoding = new UTF8Encoding();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use Encoding.UTF8 to simplify it a little

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


reader.Close();
});
Assert.Null(exception);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asserting null, I'd rather see checks that the reader is output things we'd expect. ie after each Assert.True(reader.Read()) add an assert for the current state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test cases and added one test case that does not depend on the whitespace setting

Copy link
Member

@twsouthwick twsouthwick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up with this. A few things to fix up.

@lklein53
Copy link
Contributor Author

Is this a new issue uncovered by the change or introduced by the change?

I think it is an issue intorduced by the change. With the IgnoreWhitespace set to false new nodes of type Whitespace are returned by the underlying xml reader which wasn't the case before

@twsouthwick
Copy link
Member

That makes sense, but was there any way to expose it before? Just trying to understand if it's only an issue that exists now due to the option to ignore whitespace.

@lklein53
Copy link
Contributor Author

lklein53 commented Jan 27, 2021

That makes sense, but was there any way to expose it before? Just trying to understand if it's only an issue that exists now due to the option to ignore whitespace.

I am not sure.
I think the following example would have caused the same error without the whitespace change:

 <w:document xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
    </w:p>
  </w:body>
</w:document> <!--Comment-->

@lklein53 lklein53 force-pushed the improve-whitespace-handling branch from 47b9904 to 4340d1f Compare January 27, 2021 18:32
@lklein53 lklein53 force-pushed the improve-whitespace-handling branch from 4340d1f to 73aab53 Compare January 27, 2021 18:33
Copy link
Member

@twsouthwick twsouthwick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for following up with this!

@twsouthwick twsouthwick merged commit 3427fc9 into dotnet:master Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants