XML-Reader: support rich text #4001
Comments
PR #4003 addresses several relatively simple XML Reader issues. This one is a lot more complicated than the others, and will require more thought.
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue on May 1, 2024
Fix PHPOffice#4001. Thanks to @SlowFox71 who reported the problem and developed most of the solution.

This PR adds Rich Text support to the XML reader. The Xml Spreadsheet format stores Rich Text as Html tags, children of the ss:Data tag using a specific namespace. These can be parsed into a RichText object using the existing method Helper/Html::toRichTextObject. There are 2 items which need special attention.

First, for attributes like bold or italic, Excel uses the appropriate Html tag (e.g. `<B>`). However, for an attribute like color, Excel uses `<Font html:Color="#FF0000">`, with a prefix on the Color attribute. PhpSpreadsheet's Html parser cannot cope with the prefix. The parser is changed to strip `html:` from attribute names for the Font tag.

Second, the example cited by the user used a `<BR />` to indicate a line break in the data. However, it appears that, at least some of the time, Excel will instead use `&#10;` to indicate a line break. The existing parser reduces one or more whitespace characters in the text to a single space, and so `&#10;` will wind up disappearing. I am not sure why the existing code does this, but I do know that I am not willing to break it. Instead, I've added an optional boolean parameter `$preserveWhiteSpace` to `toRichTextObject`. If false (default), the existing logic will be used; but if true, substitution for whitespace characters in the text will not happen.
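The two adjustments described in this commit message can be illustrated with a rough, language-neutral sketch. It is written in Python and is not the PhpSpreadsheet code: the function names `strip_html_prefix` and `normalize_text` and the parameter `preserve_whitespace` are hypothetical stand-ins, and the regex-based tag matching is a simplification that assumes attribute values contain no `>` character.

```python
import re


def strip_html_prefix(fragment: str) -> str:
    """Drop the `html:` prefix from attribute names inside <Font ...> tags.

    Hypothetical helper: only opening Font tags are touched, and only
    prefixes that sit directly in front of an attribute name.
    """
    def fix_tag(match: re.Match) -> str:
        # Remove `html:` only where it is followed by `name=`.
        return re.sub(r'\bhtml:(?=\w+\s*=)', '', match.group(0))

    return re.sub(r'<Font\b[^>]*>', fix_tag, fragment)


def normalize_text(text: str, preserve_whitespace: bool = False) -> str:
    """Collapse whitespace runs to one space unless asked to preserve them.

    Mirrors the idea of an opt-in flag: the default keeps the old
    collapsing behaviour, so existing callers are unaffected.
    """
    if preserve_whitespace:
        return text
    return re.sub(r'\s+', ' ', text)


if __name__ == "__main__":
    print(strip_html_prefix('<Font html:Color="#FF0000">red</Font>'))
    print(repr(normalize_text("line one\nline two")))
    print(repr(normalize_text("line one\nline two", preserve_whitespace=True)))
```

Making the flag default to the collapsing behaviour is the point of the design: a line feed in the cell text survives only when the caller opts in, which matches the commit's stated aim of not breaking existing users of the parser.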
This is:
What is the expected behavior?
Accept formatting instructions, if present
What is the current behavior?
Everything is parsed as text
What are the steps to reproduce?
Read the attached file with XML reader and save as XLSX.
What features do you think are causing the issue
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
XML reader only
I implemented the desired behaviour in Xml::loadIntoExisting() in a rather brute-force way, as follows (untested); there might be a much better way to extract the inner content of the SimpleXMLElement:
richtext_xml.txt