Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html Reader Not Handling non-ASCII Data Correctly #2943

Merged
merged 6 commits into from
Jul 17, 2022

Conversation

oleibman
Copy link
Collaborator

Fix #2942. Code was changed by #2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than).

This is:

- [x] a bugfix
- [ ] a new feature
- [ ] refactoring
- [ ] additional unit tests

Checklist:

Why this change is needed?

Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.

oleibman added 6 commits July 15, 2022 16:01
Fix PHPOffice#2942. Code was changed by PHPOffice#2894 because PHP8.2 will deprecate how it was being done. See linked issue for more details. Dom loadhtml assumes ISO-8859-1 in the absence of a charset attribute or equivalent, and there is no way to override that assumption. Sigh. The suggested replacements are unsuitable in one way or another. I think this will work with minimal disruption (replace ampersand, less than, and greater than with entities representing illegal characters, then use htmlentities, then restore ampersand, less than, and greater than).
Use regexp to escape non-ASCII. Less kludgey, less reliant on the vagaries of the PHP maintainers.
Test non-ASCII outside of cell contents: sheet title, image alt attribute.
Forgot to change loadFromString.
Confirm escaped ampersand is handled correctly.
@oleibman oleibman merged commit 5de8298 into PHPOffice:master Jul 17, 2022
MarkBaker added a commit that referenced this pull request Jul 18, 2022
### Added

- Add Chart Axis Option textRotation [Issue #2705](#2705) [PR #2940](#2940)

### Changed

- Nothing

### Deprecated

- Nothing

### Removed

- Nothing

### Fixed

- Fix Encoding issue with Html reader (PHP 8.2 deprecation for mb_convert_encoding) [Issue #2942](#2942) [PR #2943](#2943)
- Additional Chart fixes
  - Pie chart with part separated unwantedly [Issue #2506](#2506) [PR #2928](#2928)
  - Chart styling is lost on simple load / save process [Issue #1797](#1797) [Issue #2077](#2077) [PR #2930](#2930)
  - Can't create contour chart (surface 2d) [Issue #2931](#2931) [PR #2933](#2933)
- VLOOKUP Breaks When Array Contains Null Cells [Issue #2934](#2934) [PR #2939](#2939)
@oleibman oleibman deleted the issue2942 branch August 14, 2022 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Encoding issue with Html reader after upgrading to 1.24.0
1 participant