[bug] Nokogiri::XML::Reader.from_io.each misidentifies character encoding?
#2882
Comments
@koshigoe Thank you for reporting this! This error message is being generated by libxml2. I have reproduced the issue and will investigate.
Git bisect shows that this is the commit that introduced the new behavior: https://gitlab.gnome.org/GNOME/libxml2/-/commit/3582b07bd24d438be7dd08ab57e3f9e635373e32
I've narrowed this down to specific changes in libxml2 chunk parsing that may be a bug. I'll open an issue upstream and link to it here.
Neat! This was already reported upstream at https://gitlab.gnome.org/GNOME/libxml2/-/issues/542 and was fixed about an hour ago in https://gitlab.gnome.org/GNOME/libxml2/-/commit/e0f3016f71297314502a3620a301d7e064cbb612. I expect the fix to ship shortly in a libxml2 release. I'll leave this open until that happens and I can ship a new nokogiri release.
libxml2 v2.11.4 is out with the fix: https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.4. I'll try to get a release out in the next day.
Nokogiri v1.15.1 is out with this upstream fix: https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.1
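For readers wondering how a chunked parser can reject input that is valid UTF-8 as a whole, here is a minimal sketch of one plausible mechanism. It is an assumption for illustration, not a confirmed description of the libxml2 bug linked above. It uses only the Ruby standard library; the helper `byte_chunks` and the chunk size of 8 are hypothetical and have nothing to do with libxml2's real buffer sizes.

```ruby
# Illustration only: a multibyte UTF-8 character can straddle a chunk
# boundary, so a chunk checked in isolation looks like invalid UTF-8
# even though the complete input is valid.
# This is NOT libxml2's or Nokogiri's actual code.

# Hypothetical helper: split a string into chunks of `size` bytes,
# ignoring character boundaries.
def byte_chunks(text, size)
  text.bytes.each_slice(size).map do |slice|
    slice.pack("C*").force_encoding(Encoding::UTF_8)
  end
end

# Valid UTF-8 document; each "あ" occupies 3 bytes.
xml = "<root>#{'あ' * 10}</root>"
chunks = byte_chunks(xml, 8)

whole_valid    = xml.valid_encoding?
chunk_validity = chunks.map(&:valid_encoding?)

# Reassembling the raw bytes restores the original, valid string.
rejoined = chunks.map(&:b).join.force_encoding(Encoding::UTF_8)

puts "whole document valid: #{whole_valid}"
puts "per-chunk validity:   #{chunk_validity.inspect}"
puts "rejoined == original: #{rejoined == xml}"
```

A streaming parser therefore has to carry incomplete trailing bytes over to the next chunk before validating them. Reading the whole document into memory avoids chunk boundaries entirely, at the cost of the memory savings that a reader-style API is meant to provide.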
Please describe the bug
Nokogiri::XML::Reader.from_io.each raises Nokogiri::XML::SyntaxError when an XML node contains a long run of non-ASCII characters. The node contains only valid UTF-8 characters, but parsing fails with this error:
FATAL: Input is not proper UTF-8, indicate encoding !
Help us reproduce what you're seeing
Expected behavior
The reader should not raise an error.
Environment
Additional context