HTML.fragment() ignores encoding #305
Comments
using encoding set on string when parsing document fragments. closed by 602d2a5
Regarding 602d2a5: To improve support for Ruby <= 1.8.x, would it make sense for HTML.fragment() to accept an optional 'encoding' argument akin to the like-named HTML.parse() argument?
Ya, that would probably be good. Added an encoding parameter here:
Hi Aaron, Thank you for the quick response to this bug report. Regarding 9490d0e: As implemented, HTML.fragment() unconditionally ignores its 'encoding' argument if 'tags' responds to #encoding. This behavior differs dramatically from HTML.parse(), which always employs 'encoding' when the client provides it explicitly. (In parse(), 'encoding' defaults to nil.) There are a few reasons why it might be best for HTML.fragment() to respect 'encoding' if provided:
I agree! In fact, my tests desire that functionality. This commit should take care of it: bde0aac Thanks for being patient with me! :-D
Thanks again, though I must bother you once more. In bde0aac, you missed the default argument value encoding='UTF-8', which was added to the higher-level HTML.fragment() method in 9490d0e. This should also default to nil. See: http://github.com/tenderlove/nokogiri/commit/9490d0e3353db528d17dcb188ef58859505f00d9#L0R27
Hah. No problem. Thanks for catching this. Should be fixed here: a5df08d
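For readers skimming the thread, the precedence that the comments above converge on can be sketched in plain Ruby. This is only an illustration: `resolve_encoding` is a hypothetical helper invented here, not Nokogiri's actual code, and it assumes the agreed behavior is "explicit argument first, then the string's own encoding, otherwise nil".

```ruby
# Hypothetical helper (not part of Nokogiri): returns the encoding name a
# parser entry point would use under the precedence discussed in this thread.
def resolve_encoding(input, explicit = nil)
  # 1. An encoding passed explicitly by the caller always wins.
  return explicit.to_s if explicit

  # 2. Otherwise ask the input itself; Ruby >= 1.9 strings carry an encoding.
  return input.encoding.name if input.respond_to?(:encoding)

  # 3. Ruby <= 1.8.x strings have no #encoding, so nothing can be inferred.
  nil
end

# A Latin-1 byte string: "caf" followed by 0xE9 (e with acute accent).
latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')

puts resolve_encoding(latin1)           # prints ISO-8859-1
puts resolve_encoding(latin1, 'UTF-8')  # prints UTF-8
```

Defaulting `explicit` to nil rather than 'UTF-8' is what lets step 2 run at all, which seems to be the point of the follow-up about the default argument value.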
Given the input:
Output is:
Note the failure of to_xhtml() in the fragment case. Specifically, HTML.fragment() provides no mechanism for dealing with encoding and instead assumes unconditionally that the incoming string is UTF-8: http://github.com/tenderlove/nokogiri/blob/REL_1.4.2/lib/nokogiri/html/document_fragment.rb#L8
HTML.parse(), on the other hand, interrogates the encoding of the incoming string if encoding is not specified explicitly: http://github.com/tenderlove/nokogiri/blob/REL_1.4.2/lib/nokogiri/html/document.rb#L71
Additional information: