default encoding changed from UTF-8 to ASCII-8BIT (1.7.2 - 1.8.0) #1659
After more digging (by a coworker of mine, Jason He), it looks like the difference that (cough) makes all the difference, is the use of 1.7.2 For reference (in both MRI Ruby and JRuby):
The same issue also occurs with MRI Ruby 2.4.1:
Nokogiri::XML("<?xml version=\"1.0\"?><root><aliens><alien><name>Alf</name></alien></aliens></root>").to_s.encoding
=> #<Encoding:ASCII-8BIT>
Node#serialize used to return UTF-8 if no encoding was given. However this got broken in commit 53f9b66. Since UTF-8 is the default in XML and HTML5 specs, it makes sense to use UTF-8 in serialize as well and enforce this in the tests. Fixes sparklemotion#1659
This is a workaround for sparklemotion/nokogiri#1659.
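A workaround of this kind might look like the sketch below. It assumes the serialized bytes are already valid UTF-8 and that only the encoding tag on the string is wrong. The helper name `ensure_utf8` is hypothetical and not part of Nokogiri, and a plain binary string stands in for the output of `to_s` so the example runs without the gem.

```ruby
# Hypothetical helper, not part of Nokogiri: re-tag a binary string as UTF-8
# when its bytes are valid UTF-8, without altering the bytes themselves.
def ensure_utf8(str)
  return str if str.encoding == Encoding::UTF_8

  retagged = str.dup.force_encoding(Encoding::UTF_8)
  # Fall back to the original string if the bytes are not valid UTF-8.
  retagged.valid_encoding? ? retagged : str
end

# Assumption: this stands in for a document serialized under Nokogiri 1.8.0,
# which the report above shows coming back tagged as ASCII-8BIT.
serialized = "<root><name>Alf</name></root>".b
fixed = ensure_utf8(serialized)

puts "before: #{serialized.encoding}"
puts "after: #{fixed.encoding}"
```

Re-tagging with `force_encoding` leaves the bytes untouched, so it is only appropriate when the content really is UTF-8. Transcoding with `encode` would be the wrong tool here, because it would treat each byte as a separate binary character.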
If this isn't an installation issue ...
What problems are you experiencing?
The default document encoding seems to have switched from UTF-8 to ASCII-8BIT between version 1.7.2 and 1.8.0 (using jruby 9.1.12.0).
What's the output from nokogiri -v?
These were run on OSX, but the same problem exists when running this on CentOS 7 with the following description from nokogiri -v:
description: jruby 9.1.12.0 (2.3.3) 2017-06-23 a053617 OpenJDK 64-Bit Server VM 24.141-b02 on 1.7.0_141-mockbuild_2017_05_09_15_35-b00 +jit [linux-x86_64]
Nokogiri (1.7.2)
and
Nokogiri (1.8.0)
Can you provide a self-contained script that reproduces what you're seeing?
jruby -e 'gem "nokogiri", "= 1.7.2"; require "nokogiri"; puts "encoding: #{Nokogiri::HTML::Document.new.to_s.encoding}"'
Which outputs:
encoding: UTF-8
jruby -e 'gem "nokogiri", "= 1.8.0"; require "nokogiri"; puts "encoding: #{Nokogiri::HTML::Document.new.to_s.encoding}"'
Which outputs:
encoding: ASCII-8BIT