Invalid encoding returned in JRuby with US-ASCII #583

Closed
dnagir opened this issue Dec 13, 2011 · 3 comments

dnagir commented Dec 13, 2011

> jruby -v
jruby 1.6.5 (ruby-1.9.2-p136) (2011-10-25 9dcd388) (Java HotSpot(TM) Client VM 1.6.0_29) [darwin-i386-java]
> gem list | grep nokogiri
nokogiri (1.5.0 java)
> irb

And then:

require 'nokogiri'

s = "<!DOCTYPE html>\n<html>\n<head>\n<title>asd</title>\n<link href=\"/assets/application.css\" media=\"screen\" rel=\"stylesheet\" type=\"text/css\" />\n<script src=\"/assets/application.js\" type=\"text/javascript\"></script>\n\n</head>\n<body>\n<div class='page'>\n<div class='head'>\n<h1>asd</h1>\n</div>\n<div class='body'>\n<div class='nav-bar'>\n<a href=\"/users/sign_in\">Sign In</a>\n<a href=\"/users/new\">Register</a>\n</div>\n<div class='main'>\n<h2>Sign Up</h2>\n<form accept-charset=\"UTF-8\" action=\"/users\" class=\"formtastic user\" id=\"user_new\" method=\"post\" novalidate=\"novalidate\"><div style=\"margin:0;padding:0;display:inline\"><input name=\"utf8\" type=\"hidden\" value=\"&#x2713;\" /></div>\n<fieldset class=\"inputs\"><ol><li class=\"string input required stringish\" id=\"user_email_input\"><label class=\" label\" for=\"user_email\">Email<abbr title=\"required\">*</abbr></label><input id=\"user_email\" name=\"user[email]\" type=\"text\" value=\"\" />\n\n</li><li class=\"password input required stringish\" id=\"user_password_input\"><label class=\" label\" for=\"user_password\">Password<abbr title=\"required\">*</abbr></label><input id=\"user_password\" maxlength=\"128\" name=\"user[password]\" type=\"password\" />\n\n</li><li class=\"password input optional stringish\" id=\"user_password_confirmation_input\"><label class=\" label\" for=\"user_password_confirmation\">Password confirmation</label><input id=\"user_password_confirmation\" name=\"user[password_confirmation]\" type=\"password\" />\n\n</li></ol></fieldset>\n<fieldset class=\"buttons\"><ol><li class=\"commit button\"><input class=\"create\" name=\"commit\" type=\"submit\" value=\"Sign Up\" /></li>\n</ol></fieldset></form>\n  <a href=\"/users/sign_in\">Sign in</a><br />\n\n\n  <a href=\"/users/password/new\">Forgot your password?</a><br />\n\n\n\n\n\n</div>\n</div>\n<div class='foot'>\n<a href=\"/\">Home</a>\n</div>\n</div>\n</body>\n</html>\n"

s.encoding # UTF-8
h = Nokogiri::HTML(s)
h.to_s.valid_encoding? # true


asci = s.encode('US-ASCII')
asci.valid_encoding? # true

h = Nokogiri::HTML(asci)
h.to_s.encoding # US-ASCII
h.to_s.valid_encoding? # false !!

This means that parsing an input string with a valid encoding produces an output string with an invalid encoding.
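
For readers unsure how a string tagged US-ASCII can end up invalid, here is a minimal sketch in plain Ruby, without Nokogiri. It assumes, without confirmation from this report, that the serializer decoded the `&#x2713;` entity and then labeled the resulting bytes as US-ASCII; the variable names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical illustration; none of these names come from the report.

# The entity reference itself is plain ASCII, so a US-ASCII copy is valid.
ascii_only = 'value="&#x2713;"'.encode('US-ASCII')
puts ascii_only.valid_encoding?

# Decoding the entity yields U+2713, which takes three bytes in UTF-8.
check_mark = "\u2713"
puts check_mark.bytes.length

# Relabeling those bytes as US-ASCII (no transcoding) keeps bytes >= 0x80,
# which US-ASCII cannot represent, so the string fails validation.
mislabeled = check_mark.dup.force_encoding('US-ASCII')
puts mislabeled.encoding
puts mislabeled.valid_encoding?
```

If this is what happens, the input stays valid because it only contains the escaped entity, while the output becomes invalid once the real character appears under a US-ASCII label.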

Member

yokolet commented Dec 13, 2011

Thanks for testing on 1.9 mode.

I know that JRuby's 1.9 mode produces a lot of encoding-related errors. I'll work on that.

Author

dnagir commented Dec 14, 2011

Thanks a lot for that. Hope it won't take too much time to fix that :)

Member

yokolet commented Jun 13, 2012

This bug appears to have been fixed, perhaps as a side effect of fixing Nokogiri test errors/failures in 1.9 mode.

Today, I got:

true
true
#<Encoding:US-ASCII>
true

which is exactly the same result as the libxml version.

So, I'm going to close this issue. If you still have the problem, feel free to reopen the issue.

yokolet closed this as completed Jun 13, 2012
yokolet added a commit that referenced this issue Jun 13, 2012