-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JRuby unable to create HTML document if URL unresponsive #703
Comments
I should note that this is on 1.5.5 RC2, since previous versions had the problem in #674. |
Okay, this issue is even bigger than I thought. It appears the provided document content is just flat out ignored and the resource specified by the URL is fetched instead. If the URL is changed to be a valid one, the document will use the body from that URL as the document body. |
@nirvdrum I'm happy to look into the issue. I'll get back to you shortly. |
@nirvdrum I apologize for my sloppy change, that was my first commit to Nokogiri and I had little context about what was happening. My understanding of the code agrees with your observations, when a url is provided we set the systemId of the InputSource and we ignore the data that is passed to the parser. This is causing the parser to parse the data from the given url instead of the data that was passed to the parser. That also explains why the url has to be resolved. The fix is easy, but I would like to make sure that the behavior is the consistent with the C Nokogiri before I push the change. |
I just pushed some changes to a branch 'fix_entity_resolving' that fixes our entity resolving logic. This should fix your problem and a bunch of other bugs with entity resolving and validation. @yokolet can you eyeball the changes, I feel it's very major and would like you to look at it. After that we can merge the changes to master. I'm going to open a new issue regarding handling entity resolving and validation in general. I believe there's more to be done in that area. |
Thanks @jvshahid. I really need to get up to speed with building nokogiri so I can try to be more valuable. My previous attempts were thwarted for one reason or another, but that's not much of an excuse. |
@nirvdrum the fix is merged on master and will be packaged in a rc package soon. |
Great! Thanks for looking into it and dealing with the larger issue around entity resolving. |
Closing the issue, please reopen if the problem still exist. Thanks again for reporting this bug. |
This seems related to the closed out #674. If I use a script similar to what @flavorjones provided, but instead of providing a valid URL, I use one that doesn't resolve with DNS, I get an error (and an unhelpful message):
I get the error:
Errors: [#<Nokogiri::XML::SyntaxError: googleaoeu.com>]
I don't believe the C variant tries to resolve the URL and rather just accepts it as. In any event, it proceeds just fine. As a secondary concern, the error message here isn't much help. It just outputs the hostname and says it's a SyntaxError. Some more context there would be more helpful.
The text was updated successfully, but these errors were encountered: