parsing an erroneous HTML tag #2259
Comments
Hi there, I can't repro your specific output. Can you double-check your input and output, to make sure we're looking at the same thing? Using your input:

```html
<figure>
<img src="api/files/16f0c553-4d76-4411-84b7-5049fe01bbe0">
<figcaption
<span contenteditable="false">Image 3.</span>
</figcaption>
</figure>
```

With the HTML parser, we get:

```html
<figure>
<img src="api/files/16f0c553-4d76-4411-84b7-5049fe01bbe0">
<figcaption <span contenteditable="false">
Image 3.
</figcaption>
</figure>
```

Note that is an element.

With the XML parser, we get:

```xml
<figure>
<img src="api/files/16f0c553-4d76-4411-84b7-5049fe01bbe0">
<figcaption _span="" contenteditable="false">Image 3.
</figcaption>
</img></figure>
```

The changes to the HTML parser in 1.18.2 from the changelog:
For input of HTML (not your example, but related):
We get the element
Which is weird, but it is what the HTML spec requires, and what current browsers do. #2230 changed to this behavior from the previous one because, on balance, too many issues were caused by deviating from the spec. The optimist in me hopes that, because browsers will render such markup differently from the author's intent, those folks will review and fix their HTML. Now, I think it probably is an issue that, when using the XML parser / serializer, we output the element as |
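To show why the HTML parser turns `<figcaption` followed by `<span ...>` into a single `figcaption` element, here is a simplified sketch of the WHATWG start-tag tokenizer states. It is not jsoup's tokenizer, and the names `StartTagSketch`, `StartTag`, and `tokenize` are hypothetical. The key rule it encodes: in the tag-name state only whitespace, `/`, and `>` end the name, and in the attribute-name state a `<` is a parse error but is still appended to the name. So the stray `<span` becomes an attribute, and a hypothetical input like `<a<b>` yields one element named `a<b`. Character references, end tags, the self-closing flag, and error reporting are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified sketch of the WHATWG HTML tokenizer's start-tag states.
 * Not jsoup's implementation: no character references, end tags,
 * self-closing flag, or parse-error reporting.
 */
final class StartTagSketch {

    /** One tokenized start tag: its name and attributes in source order. */
    static final class StartTag {
        final String name;
        final Map<String, String> attributes;

        StartTag(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }
    }

    // HTML whitespace: tab, LF, FF, CR, space.
    private static boolean isSpace(char c) {
        return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    // The spec lowercases ASCII upper-case letters only.
    private static char lower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
    }

    /** Tokenizes the start tag at the beginning of {@code in}, e.g. "<p class=x>". */
    static StartTag tokenize(String in) {
        if (in.length() < 2 || in.charAt(0) != '<' || !Character.isLetter(in.charAt(1))) {
            throw new IllegalArgumentException("not a start tag: " + in);
        }
        int i = 1;
        int n = in.length();

        // Tag name state: only whitespace, '/' or '>' end the name, so a stray '<' is appended.
        StringBuilder name = new StringBuilder();
        while (i < n && !isSpace(in.charAt(i)) && in.charAt(i) != '/' && in.charAt(i) != '>') {
            name.append(lower(in.charAt(i++)));
        }

        Map<String, String> attrs = new LinkedHashMap<>();
        while (i < n && in.charAt(i) != '>') {
            char c = in.charAt(i);
            if (isSpace(c) || c == '/') { // before attribute name ('/' is simply skipped here)
                i++;
                continue;
            }
            // Attribute name state: '<', '"' and '\'' are parse errors but are still appended.
            StringBuilder attr = new StringBuilder().append(lower(c));
            i++;
            while (i < n && !isSpace(in.charAt(i)) && "/>=".indexOf(in.charAt(i)) < 0) {
                attr.append(lower(in.charAt(i++)));
            }
            while (i < n && isSpace(in.charAt(i))) i++; // after attribute name
            String value = "";
            if (i < n && in.charAt(i) == '=') {
                i++;
                while (i < n && isSpace(in.charAt(i))) i++; // before attribute value
                if (i < n && (in.charAt(i) == '"' || in.charAt(i) == '\'')) {
                    char quote = in.charAt(i++);
                    int end = in.indexOf(quote, i);
                    if (end < 0) end = n; // unterminated value runs to end of input
                    value = in.substring(i, end);
                    i = Math.min(end + 1, n);
                } else {
                    int start = i;
                    while (i < n && !isSpace(in.charAt(i)) && in.charAt(i) != '>') i++;
                    value = in.substring(start, i);
                }
            }
            attrs.putIfAbsent(attr.toString(), value); // duplicate attributes: first one wins
        }
        return new StartTag(name.toString(), attrs);
    }

    public static void main(String[] args) {
        StartTag tag = tokenize("<figcaption\n<span contenteditable=\"false\">Image 3.</span>");
        System.out.println(tag.name + " " + tag.attributes);
        // prints: figcaption {<span=, contenteditable=false}
    }
}
```

The attribute named `<span` is exactly what the HTML parser output shows as `<figcaption <span contenteditable="false">`; the XML serializer then has to rewrite that name into something XML allows.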
Thank you for looking into it. The problem is minor; we have already fixed it in our editor.
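The reporter fixed the markup at its source. For cases where the source can't be fixed, here is a hedged sketch of a similar defensive step in plain Java (no jsoup dependency): a hypothetical `UnclosedTagRepair.repair` pass, run before parsing, that closes a tag when it is interrupted by the next tag's `<`, aiming for the kind of recovery the reporter describes from 1.18.1 and earlier. The class and method names are illustrative assumptions. This is a naive heuristic, not jsoup behavior and not necessarily what the reporter's editor fix does; it ignores comments, CDATA, and raw text inside `<script>`/`<style>`.

```java
/**
 * Hypothetical pre-processing pass (not part of jsoup): when a tag is
 * interrupted by another tag's '<' before its own '>', insert the missing
 * '>' right after the interrupted tag's last non-whitespace character.
 * Naive heuristic: it does not understand comments, CDATA, or the raw text
 * inside <script>/<style>.
 */
final class UnclosedTagRepair {

    // '<' followed by a letter or '/' is treated as the start of a tag.
    private static boolean startsTag(String html, int i) {
        if (i + 1 >= html.length()) return false;
        char next = html.charAt(i + 1);
        return Character.isLetter(next) || next == '/';
    }

    static String repair(String html) {
        StringBuilder out = new StringBuilder(html.length() + 8);
        boolean inTag = false;
        char quote = 0;    // open attribute-value quote, or 0 when none
        int insertAt = -1; // position in out just after the tag's last non-whitespace char
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag && quote == 0 && c == '<' && startsTag(html, i)) {
                out.insert(insertAt, '>'); // close the interrupted tag
                inTag = false;
            }
            if (!inTag) {
                if (c == '<' && startsTag(html, i)) {
                    inTag = true;
                    quote = 0;
                }
            } else if (quote != 0) {
                if (c == quote) quote = 0; // end of quoted attribute value
            } else if (c == '"' || c == '\'') {
                quote = c;                 // '<' inside quotes is left alone
            } else if (c == '>') {
                inTag = false;
            }
            out.append(c);
            if (inTag && !Character.isWhitespace(c)) insertAt = out.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<figcaption\n<span contenteditable=\"false\">Image 3.</span>\n</figcaption>";
        System.out.println(repair(broken));
    }
}
```

Because it closes any tag interrupted by an unquoted tag-starting `<`, it would also turn `<a<b>` into `<a><b>`, deliberately diverging from the spec-compliant parsing that 1.18.2 adopted.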
we get:
Perhaps jsoup perceives |
Good day to all.
In jsoup 1.18.2 and 1.18.3, there is a problem with parsing a malformed HTML tag, for example a `<figcaption` tag that is missing its closing `>`.
Example:
```java
Document document = Jsoup.parse(content, "", Parser.xmlParser());
```
The input is an HTML fragment:
In previous versions (1.18.1 and earlier), the parser automatically fixed this, but in the current versions it also cuts off the closing tag.
In the document we get: