Update dependency org.jsoup:jsoup to v1.19.1 #61
This PR contains the following updates:
org.jsoup:jsoup: 1.18.1 -> 1.19.1
Release Notes
jhy/jsoup (org.jsoup:jsoup)
v1.19.1
Changes
Added support for http/2 requests in Jsoup.connect(), when running on Java 11+, via the Java HttpClient implementation. #2257
In this version, requests are still made via HttpURLConnection by default. Use System.setProperty("jsoup.useHttpClient", "true"); to make requests via the HttpClient instead, which enables http/2 support if available. This will become the default in a later version of jsoup, so now is a good time to validate it.
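A minimal sketch of opting in, assuming jsoup 1.19.1 on the classpath. The property name and the HttpURLConnection fallback come from the notes above; setting the property once at startup, before the first request, is a conservative choice, since we have not checked when jsoup reads it. The class and method names are hypothetical.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HttpClientOptIn {
    /** Opts in to the HttpClient transport, then returns a reusable session. */
    static Connection newSession() {
        // Property name from the jsoup 1.19.1 release notes. Where HttpClient is not
        // available in the JRE, jsoup keeps using HttpURLConnection (http/1.1).
        System.setProperty("jsoup.useHttpClient", "true");
        return Jsoup.newSession().timeout(10_000); // 10 s timeout, an arbitrary choice
    }

    public static void main(String[] args) throws IOException {
        Connection session = newSession();
        if (args.length == 0) {
            System.out.println("usage: java HttpClientOptIn <url>");
            return;
        }
        Document doc = session.newRequest().url(args[0]).get();
        System.out.println(doc.title());
    }
}
```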
If you repackage the jsoup JAR (for example, as a shaded or fat JAR), make sure to mark it as a Multi-Release JAR.
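To confirm that a repackaged JAR still carries the Multi-Release attribute, a small JDK-only check can read its manifest. This is a sketch; `MultiReleaseCheck` is a hypothetical helper, not part of jsoup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class MultiReleaseCheck {
    /** True if the main manifest section declares "Multi-Release: true". */
    static boolean isMultiRelease(Manifest manifest) {
        String value = manifest.getMainAttributes().getValue("Multi-Release");
        return value != null && Boolean.parseBoolean(value.trim()); // case-insensitive "true"
    }

    /** Convenience overload that parses manifest text (each line must end with a newline). */
    static boolean isMultiRelease(String manifestText) {
        try {
            return isMultiRelease(new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java MultiReleaseCheck <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest mf = jar.getManifest(); // null when the JAR has no manifest
            System.out.println(mf != null && isMultiRelease(mf));
        }
    }
}
```

Run it against the shaded artifact your build produces; `false` suggests the shading step dropped the attribute.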
If the HttpClient implementation is not available in your JRE, requests will continue to be made via HttpURLConnection (in http/1.1 mode).
(…) Android developers need to enable core library desugaring. The minimum Java version remains Java 8. #2173
Removed the previously deprecated class org.jsoup.UncheckedIOException (replace with java.io.UncheckedIOException); changed the previously deprecated method Element Element#forEach(Consumer) to void Element#forEach(Consumer). #2246
Deprecated Document#updateMetaCharsetElement(boolean) and Document#updateMetaCharsetElement(), as the setting had no effect. When Document#charset(Charset) is called, the document's meta charset or XML encoding instruction is always set. #2247
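A short sketch of the new charset behavior, assuming jsoup 1.19.1: calling `Document#charset(Charset)` by itself should leave a `<meta charset>` element in the HTML head, without the deprecated `updateMetaCharsetElement(true)`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaCharsetExample {
    /** Parses HTML and sets its output charset to UTF-8. */
    static Document parseAsUtf8(String html) {
        Document doc = Jsoup.parse(html);
        // Per #2247, this also adds or updates the <meta charset> element;
        // no updateMetaCharsetElement(true) call is needed (or effective).
        doc.charset(StandardCharsets.UTF_8);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = parseAsUtf8("<p>Hello</p>");
        Element meta = doc.selectFirst("meta[charset]");
        System.out.println(meta != null ? meta.outerHtml() : "no <meta charset> found");
    }
}
```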
Improvements
When cleaning with a Safelist that preserves relative links, the isValid() method now considers these links valid. Additionally, the enforced attribute rel=nofollow is only added to external links when configured in the safelist. #2245
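A sketch of cleaning with a Safelist that preserves relative links, assuming jsoup 1.19.1; the base URI and class name are placeholders. Whether `rel=nofollow` ends up on `/docs` depends on the safelist configuration described above, so the code only prints the cleaned markup and the `isValid()` result rather than asserting them.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class RelativeLinksExample {
    // Placeholder base URI used to resolve and validate relative links while cleaning.
    static final String BASE_URI = "https://example.com/";

    /** basic() allows <a href>; preserveRelativeLinks keeps "/docs" instead of making it absolute. */
    static Safelist safelist() {
        return Safelist.basic().preserveRelativeLinks(true);
    }

    static String clean(String bodyHtml) {
        return Jsoup.clean(bodyHtml, BASE_URI, safelist());
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs\">Docs</a>";
        System.out.println(clean(html));
        // Per #2245, isValid() should now accept relative links under this safelist.
        System.out.println(Jsoup.isValid(html, safelist()));
    }
}
```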
Added Element#selectStream(String query) and Element#selectStream(Evaluator) methods, which return a Stream of matching elements. Elements are evaluated and returned as they are found, and the stream can be terminated early. #2092
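Because `selectStream` yields matches as they are found, a search can stop at the first hit. A minimal sketch, assuming jsoup 1.19.1; the helper names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectStreamExample {
    /** href of the first link whose text contains the keyword; stops at the first hit. */
    static Optional<String> firstHrefWithText(String html, String keyword) {
        Document doc = Jsoup.parse(html);
        return doc.selectStream("a[href]")          // yields matching elements lazily
                .filter(a -> a.text().contains(keyword))
                .map(a -> a.attr("href"))
                .findFirst();                       // short-circuits the stream
    }

    /** All link targets, in document order. */
    static List<String> allHrefs(String html) {
        return Jsoup.parse(html).selectStream("a[href]")
                .map(a -> a.attr("href"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String html = "<a href='/a'>Alpha</a><a href='/b'>Beta</a><a href='/c'>Gamma</a>";
        System.out.println(firstHrefWithText(html, "Beta").orElse("none"));
        System.out.println(allHrefs(html));
    }
}
```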
Element objects now implement Iterable, enabling them to be used in enhanced for loops.
Added support for parsing HTML fragments from a Reader via Parser#parseFragmentInput(Reader, Element, String). #1177
(…) jsoup-examples.jar. #1702
Improved the performance of selectors like #id .class (and other similar descendant queries) by around 4.6x, by better balancing the Ancestor evaluator's cost function in the query planner. #2254
Removed the special handling of <isindex> tags, which would autovivify a form element with labels. This is no longer in the spec.
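Since Element now implements Iterable (see above), it can drive an enhanced for loop, and `forEach` follows `Iterable#forEach` in returning void (#2246). A sketch assuming jsoup 1.19.1; exactly which nodes the iteration visits (we expect the element and its descendant elements, as with `Element#stream()`) should be checked against the Javadoc, so the test only relies on child tags being included. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class IterableElementExample {
    /** Tag names visited when iterating the first element matching cssQuery. */
    static List<String> visitedTags(String html, String cssQuery) {
        List<String> tags = new ArrayList<>();
        Element root = Jsoup.parse(html).selectFirst(cssQuery);
        if (root == null) return tags;
        for (Element el : root) {           // Element implements Iterable<Element> as of 1.19.1
            tags.add(el.tagName());
        }
        return tags;
    }

    /** forEach now returns void (#2246), so call it as a statement rather than chaining. */
    static void markAll(Element root) {
        root.forEach(el -> el.addClass("seen"));
    }

    public static void main(String[] args) {
        System.out.println(visitedTags("<div><p>a</p><p>b</p></div>", "div"));
    }
}
```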
Added Elements.selectFirst(String cssQuery) and Elements.expectFirst(String cssQuery), to select the first matching element from an Elements list. #2263
(…) through the HTML parser's bogus comment handler. Serialization for non-doctype declarations no longer ends with a spurious !. #2275
(…) element names containing < are normalized to _ to ensure valid XML. For example, <foo<bar> becomes <foo_bar>, as XML does not allow < in element names, but HTML5 does. #2276
Bug Fixes
If an element had a ; in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath queries could miss that element. Now, the attribute name is more completely normalized. #2244
(…) "name". #2241
In Connection, skip cookies that have no name, rather than throwing a validation exception. #2242
A java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer; could be thrown (on Java 8) when calling Response#body() after parsing from a URL and the buffer size was exceeded. #2250
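Why that error appears: Java 9 added a covariant `ByteBuffer.flip()` override returning `ByteBuffer`, so code compiled on JDK 9+ against the newer API (without `--release 8`) links to a method that Java 8 does not have. A common workaround, sketched here with JDK classes only, is to call `flip()` through the `Buffer` supertype; this is not necessarily how jsoup fixed #2250, and `FlipCompat` is a hypothetical name.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FlipCompat {
    /** Writes text into a buffer, flips it, and reads it back (text must fit in 64 bytes). */
    static String roundTrip(String text) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(text.getBytes(StandardCharsets.UTF_8));
        // buf.flip() compiled on JDK 9+ (without --release 8) binds to
        // ByteBuffer.flip()Ljava/nio/ByteBuffer; which is missing on Java 8.
        // Calling it via Buffer binds to Buffer.flip(), present on Java 8 and later.
        ((Buffer) buf).flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Compiling with `javac --release 8` also avoids the problem, because it compiles against the Java 8 API signatures.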
Handle null InputStream inputs to Jsoup.parse(InputStream stream, ...) by returning an empty Document. #2252
A template tag containing an li within an open li would be parsed incorrectly, as it was not recognized as a "special" tag (which have additional processing rules). Also, added the SVG and MathML namespace tags to the list of special tags. #2258
A template tag containing a button within an open button would be parsed incorrectly, as the "in button scope" check was not aware of the template element. Also corrected other instances, including MathML and SVG elements. #2271
An :nth-child selector with a negative digit-less step, such as :nth-child(-n+2), would be parsed incorrectly as a positive step, and so would not match as expected. #1147
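To illustrate the CSS An+B rule behind `:nth-child`, this sketch parses an expression into a step `a` and offset `b`, and tests whether a 1-based index equals a·k + b for some integer k ≥ 0. It is a JDK-only illustration of the selector semantics, not jsoup's parser; `NthChildMath` is a hypothetical name.

```java
import java.util.Locale;

public class NthChildMath {
    /** Parses "odd", "even", "B", or "An+B" (e.g. "2n+1", "-n+2", "n") into {a, b}. */
    static int[] parse(String expr) {
        String s = expr.replace(" ", "").toLowerCase(Locale.ROOT);
        if (s.equals("odd")) return new int[] {2, 1};
        if (s.equals("even")) return new int[] {2, 0};
        int n = s.indexOf('n');
        if (n < 0) return new int[] {0, Integer.parseInt(s)};   // plain B, e.g. "3"
        String step = s.substring(0, n);                         // "", "+", "-", "3", "-3"
        int a;
        if (step.isEmpty() || step.equals("+")) {
            a = 1;
        } else if (step.equals("-")) {
            a = -1;                                              // the digit-less negative step
        } else {
            a = Integer.parseInt(step);
        }
        String offset = s.substring(n + 1);                      // "", "+2", "-1"
        int b = offset.isEmpty() ? 0 : Integer.parseInt(offset);
        return new int[] {a, b};
    }

    /** A 1-based index matches An+B if index == a*k + b for some integer k >= 0. */
    static boolean matches(int a, int b, int index) {
        if (a == 0) return index == b;
        int diff = index - b;
        return diff % a == 0 && diff / a >= 0;
    }

    public static void main(String[] args) {
        int[] ab = parse("-n+2");
        for (int i = 1; i <= 4; i++) {
            System.out.println(i + ": " + matches(ab[0], ab[1], i));
        }
    }
}
```

For -n+2 only indices 1 and 2 match (k = 1 and k = 0), which is why a selector like `li:nth-child(-n+2)` should pick the first two items; misreading the step as positive (n+2) would instead match indices 2, 3, 4, and so on.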
Calling doc.charset(charset) on an empty XML document would throw an IndexOutOfBoundsException. #2266
(…) StructuralEvaluator (e.g., a selector ancestor chain like A B C) by ensuring cache reset calls cascade to inner members. #2277
(…) doc.clone().append(html) were not supported. When a document was cloned, its Parser was not cloned but was a shallow copy of the original parser. #2281
v1.18.3
Bug Fixes
(…) -, ., or digits were incorrectly marked as invalid and removed. #2235
v1.18.2
Improvements
(…) down between -6% and -89%, and throughput improved up to +143% for small inputs. Most input sizes will see throughput increases of ~20%. These performance improvements come through recycling the backing byte[] and char[] arrays used to read and parse the input. #2186
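The notes credit these gains to recycling `byte[]` and `char[]` buffers. A generic sketch of that technique with a per-thread pool follows; `CharBufferPool`, its buffer size, and its cap are hypothetical and do not describe jsoup's internals.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;

/** Hypothetical per-thread pool of fixed-size char[] buffers; not jsoup's implementation. */
public final class CharBufferPool {
    static final int BUFFER_SIZE = 8 * 1024;  // assumed buffer size
    private static final int MAX_POOLED = 4;  // assumed cap on idle buffers per thread
    private static final ThreadLocal<ArrayDeque<char[]>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Reuses an idle buffer from this thread's pool, or allocates a new one. */
    static char[] borrow() {
        char[] buf = POOL.get().pollFirst();
        return buf != null ? buf : new char[BUFFER_SIZE];
    }

    /** Returns a buffer for reuse; buffers beyond the cap are left to the garbage collector. */
    static void release(char[] buf) {
        ArrayDeque<char[]> pool = POOL.get();
        if (buf.length == BUFFER_SIZE && pool.size() < MAX_POOLED) {
            pool.push(buf);
        }
    }

    /** Example use: read a whole Reader with a recycled buffer. */
    static String readAll(Reader in) throws IOException {
        char[] buf = borrow();
        try {
            StringBuilder sb = new StringBuilder();
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            release(buf); // hand the buffer back even if reading fails
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("<p>recycled</p>")));
    }
}
```

A thread-local pool avoids locking, at the cost of holding a few idle buffers per thread; the cap bounds that memory.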
Improved the performance of html() and Entities.escape() by around 49% when the input contains UTF characters in a supplementary plane. #2183
FormElement.elements() now reflects changes made to the DOM after the original parse. #2140
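A sketch of that change, assuming jsoup 1.18.2 or later: a control appended to a form after parsing should be reported by `elements()`. The markup and class name are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormElementsExample {
    /** Parses a form, appends one input after parsing, and counts the form's controls. */
    static int controlsAfterAppend(String html) {
        Document doc = Jsoup.parse(html);
        // The HTML parser creates FormElement instances for <form> tags.
        FormElement form = (FormElement) doc.selectFirst("form");
        if (form == null) throw new IllegalArgumentException("no <form> in input");
        form.appendElement("input").attr("name", "added");
        // Since 1.18.2 (#2140), elements() reflects DOM changes made after the parse.
        return form.elements().size();
    }

    public static void main(String[] args) {
        System.out.println(controlsAfterAppend("<form><input name=q></form>"));
    }
}
```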
In the TreeBuilder, the onNodeInserted() and onNodeClosed() events are now also fired for the outermost / root Document node. This enables source position tracking on the Document node (which was previously unset), and also enables the node traversor to see the outer Document node. #2182
(…) Elements#set(). #2212
Bug Fixes
Element.cssSelector() would fail if the element's class contained a * character. #2169
(…) untracked. #2175
(…) html, it should be parsed in QuirksMode. #2197
In a selector like div:has(span + a), the has() component was not working correctly, as the inner combining query caused the evaluator to match those against the outer's siblings, not children. #2187
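A sketch of the corrected behavior, assuming jsoup 1.18.2 or later. In the markup, only the first `div` contains a `span` immediately followed by an `a`; the second `div` merely follows a `span` at the outer level, roughly the kind of sibling the bug compared against. The class name is hypothetical.

```java
import java.util.List;

import org.jsoup.Jsoup;

public class HasSiblingExample {
    /** ids of divs that contain a span immediately followed by a sibling a. */
    static List<String> divIds(String html) {
        return Jsoup.parse(html).select("div:has(span + a)").eachAttr("id");
    }

    public static void main(String[] args) {
        String html = "<div id=one><span>s</span><a>x</a></div>"
                + "<span>t</span><div id=two><a>y</a></div>";
        System.out.println(divIds(html));
    }
}
```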
:has() components in a nested :has() might execute incorrectly. #2131
(…) Connection.Response#cookies() will provide the last one set. Generally it is better to use the Jsoup.newSession method to maintain a cookie jar, as that applies appropriate path selection on cookies when making requests. #1831
(…) attribute). #2207
(…) created (html or body). #2204
(…) < as part of a tag name, instead of emitting it as a character node. #2230
(…) < as the start of an attribute name, vs creating a new element. The previous behavior was intended to parse closer to what we anticipated the author's intent to be, but that does not align to the spec or to how browsers behave. #1483
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.