Initialise current token to the virtual start token
Ensures that body fragment parsing, which adds the context element to the stack before any real token has been read, has a current token available when the source position is tracked for that first stack insert.

Fixes #2068
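
The failure mode described above can be illustrated with a small, self-contained sketch. This is a simplified model and not jsoup's actual code: the `Token` and `Builder` classes, the `push` method, and the `pushContextElement` helper are hypothetical stand-ins, assuming only what the commit message states (the context element is pushed before any real token exists, and position tracking reads the current token on that push).

```java
import java.util.ArrayList;
import java.util.List;

class FragmentTrackingSketch {
    /** Hypothetical stand-in for a parser token; it only carries a start offset. */
    static class Token {
        final int startPos;
        Token(int startPos) { this.startPos = startPos; }
    }

    /** Hypothetical, heavily simplified tree builder that tracks positions on push. */
    static class Builder {
        final List<String> stack = new ArrayList<>();
        final List<Integer> trackedPositions = new ArrayList<>();
        Token currentToken;

        Builder(boolean initToVirtualStart) {
            // Old behaviour (modelled): null until the first real token is read.
            // New behaviour (modelled): start from a virtual start token.
            currentToken = initToVirtualStart ? new Token(0) : null;
        }

        void push(String elementName) {
            stack.add(elementName);
            // Position tracking dereferences the current token on every push.
            trackedPositions.add(currentToken.startPos);
        }
    }

    /** Returns true if pushing the context element before any real token succeeds. */
    static boolean pushContextElement(boolean initToVirtualStart) {
        Builder builder = new Builder(initToVirtualStart);
        try {
            builder.push("body"); // the fragment's context element
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null current token:  " + pushContextElement(false));
        System.out.println("virtual start token: " + pushContextElement(true));
    }
}
```

In this model, initialising the current token up front means the first push needs no null check, which mirrors the approach the commit takes rather than guarding each tracking call.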
jhy committed Nov 29, 2023
1 parent c7a5655 commit daef8bb
Showing 3 changed files with 21 additions and 2 deletions.
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -7,6 +7,8 @@
* When tracking the source position of attributes, if source attribute name was mix-cased but the parser was
lower-case normalizing attribute names, the source position for that attribute was not tracked
correctly. [2067](https://github.com/jhy/jsoup/issues/2067)
+ * When tracking the source position of a body fragment parse, a null pointer exception was
+ thrown. [2068](https://github.com/jhy/jsoup/issues/2068)

---
Older changes for versions 0.1.1 (2010-Jan-31) through 1.17.1 (2023-Nov-27) may be found in
4 changes: 2 additions & 2 deletions src/main/java/org/jsoup/parser/TreeBuilder.java
@@ -26,7 +26,7 @@ abstract class TreeBuilder {
Document doc; // current doc we are building into
ArrayList<Element> stack; // the stack of open elements
String baseUri; // current base uri, for creating new elements
- Token currentToken; // currentToken is used only for error tracking.
+ Token currentToken; // currentToken is used for error and source position tracking. Null at start of fragment parse
ParseSettings settings;
Map<String, Tag> seenTags; // tags we've used in this parse; saves tag GC for custom tags.

@@ -48,11 +48,11 @@ void initialiseParse(Reader input, String baseUri, Parser parser) {
reader = new CharacterReader(input);
trackSourceRange = parser.isTrackPosition();
reader.trackNewlines(parser.isTrackErrors() || trackSourceRange); // when tracking errors or source ranges, enable newline tracking for better legibility
- currentToken = null;
tokeniser = new Tokeniser(this);
stack = new ArrayList<>(32);
seenTags = new HashMap<>();
start = new Token.StartTag(this);
+ currentToken = start; // init current token to the virtual start token.
this.baseUri = baseUri;
}

17 changes: 17 additions & 0 deletions src/test/java/org/jsoup/parser/PositionTest.java
@@ -470,6 +470,23 @@ private void printRange(Node node) {
assertEquals("id:3-5=6-7; ", xmlLcPos .toString());
}

+ @Test void tracksFrag() {
+     // https://github.com/jhy/jsoup/issues/2068
+     String html = "<h1 id=1>One</h1>\n<h2 id=2>Two</h2><h10>Ten</h10>";
+     Document shellDoc = Document.createShell("");
+
+     List<Node> nodes = TrackingHtmlParser.parseFragmentInput(html, shellDoc.body(), shellDoc.baseUri());
+     StringBuilder track = new StringBuilder();
+
+     // nodes is the top level nodes - want to descend to check all tracked OK
+     nodes.forEach(node -> node.nodeStream().forEach(descend -> {
+         accumulatePositions(descend, track);
+         accumulateAttributePositions(descend, track);
+     }));
+
+     assertEquals("h1:0-9~12-17; id:4-6=7-8; #text:9-12; #text:17-18; h2:18-27~30-35; id:22-24=25-26; #text:27-30; h10:35-40~43-49; #text:40-43; ", track.toString());
+ }
+
static void accumulateAttributePositions(Node node, StringBuilder sb) {
if (node instanceof LeafNode) return; // leafnode pseudo attributes are not tracked
for (Attribute attribute : node.attributes()) {